Proposals for Incorporating Machine Learning in Mozilla Firefox

Friday June 18th, 2004

Blake Ross writes: "I will be doing research this summer at Stanford with Professor Andrew Ng about how we can incorporate machine learning into Firefox. We're looking for ideas that will make Firefox 2.0 blow every other browser out of the water. People who come up with the best 3-5 ideas win Gmail accounts, and if we implement your idea you'll be acknowledged in both our paper and in Firefox credits. Your idea will also be appreciated by the millions of people who use Firefox :-). We'll also entertain Thunderbird proposals."

#78 Re: Nuke Anything + Bayesian Filter

by lump1

Wednesday June 23rd, 2004 9:09 AM

Yes, this is by far the best idea here. I'm thinking more of AdBlock with a Nuke Anything interface. You'd start with a very conservative set of "priors" for the Bayesian filter, which would block only the most obvious ads that nobody wants to see and produce no false positives. Then any image that is displayed would have a right-click option like "this is an ad". Clicking that would not only nuke the image/Flash/iframe, but would also store all the data about that nuking and use it to adjust the Bayesian filter's priors, based on as much data as possible, including (obviously) the URL, any mismatch between the URL of the image and that of the page, the dimensions of the image, the link target (if the image is a link), and its position in the page. A bonus would be the ability to easily import/export and share training data.
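To make the idea concrete, here is a rough sketch of how such a filter might look as a tiny naive Bayes classifier. Everything in it is an assumption made for illustration: the names `AdFilter` and `extract_features`, the choice of features, the starting prior of 0.1, and the smoothing scheme. It is not how AdBlock or any Firefox code actually works.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def extract_features(image_url, page_url, width, height, link_target=None):
    """Turn one page element into categorical features (hypothetical feature set)."""
    image_host = urlparse(image_url).hostname
    page_host = urlparse(page_url).hostname
    features = {
        "image_host": image_host,
        "third_party": image_host != page_host,  # image served from another host
        "size": f"{width}x{height}",
    }
    if link_target is not None:
        features["link_host"] = urlparse(link_target).hostname
    return features


class AdFilter:
    """Minimal naive Bayes sketch: each feature value is treated as independent."""

    def __init__(self, prior_ad=0.1, smoothing=1.0):
        self.prior_ad = prior_ad    # conservative starting belief that an element is an ad
        self.smoothing = smoothing  # add-one style smoothing so unseen values never give zero
        self.counts = {True: defaultdict(int), False: defaultdict(int)}
        self.totals = {True: 0, False: 0}

    def train(self, features, is_ad):
        """Record one user decision ("this is an ad" / "this is not an ad")."""
        self.totals[is_ad] += 1
        for item in features.items():
            self.counts[is_ad][item] += 1

    def probability_ad(self, features):
        """Return the estimated probability that the element is an ad."""
        s = self.smoothing
        log_odds = math.log(self.prior_ad / (1.0 - self.prior_ad))
        for item in features.items():
            p_given_ad = (self.counts[True][item] + s) / (self.totals[True] + 2 * s)
            p_given_ok = (self.counts[False][item] + s) / (self.totals[False] + 2 * s)
            log_odds += math.log(p_given_ad / p_given_ok)
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Each right-click on "this is an ad" would call `train(features, True)`, and each "this is not an ad" would call `train(features, False)`. Sharing training data would then amount to exporting the two count tables.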

An important part of the interface would be an icon that appears in the status bar which indicates that images are being blocked. Clicking it would override the blocking and show all the content of the page, with a thin red border around images that the filter thought should be blocked. If any of them are false positives, you could choose "this is not an ad" and train the filter to allow them. An especially nice touch would be to have that status bar icon appear in different colors depending on the filter's perception of the "safety" of the blocking decisions that it made. So if the blocking icon shows up and it's green, it means the filter blocked something, but it's absolutely certain that only worthless stuff was blocked. If the same icon is yellow or orange, it means that at least one of the blocked objects is only about 90% or 85% likely to be worthless. That would encourage the user to click the icon and double-check the decisions of the filter. If the user decides the decisions were correct, that would be further data for the filter, and it would make similar decisions in the future with more confidence. This is a perfect problem for machine learning, and it would give people a sense of ownership over Firefox, since it would be personalized for them. It would also make them want to evangelize Firefox to friends, and to share with them their training data.
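The icon color could be driven by the least certain blocking decision on the page. A small sketch follows; the function name `icon_color` and the exact cutoffs (0.99 for green, 0.90 for yellow) are my own assumptions, picked only to roughly match the 90% and 85% figures above.

```python
def icon_color(blocked_probabilities, green_cutoff=0.99, yellow_cutoff=0.90):
    """Pick a status bar icon color from the filter's confidence in each blocked item.

    blocked_probabilities: one "this is an ad" probability per blocked element.
    Returns None when nothing was blocked, so no icon needs to be shown.
    """
    if not blocked_probabilities:
        return None
    # The icon reflects the riskiest decision, not the average.
    least_certain = min(blocked_probabilities)
    if least_certain >= green_cutoff:
        return "green"
    if least_certain >= yellow_cutoff:
        return "yellow"
    return "orange"
```

Using the minimum rather than the average means a single doubtful block is enough to turn the icon yellow or orange, which is the behavior that nudges the user to double-check.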

One more thing: this, or any result of this project, should be made an *extension* and no attempt should be made to cram it into the mainline code. That way it can be tested by users long before 2.0 and development won't slow down the main browser development effort. The extensions framework is beautiful, and whatever additional feature anyone can think of should be made into an extension, unless there is a very powerful reason why making it an extension would compromise its functionality. For example, Bayesian filtering for Thunderbird should be an extension, so that several competing Bayesian classification schemes could be used. On the other hand, I think that a post-1.0 Firefox and Thunderbird installer wizard should include a screen with checkboxes for auto-installing the most important extensions, like this one, mouse gestures, advanced preferences, and maybe two more. It's really the extensions that make Firefox such a superior browser, and this should be showcased. So don't worry that I'm trying to diminish the prominence of this important work by suggesting it should be an extension. I'm trying to save the sleekness of Firefox and to increase the prominence of the greatest extensions.