Search filters

by Eszter Hargittai on September 15, 2005

A serious problem with content filters – whether add-on software or the “safe” search mode of search engines – is that they often block legitimate content that should not be filtered out. These false positives can include important information that most people would be hard-pressed to call harmful. Paul Resnick and colleagues have done some interesting work on this with respect to filtered health information.

Now comes a helpful little tool (found through ResearchBuzz) that lets you run searches to see what content is blocked in the safe-search modes of Google and Yahoo!. Type in a search term and see which sites would be excluded from the results when running safe mode on the two engines.
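Under the hood, a tool like this presumably just runs the same query twice, once unfiltered and once in safe mode, and reports which results disappear. A minimal sketch of the idea (the function and the result lists below are my own made-up illustration, not the tool’s actual code or real engine output):

    # Sketch: which URLs vanish when safe mode is switched on?
    # The result lists are hard-coded stand-ins; a real tool would fetch
    # them from each engine's results for the same query.

    def blocked_by_safe_mode(unfiltered, filtered):
        """Return URLs present in the unfiltered results but missing in safe mode."""
        safe = set(filtered)
        return [url for url in unfiltered if url not in safe]

    # Hypothetical results for the query "breast cancer":
    unfiltered_results = [
        "http://www.cancer.gov/",
        "http://www.thebreastcancersite.com/",
        "http://www.komen.org/",
    ]
    safe_mode_results = [
        "http://www.cancer.gov/",
        "http://www.komen.org/",
    ]

    for url in blocked_by_safe_mode(unfiltered_results, safe_mode_results):
        print("excluded in safe mode:", url)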

Curiously, Google blocks TheBreastCancerSite.com when you turn on safe mode for a search on “breast cancer,” while Yahoo! doesn’t. (The Breast Cancer Site does not seem to have any objectionable material; its stated mission is to raise funds for free mammograms.)

By the way, Google’s and Yahoo!’s results can be quite different regardless of what gets filtered. Dogpile has a nifty little tool that visualizes some of the differences. I discussed it here while guest-blogging over at Lifehacker a few weeks ago.

{ 8 comments }

1

Seth Finkelstein 09.15.05 at 8:16 am

I’d like to recommend all the research I’ve done on censorware:

http://sethf.com/anticensorware/

Unfortunately, I’ve had to abandon decryption research because of the prospects of a lawsuit (and no funding, and generally not having it reported either :-( ).

2

Richard Bellamy 09.15.05 at 9:05 am

One of my most enjoyable work stories is the administrative process I had to go through to manually remove the “safe site” safeguards from my work computer.

I was trying to build a case against a defendant who I strongly suspected was a member of a white supremacist group, but all of my on-line “research” was being blocked by software that was filtering out white-supremacist websites.

It was a fun day sitting in the techies’ office and explaining why it was crucially important that I be able to get access to “whitierulez.com” or something like that.

3

John Emerson 09.15.05 at 9:37 am

Are there any good aspects to censorware?

Keeping children from seeing pornography and keeping workers from seeing pornography at work seem to be the main motives, and to me neither of these justifies monkey-wrenching a major form of communication. (Workers will still waste time no matter how many sites you block; they’ll just waste time differently.)

Political censorship of any kind, including censorship of neo-Nazis, seems especially problematic. This is a genuine slippery slope, and it’s my understanding that some filters use neo-Nazis and “hate speech” as an entering wedge to justify blocking most kinds of dissident speech.

4

y81 09.15.05 at 9:44 am

It is widely feared that allowing employees to view pornography will lead to sexual harassment suits. I am not aware of any actual cases on that, but the costs of a single lawsuit would surely exceed the gain from allowing broader web access, so in that sense the fear is legitimate. I suppose the same logic applies to white supremacist or other racist websites, though personally I have always suspected that this type of filtering is just bloatware.

5

Andrew 09.15.05 at 2:02 pm

Blocking that site seems ridiculous. There’s the famous story of filters blocking info relating to Super Bowl XXX because xxx = pornography. That was in 1996 or so. It doesn’t seem like they’ve become much better, does it?

6

soubzriquet 09.15.05 at 4:07 pm

Andrew: It isn’t surprising that they haven’t become better. There is a vast disconnect between the marketing material of these censorware companies and the machine learning literature.

As I understand it: it is known that such approaches will perform poorly in this domain, and it is known *why* they will perform poorly. Despite unsubstantiated claims to the contrary from various commercial entities, it isn’t known how to significantly improve this performance.

As it stands, it seems to me the censors have to trade off between a really large number of false positives and being trivial to defeat (resulting in a really large number of false negatives).
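To make that trade-off concrete, here’s a toy keyword filter (everything in it is invented for illustration) that misfires in exactly the ways mentioned above:

    # Toy substring filter: broad terms create false positives
    # (Super Bowl XXX, breast cancer pages), while trivial evasions
    # slip through as false negatives.

    BLOCKED_TERMS = ["xxx", "breast", "porn"]

    def is_blocked(page_title):
        title = page_title.lower()
        return any(term in title for term in BLOCKED_TERMS)

    pages = [
        "Super Bowl XXX highlights",       # blocked: false positive
        "Breast cancer screening guide",   # blocked: false positive
        "Hardcore p0rn archive",           # allowed: false negative
    ]
    for page in pages:
        print(page, "->", "blocked" if is_blocked(page) else "allowed")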

7

Harald Korneliussen 09.16.05 at 3:59 am

Filters should in most cases be warning/logging based. It’s stupid for a company to block all “non-job-related” sites, because every legitimate use case can’t be anticipated, and it’s a pain to contact network administration every time. Instead, make a warning page pop up:
“Warning: the following page is not considered job-related by the monitoring system. All activity will be logged. Are you sure you want to continue?” (Then a text field where you can write a note in the log explaining why you are accessing this site, and a continue button.)
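Roughly along these lines, as a sketch only; the domain list and the logging here are placeholders for whatever the monitoring system actually uses:

    # Warn-and-log instead of hard blocking: flagged pages trigger a
    # confirmation prompt, and the stated reason goes into the log.

    import datetime

    NON_JOB_RELATED = {"example-sports-site.com", "example-video-site.com"}
    access_log = []

    def request_page(url, user):
        host = url.split("/")[2] if "//" in url else url
        if host not in NON_JOB_RELATED:
            return "serving " + url
        print("Warning: this page is not considered job-related. Activity will be logged.")
        note = input("Reason for access (blank to cancel): ")
        if not note:
            return "cancelled"
        access_log.append((datetime.datetime.now(), user, url, note))
        return "serving " + url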

8

vivian 09.16.05 at 7:56 pm

Question, for Eszter, SethF or anyone else. There are two ways kids see ugly stuff online – (1) when they actively seek it out, and (2) when it pops up following a typo, or in email. As a parent of a preschooler who can read, I want him (and us) spared situation (2). There is no hope of preventing (1) and probably no point either. But for truly unsolicited excrement and the youngest surfers, what software do you recommend that will block the most awful material without also blocking dictionaries with cuss words in them, ACLU pages and the like? In my fantasy world, such software would have no political ramifications, and would allow parents to gradually reduce the range of blocked material, until the kid hacks into the system – at which point the pawn becomes a queen.

Comments on this entry are closed.