Trusting Google’s Algorithms to Explain Google’s Algorithms

by Henry on December 7, 2009

Michael Zimmer

Recently, a student in one of my classes gave a presentation on Google, and proceeded to explain how Google ranks search results using an algorithm called … PigeonRank: …

PigeonRank’s success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.

By collecting flocks of pigeons in dense clusters, Google is able to process search queries at speeds superior to traditional search engines, which typically rely on birds of prey, brooding hens or slow-moving waterfowl to do their relevance rankings.

When a search query is submitted to Google, it is routed to a data coop where monitors flash result pages at blazing speeds. [HF – Didn’t Greg Bear write a novel about this once? ] When a relevant result is observed by one of the pigeons in the cluster, it strikes a rubber-coated steel bar with its beak, which assigns the page a PigeonRank value of one. For each peck, the PigeonRank increases. Those pages receiving the most pecks, are returned at the top of the user’s results page with the other results displayed in pecking order.

PigeonRank, of course, is a hoax, part of Google’s 2002 April Fool’s Day joke. But how did my student fall for it in 2009? Simple. He trusted Google. The first result when you search Google for “How does Google work?” is a link and a blurb purporting to describe precisely that:

{ 60 comments }

1

rea 12.07.09 at 10:51 pm

I would not have thought that a college student in an Information Science course would be capable of swallowing the pigeons . . .

2

mpowell 12.07.09 at 10:52 pm

That must have been one awkward presentation room.

3

Renee 12.07.09 at 11:15 pm

Seriously, though, how could a student really fall for that? It says right at the bottom that it’s an April Fool’s Day joke. And all the other search results all talk about “PageRank.” Or is THIS an April Fool’s Day joke too? If so, his calendar is off by a few months…

I think the Teaching Moment is “read the whole thing,” “read multiple sources,” shortly followed by “make sure you understand it.”

4

bianca steele 12.07.09 at 11:44 pm

Sigh. I worked with someone that gullible ONCE (I wish I knew what college he’d gone to).

5

Glen Tomkins 12.07.09 at 11:50 pm

The classic explanation of thought

If memory serves, one of the Platonic dialogues (Theaetetus?) has an explanation of the mechanics of thought in terms of the flight of birds and aggregations of birds, so perhaps the April Fool’s joke was just a cover story to throw people off the fact that Google actually does work by pigeon power.

Of course, to track down that Platonic dialogue, I’ll just go straight to Google…

6

Henry 12.07.09 at 11:57 pm

I would not have thought that a college student in an Information Science course would be capable of swallowing the pigeons . . .

Correct. While cheap and plentiful, Information Science students have insufficiently large gullets. Hence Google’s use of boa constrictors for the crucial intermediate stages of pigeon processing.

7

bianca steele 12.08.09 at 12:13 am

But, Renee, the page address was “google.com/technology”–there is no other page from a similarly authoritative source…

8

marcel 12.08.09 at 12:16 am

PigeonRank, of course, is a hoax, part of Google’s 2002 April Fool’s Day joke. But how did my student fall for it in 2009?

The gift that keeps on giving.

To complete the April Fool’s aspect of the joke, I think Zimmer should publish the name of the student! Just for a few hours, long enough for the page to make it into google’s cache.

9

Cranky Observer 12.08.09 at 12:28 am

Well, we think it is a hoax anyway. Since no one outside Google truly knows how their search ranking algorithm works, and misdirection about same is undoubtedly part of Google’s security plan, perhaps they really DO use pigeons to rank searches and are protecting the secret with a purloined letter.

Cranky

All those datacenters exist merely to provide warmth for the pigeon nests…

10

Tim Wilkinson 12.08.09 at 12:36 am

The idea was presumably based on this.

11

bianca steele 12.08.09 at 12:54 am

@9 What makes you think anyone inside Google knows how it works? Tim’s clearly shown that it’s based on classified military research…

12

Michael Zimmer 12.08.09 at 1:32 am

Hi everyone, and thanks, Henry, for sparking the conversation here about this matter.

Yes, the student should have scrolled all the way to the bottom of the page, used multiple sources, and tapped into that skeptical part of the brain that would think twice about the use of pigeons in Silicon Valley (Silicon Alley, perhaps).

The most interesting aspect of this whole episode is that it happened while my colleagues and I are pushing for “information literacy” as a general education requirement on campus….

13

Anonymous Coward 12.08.09 at 1:47 am

There’s another somewhat related issue that came up in the climate warming stuff about a month (?) ago and has been raised again by other parties (Google, /., …) in the past day or so.

How many of you think that the results you get from typing in a query are the results that everyone else gets?

It’s certainly not true between different google country domains. Within a country, Google personalizes searches and has for some time. Now they are expanding that so users don’t even have to be logged in to get different searches.

Now the question is, why do so many people think they know how google works? Google is a black box and has been, it’s not a white box. Few people do google experiments. Yet most people intuitively know that my results are the same as your results even when that’s not true in real life.

A month or so ago, Roger Pielke Jr., was lambasted over many things, but one thing in particular was along the lines that when he added terms to his search that should have limited and restricted results, he actually got back more results. Ha Ha Ha! What a maroon! That could never happen. Why? Because his critics know how search engines work!!!

Well, I’ve actually seen that same thing on many occasions. I’m not going to defend/critique Pielke Jr. on any of the other grounds, just the ground that people outside of Google actually don’t know how Google works in detail, and just because you think you know how simple a search engine is, (crawl, index, search) doesn’t mean a whole lot.

Shorter: Just because you think you know how a search engine works, doesn’t mean you know how google works.

14

flubber 12.08.09 at 1:49 am

Doesn’t it seem more likely that the story about the student writing the report citing *Pigeon Rank* is itself a rather implausible hoax, playing on people’s willingness to believe just about any report of students’ incompetence?

IOW, people believing that a student believed the hoax are being hoaxed. Or something.

15

Joseph P. Fisher 12.08.09 at 2:15 am

Word on the street is that Chuck Norris heads up operations in the pigeon coop.

16

bianca steele 12.08.09 at 2:23 am

Shorter Anonymous Coward: Plato invented Google. Therefore, computer scientists should retire and study philosophy.

17

Guest 12.08.09 at 2:59 am

There’s no way I’m buying the idea that the student was actually fooled. I’d have to have been there and seen it. C’mon.

18

Andy Bach 12.08.09 at 3:39 am

@bianca – CS folks, in general, started in Philo and then moved into CS … in hopes of getting something, if not done, actually to move – even if it was just that little turtle on the screen. FWIW I’m voting for the metahoax of a pigeon rank believing student.

19

Anonymous Coward 12.08.09 at 4:41 am

Bianca,

I didn’t say anything remotely like that, and I assume you both read and understood my post. So do you actually have anything of substance you’d like to say about my comment, or are you going to stick with your content free silliness?

20

Substance McGravitas 12.08.09 at 4:48 am

So do you actually have anything of substance you’d like to say about my comment

I’d say it reads an awful lot like Tom Fuller.

21

David Hobby 12.08.09 at 5:04 am

I agree with Coward, who wrote: “A month or so ago, Roger Pielke Jr., was lambasted over many things, but one thing in particular was along the lines that when he added terms to his search that should have limited and restricted results, he actually got back more results.”

I did some experiments trying to use numbers of hits on Google searches to measure how likely words were to appear together. The X in “Results 1 – 10 of about X for …” jumped around all over in ways you wouldn’t expect. This could be because Google is drawing the line between results and non-results in a tricky way. Since nobody is going to look through a million results, this is not a problem in practice.
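A minimal sketch of the kind of calculation such an experiment might involve. The comment does not say which measure was used, so the "lift" ratio below (observed joint frequency divided by the frequency expected if the two words were independent) is an assumption, as are the function name and the `index_size` parameter, which stands in for a total page count that nobody outside Google actually knows.

```python
def cooccurrence_lift(hits_a, hits_b, hits_both, index_size):
    """Ratio of observed co-occurrence to that expected under independence.

    hits_a, hits_b: reported result counts for each word alone.
    hits_both: reported result count for the two words together.
    index_size: assumed total number of indexed pages (a guess in practice).
    """
    if min(hits_a, hits_b, index_size) <= 0:
        raise ValueError("hits_a, hits_b and index_size must be positive")
    p_a = hits_a / index_size        # estimated P(page contains word a)
    p_b = hits_b / index_size        # estimated P(page contains word b)
    p_both = hits_both / index_size  # estimated P(page contains both)
    # A value above 1 suggests the words co-occur more often than chance.
    return p_both / (p_a * p_b)
```

Because the inputs are only Google's "of about X" estimates, any instability in those estimates passes straight through to the ratio, which is consistent with the erratic behaviour described above.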

22

Substance McGravitas 12.08.09 at 5:07 am

I’ve noticed the difference in search results – I do or try to do a lot of narrowing down – but Anonymous Coward is being a little weird. The “expansion” mentioned involves cookies, which you have control over.

23

Doctor Slack 12.08.09 at 5:15 am

AC: Google is a black box and has been, it’s not a white box.

I think Anonymous Coward needs to do some hard thinking about implicature and white box privilege.

(Too soon for that joke? Too late?)

24

Hoover 12.08.09 at 9:04 am

I’m having difficulty believing the meme that some student believed the meme that Google is arranged by pigeons.

25

bad Jim 12.08.09 at 10:44 am

The opposite fallacy, disbelieving something because it was published on April 1 by a computer company, is something I did. Artisoft had something about a partnership with Apple which I dismissed as a joke but which turned out to be true.

The computer brethren take joking very seriously, and April First is certainly sacred, but not everything you read is what you expect.

26

alex 12.08.09 at 11:11 am

The internet is serious business.

27

JoB 12.08.09 at 11:12 am

A new science: figuring out how google works. Ain’t that exciting!

28

Amanda French 12.08.09 at 11:50 am

The original post on Michael Zimmer’s blog has apparently been deleted. Whazzup? (See, me, I try to check my sources.) :)

29

Glen Tomkins 12.08.09 at 12:40 pm

Don’t confuse your cranks

“Shorter Anonymous Coward: Plato invented Google. Therefore, computer scientists should retire and study philosophy.”

You seem to be conflating Anonymous C and myself. I’m sure the Google software engineers have disambiguation protocols they could fit you out with to help with that conflation problem, but as their solution probably involves pigeons hovering about your person as if you were some latter day St. Francis, you may want to work this out for yourself.

30

thy blackest jam 12.08.09 at 1:07 pm

It’s pretty obvious to me that the student was not being serious … note “traditional search engines, which typically rely on birds of prey”, and the final sentence’s “pecking order”.

Sounds like something I’d write.

31

marcel 12.08.09 at 1:36 pm

Amanda French:

But the google cache version exists (@ 8:35 EDT on 12/8/9). Do we trust google caches? Or could they be making them up to spoof us?

32

Tim Wilkinson 12.08.09 at 1:40 pm

bianca steele @11 To clarify, @10 I just meant that the Google spoof was presumably ‘inspired’ by the true and fairly unremarkable (and unclassified) story about a proposed bomb guidance system.

33

Michael Zimmer 12.08.09 at 2:02 pm

Just got up and saw that my post has been removed. I have no idea how that happened. Will investigate and re-post.

34

JoB 12.08.09 at 2:06 pm

Blame it on the anti-google.

(or maybe your student secretly is a whiz kid)

(or worse!)

–zip–

35

Michael Zimmer 12.08.09 at 2:17 pm

Odd. Both the original WordPress post and the related images were removed from my web server account (I have since restored them). Perhaps I should look for evidence of pigeon droppings….

36

kid bitzer 12.08.09 at 2:18 pm

i understand google won’t return any hits at all if you google “gullible”.

37

bianca steele 12.08.09 at 2:30 pm

Anonymous Coward: You were serious? I’m sorry. Really, who were you addressing your post to, who thinks they know everything about how a search engine works, yet ought to stop worrying about how a search engine works, because they’ll never understand it?

Glen, Tim: How else would you explain why so many people don’t understand how the system they work on works? It’s got to be either a Platonic Idea that they intuit though can’t explain (and OMG DON’T change that!), or else a highly “classified” secret that only Freemasons are permitted to be informed about.

Seriously, Google doesn’t use ordinary logic in performing its searches. They have algorithms that decide how to make sure the top results are the ones that show up first, over and above PageRank. Yes, it would be nice if they had more obvious help for people sophisticated enough to use it, but it seems they are of the Steve Jobs “don’t ever make it look complicated” school.

My guess is that the nonintuitive result counts are an artefact of the indexing system.

38

Walt 12.08.09 at 2:30 pm

kid bitzer’s comment is so awesome that I had to try googling “gullible”. It turns out that as of this moment, Wikipedia does not have an entry for “gullibility”.

39

onymous 12.08.09 at 2:46 pm

rea said: “I would not have thought that a college student in an Information Science course would be capable of swallowing the pigeons . . .”

Well, he is a geek.

40

Tim Wilkinson 12.08.09 at 3:08 pm

bianca steele @34 Why are you addressing your remarks to me? Either explain, or leave me out of whatever it is you are talking about.

41

bianca steele 12.08.09 at 3:19 pm

Tim, the nature of the posts so far has made it clear that they are not serious. If you don’t want to participate in this kind of discussion, why are you participating here with people who do? It’s not like you get paid to do it.

42

bianca steele 12.08.09 at 3:50 pm

@37: “top results” sb “results most customers will want most to see”

43

Salient 12.08.09 at 5:15 pm

What would it mean to use “ordinary logic” in performing searches or sorts, and what would “more obvious help for people sophisticated enough to use it” consist of? Was the latter actually a sarcastic reference to SearchWiki?

44

bianca steele 12.08.09 at 5:34 pm

Salient, it would help if you had a more specific question. “Ordinary logic” means that combining two search terms returns the intersection of the result sets for each. “Better help” means pointing out on the first page that a search for “Henry Farrell” is not a search for “Henry Farrell” but rather a search for things that are more like “Henry Farrell” than the intersection of “Henry” and “Farrell”.
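A minimal sketch of "ordinary logic" in this sense, assuming a toy inverted index over a made-up three-page corpus. The function names and the corpus are hypothetical; nothing here reflects how Google actually indexes or ranks pages.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def strict_and_search(index, query):
    """Return ids of pages containing every query term (set intersection)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's pages, then narrow with each further term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical corpus, invented purely for illustration.
PAGES = {
    "a": "henry farrell writes about political science",
    "b": "henry ford built cars",
    "c": "professor farrell teaches at a university",
}
INDEX = build_index(PAGES)
```

Here `strict_and_search(INDEX, "Henry Farrell")` keeps only page `a`, the one page in both the "henry" set and the "farrell" set. Adding a term can never enlarge the result under this scheme.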

45

jhe 12.08.09 at 5:58 pm

Not necessarily a terrible metaphor for the MapReduce Framework. I’m not sure what the statute of limitations is for claiming to be speaking metaphorically though.

46

Salient 12.08.09 at 6:34 pm

Ah, ok. Your answer makes sense to me, but I’m afraid I don’t have any more specific a question. The idea of “better help” falls under the category of better transparency about how search results are organized, I get that now — wasn’t sure if by “better help” you meant “clarify for users how to achieve better search results by modifying their input” (better transparency) or “provide access to alternative sets of algorithms and allow the advanced user to choose which algorithms are implemented when computing results to a particular search” (new functionality).

There’s trouble with implementing ordinary logic, of course; no search engine could implement it strictly and stay in business for long. Trivially, a web page which contains user-desired information about “Henry Farrell” might not show up in an ordinary-logic search for “Henry Farrell,” because the page itself and links to the page might only use his last name, or use Dr. Farrell, or Prof. Farrell, etc.

And people often type questions into google, and the content which answers the question often has only the slightest tangential resemblance to the words used to ask the question. For example, ‘where is the closest cheap hotel to my house’ is a perfectly ordinary user search. A search engine which returns the intersection of the pages indexed by {where, closest, cheap, hotel, house} would not address the user’s query. Only two lexical units are important: “cheap hotel” and “my house” — but even a search engine which identifies this and returns pages indexed by {cheap, hotel, my house} won’t return anything useful to the user. The relationship between these lexical units, obvious to a human reader, is hard to assess according to any general deterministic scheme (and harder when users don’t follow consistent grammatical guidelines).

A more subtle example: a search for ‘production assistant’ (no quotes) returns results ranked completely differently than results for {production, assistant}. The latter words return, in top results, miscellany and definitions; the former returns job-oriented information which (so far as I can tell in a quick visual scan) doesn’t appear in the top 100 results for either individual word.

Hence, the deployment of pigeons, a species with an uncanny sense of orientation, renowned for centuries of contribution to the advancement of human communication.

Something important wasn’t clarified, either in google’s informative explanation or the student’s follow-up research. So I’ll break the news. Those pigeons who perform exceptionally well, in terms of favorable results rankings on SearchWiki, are eventually mainlined into the system: coaxial cable is wired directly into the toes, and data is transmitted directly through the pigeon’s own neural fibers, so that search results are more directly computed and stored. These results may be retrieved instantly from the pigeon at any time, and may be transmitted directly to other pigeons, a process known as “infection.” By analogy, the pigeons which receive the distinction of direct system hookups are referred to as “carrier” pigeons.

I do agree with the idea that google/others could usefully introduce an option to “search by strict indexing” or some such thing, so that users who want to see only those pages which are intersections of all words typed in can do so, bypassing the pigeonization of their query entirely. The “advanced search” feature doesn’t cut it. But this technique wouldn’t give useful results to the bulk of searches google receives, so it shouldn’t be the default setting. [bianca, I’m assuming you and I basically agree on this, since you said “for people sophisticated enough to use it.”]

47

bianca steele 12.08.09 at 6:43 pm

Salient, I’ve only skimmed your comment and I’ll have to read it later, but I don’t agree with what you attribute to me. I’ll have to get back to you.

48

JP 12.08.09 at 8:07 pm

I would love to see the article presented to viewers of various TV “News” shows to see which audiences are the most gullible.

49

James 12.08.09 at 9:14 pm

Um, your student is still an idiot though . . .

50

Glen Tomkins 12.09.09 at 12:35 am

bianca steele,

“Glen, Tim: How else would you explain why so many people don’t understand how the system they work on works? It’s got to be either a Platonic Idea that they intuit though can’t explain (and OMG DON’T change that!), or else a highly “classified” secret that only Freemasons are permitted to be informed about.”

People use tools quite successfully that they really don’t know the inner workings or mechanics of. This is true of literally mechanical tools, and doesn’t seem to surprise us, as when even a very good pilot doesn’t know what makes the jet engine he uses all the time (with our lives in his hands!) work. But it’s also, admittedly a bit counterintuitively, true of cognitive tools. A surgeon, for example, could be quite innocent of all the epistemological theories about why science is true, or what the truth status of scientific propositions might be, and still be a much better choice to take out your gallbladder than my not very humble self, despite my status as a Board Certified and Fellowship-Trained Academic Internist (MPH and all!) who can lecture for hours on end (to proven deadly effect on my usually most unwilling audiences!) about the evidentiary basis of medicine.

For all the talk about evidence-based medicine being a good thing, and who can argue about using evidence when it’s available, the evidence they’re referring to is that gleaned from quantitative methods. While that’s wonderful stuff, and not to be ignored, what you can do, in medicine, with numbers, is pretty superficial. The real basis of medicine is the catalogue of diseases we have laid out over the millennia. That catalogue, which is absolutely prior to all quantitative methods, since you have to be able to differentiate apples from oranges before you can count either, is based entirely on pattern-recognition. It has often been quite spectacularly wrong, and the proof, the sole proof, that we have it right in identifying a particular disease entity, is that we can then act as if there really is such a disease, and it seems to work better, kill somewhat fewer people, than if we didn’t recognize such-and-such as a disease. Methodologic feet of clay! But I still would not advise refusing the antibiotic you’re offered if you get bacterial meningitis just because bacterial meningitis is a construct of dubious footing.

Plato, in those bits of the dialogues that I think I sort of get (and certainly not the parts where he talks about thinking being like flight patterns of birds — eeck!), explains this on the basis of our metaxic existence. We live in the middle somewhere between reason and natural order laid down by God/the gods/whatever (and we sure are in no position to be more specific than that), and the chaotic impulses below us. We can, partly, intuitively, who knows for sure how, recognize the difference, sometimes, between impulses from above and those from below, but not with any sort of reproducibility. We’re never sure if we’re the victims of false order from below, or the good, reasonable order from above. We ain’t God — tough luck; but some of the threads that control us like puppets are Golden. Or something like that. You buy this stuff or you don’t see its point — there isn’t much room in the middle.

51

Salient 12.09.09 at 5:00 am

I’ve only skimmed your comment and I’ll have to read it later, but I don’t agree with what you attribute to me.

Whatever my error was, I’m going to go ahead and blame it on the carrier pigeons, who have all come down with bird flu.

52

fred p 12.09.09 at 9:12 am

(@ Glen Tomkins, could you maybe recommend some light reading on the basis of medicine, evidentiary or not.)

53

Anonymous Coward 12.09.09 at 5:38 pm

I don’t know how many links I can add in before it will be seen as Spam, but:

Google’s personalized search no longer requires the use of cookies.

http://searchengineland.com/googles-personalized-results-the-new-normal-31290

Personalized search has for some time meant results in your city and different from the results in my city.

Personalized search has for some time meant that the results of Query(N) depend on the content of Query(N-1).

http://searchengineland.com/previous-query-refinement-coming-to-hit-google-results-13743

“who thinks they know everything about how a search engine works, yet ought to stop worrying about how a search engine works, because they’ll never understand it?”

Bianca, I never said what you said I said the first time, or the second time. You my dear in your rantings above are a perfect example of someone who claims to know how Google works because you know how you would create a search engine. It’s clear you don’t know how Google works. I never said you should not strive to understand it.

My point is even if you think you know how Google works today, it’s not how Google works next month. And just because you think you know what a search engine does, does not mean you know how Google works. If you want to live in the 21st century, instead of ignoring it, you may want to look further into it and not gloss over it.

(I have nothing to do with searchengineland, I ended up there after reading an article about this in the Register whose regular Google critic, Orlowski, has declared that Google has given up on search.)

54

Substance McGravitas 12.09.09 at 6:04 pm

Google’s personalized search no longer requires the use of cookies.

Thanks for the correction. I’m still not sure what the worry is.

55

McMurphy 12.09.09 at 6:30 pm

Google mysteries are part of the corporate appeal. Pigeon ranking or whatever sooper fast technique Sergey Brin and Larry Page cooked up outperformed the competition at least by a few nanoseconds (anyone remember webcrawler?) and young obscure steinford millionaires blossomed into globalist billionaires in a year or so. Google– building a better cyber-accounting system, one day at a time….

56

bianca steele 12.09.09 at 8:01 pm

@53: Let me guess: some tech support person wouldn’t give you the information you thought you needed, or told you your guesses about his company’s product were wrong, and you’re sharing the love.

There’s a reason I don’t comment about topics I really care about and feel I have some responsibility to represent concerning.

57

Tim Wilkinson 12.09.09 at 10:38 pm

Anonymous Coward @53 etc. – FWIW, I at least haven’t sniffed out or projected any weirdness, craziness, absurdity, worry, un- or over- seriousness, or animus against techies in what you’ve written. It appears quite straightforward and on-topic. (Or at least on probably the only topic this thread could sensibly gravitate to, i.e. the behaviour of Google search, or as you can perfectly well term it, ‘how Google works’.)

58

politicalfootball 12.10.09 at 1:48 pm

Google’s personalized search no longer requires the use of cookies.

I found out recently that this is another one of those April Fool’s jokes. Turns out the personalized search always worked even if you didn’t bribe the IT guy with a cookie.

59

Rawls 12.10.09 at 9:40 pm

The pigeons are real. Morons.

60

Glen Tomkins 12.11.09 at 2:37 pm

fred p,

I’ve never run into a book that addresses the evidentiary basis of Medicine, so I can’t say which ones might be any good, or which might be bad, or even if there are any accounts of the subject that even try to do it justice. What I have run into in practicing Medicine is the fact that the evidentiary basis is quite heterogeneous from topic to topic within Medicine. (We haven’t been Pasteurized, either. Medicine is sort of the raw milk among the sciences.) To put it in Kuhn-speak, some of our paradigms are almost like those that Physics has developed, in that you can move around within them using equations (e.g., acid-base disturbances), some are hopelessly woolly and full of just plain wrong (e.g., the Kveim test, and sarcoidosis in general), but most are just way less mature than what you see in a science like Physics.

Again, in Kuhn-speak, instead of working on only one or two paradigms at a time, like Physics, Medicine has no choice but to try to keep literally thousands of paradigms spinning at the same time. Physics has the luxury of only having to pay attention to problems that seem amenable to its favored methods, amenable to summary in equations; while Medicine has to focus on every problem that’s out there killing people, or even just worrying them — whether that problem happens to be amenable to equations, or amenable to nothing but prayer and fasting, or not even real, and everything that fits in the cracks between such broad categories.
