“Ed Felten”:http://www.freedom-to-tinker.com/archives/000509.html has a nice post on Google from a few days ago, suggesting that laments for the halcyon days before people tried to manipulate Google are misconceived. His rejoinder: Google results don’t represent some Platonic ideal of the truth – they’re the product of collective choice.
bq. Google is a voting scheme. Google is not a mysterious Oracle of Truth but a numerical scheme for aggregating the preferences expressed by web authors.
This means, as Felten suggests, that Google isn’t perfect, and can’t be. Indeed, the point is underlined by “Arrow’s possibility theorem”:http://www.bbc.co.uk/h2g2/guide/A520372 which says, more or less, that any form of aggregate decision making is going to be flawed under certain reasonable assumptions. Felten’s insight is an important one – it opens the door for the application of a plethora of interesting results from the theory of collective choice to Google and other aggregators/search engines. There are some eminently publishable academic papers in there for anyone who’s familiar both with this literature, and with public choice theory.
There’s a more general point too. Much of the early rhetoric about the Internet suggested that it somehow managed to escape from politics. Some people (Declan McCullagh for example) are still trying to peddle this line. It’s ridiculous. The Internet and other communications technologies involve real collective choices, with real political consequences, and the sooner we all realize this, the better.
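Felten’s description of Google as a voting scheme can be made concrete with a toy example. What follows is a minimal sketch, assuming the textbook PageRank recurrence with a damping factor of 0.85; it is not a claim about what Google actually runs. The four-page web and the names @pagerank@ and @toy_web@ are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Each link acts as a vote, weighted by the rank of the voter.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy web, page c collects the most votes, and page a outranks b and d only because its single incoming link comes from the highly ranked c. That weighting of votes by the voter's own standing is what makes the scheme an aggregation rule rather than a simple link count.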
{ 13 comments }
Chris Lightfoot 02.11.04 at 8:12 pm
Note that the Arrow theorem only applies to deterministic voting systems. Probabilistic systems, like those described in “Probabilistic electoral methods, representative probability, and maximum entropy” (PDF) by Sewell et al., can escape from that problem. As it stands, “Page rank” is deterministic, but that’s not an inevitable feature of any search engine which uses query-independent evidence.
Arthegall 02.11.04 at 11:08 pm
I think Henry should be more clear what the “aggregate” means when he writes of “aggregate decision making” and its impossibility.
In other words, Arrow’s theorem causes trouble for PageRank not because the algorithm is required to aggregate the preferences of many web authors (for the purpose of Arrow, ranking pages by their total number of links from all web authors counts as a single preference relation), but because each web author expresses his preference for a web page in many ways. Or, alternately, PageRank takes other page values into account, besides its total in-degree and out-degree (link-wise).
So if PageRank has one preference on web-pages from the linking habits of authors, and another preference that attempts to counter-act the effects of Google-bombing, Arrow’s theorem means it will never be able to integrate the two “perfectly.”
But if PageRank were *just* counting web-links, then the presence of one or many web-authors wouldn’t be a problem.
humeidayer 02.11.04 at 11:12 pm
Henry, there are a few facts, though, that are rather unsettling to me.
Consider how much spam is proliferating on the net and practically destroying it. Even though virtually no one (a fraction of a fraction of a percentage) responds to spam, it’s still profitable because the distribution costs for spammers are virtually nil, which results in shotgun blasts of tens and hundreds of millions of emails offering a thirty dollar product.
Next, consider blogs and the economic value of a very high page rank in Google. People are already creating robots to place ads and links in blogs such as this one. Like spamming, this, too, can be automated, and it seems the incentives for unscrupulous individuals to do so are enormous.
It may be democratic, perhaps, but imagine a democracy in which individuals can force others to vote for them, hundreds or even thousands of times per second.
Then there will be countermeasures and counter-countermeasures and so on…
Tim 02.11.04 at 11:23 pm
http://www.google-watch.org/
It’s a well written, greatly detailed account started years ago on just how un-populist Google is. Is everything this guy says crap? I’ve never heard anyone talk about it, but it’s amazing stuff.
Bob McGrew 02.12.04 at 2:39 am
As far as I know, straightforward randomization doesn’t let you get full non-manipulability except in a single case (the random dictator system), which isn’t very good.
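For readers who have not met the term, the random dictator rule draws one ballot at random and declares its top choice the winner. A minimal sketch, assuming ballots are full rankings listed best first; the ballots and the function names @random_dictator@ and @win_probabilities@ are made up for illustration.

```python
import random
from collections import Counter


def random_dictator(ballots, rng):
    """Draw one ballot uniformly at random; its first choice wins."""
    return rng.choice(ballots)[0]


def win_probabilities(ballots):
    """Chance each candidate wins: the share of ballots that rank it first."""
    tops = Counter(ballot[0] for ballot in ballots)
    return {candidate: count / len(ballots) for candidate, count in tops.items()}


# Hypothetical ballots, each a ranking from best to worst.
ballots = [
    ("x", "y", "z"),
    ("x", "z", "y"),
    ("y", "z", "x"),
    ("z", "y", "x"),
]
probs = win_probabilities(ballots)
winner = random_dictator(ballots, random.Random(0))

# The third voter misreports "z" as their favourite instead of "y".
misreported = ballots[:2] + [("z", "y", "x"), ballots[3]]
probs_after_lie = win_probabilities(misreported)
```

Because a ballot matters only when it is the one drawn, and then only through its first choice, misreporting can do nothing except move probability away from the voter's true favourite. In the example, the lie removes y's chance of winning altogether.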
Speaking of academic papers, I’m actually working on a paper much like this right now – where the goal is simply to prevent one person from increasing his own PageRank without sacrificing much in terms of computation or difference from the original metric.
James Surowiecki 02.12.04 at 8:02 am
Henry, you really should update your post to take into account arthegall’s point. Arrow’s impossibility theorem does not apply to “any form of aggregate decision making.” It applies only to situations in which the people in a group have different preferences, not different opinions. As long as everyone in the “group” agrees that there is a single right answer to the question they’re trying to answer or problem they’re trying to solve, even if there are serious disagreements about what that answer or solution is, then the aggregate decision will not be flawed (in the sense that you use the word). To take only the most obvious example, Arrow’s theorem does not apply to betting markets, since all bettors are trying to find out the same thing: what are the odds that a given team or horse will win.
I’m also unconvinced that, for the most part, the other-preference problem is doing too much damage to PageRank. I’m still astonished at how accurate Google’s results are, which suggests to me that PageRank is doing a good job of aggregating people’s opinions about which page is most likely to have the right information.
As for Felten’s point about Google being simply the product of collective choice, it is of course just that. But we know that the collective choices of large groups of random people tend to be uncannily accurate. It won’t give you Platonic truth, but if you want to get as close to the right answer as humans can come, taking the collective judgment of a large group of people is the best strategy. Google’s execution may be sometimes imperfect, but its theoretical foundations are impeccable.
neil 02.12.04 at 8:40 am
Time for some Marxist analysis. Google’s version of truth could be considered better than the common-sense version of the truth is. After all, we know that the latter is just what the ruling class considers the truth to be. They control the medium so they control the message. Google takes input from pretty much anyone, though; it’s shifting the power to define truth away from the ruling class and towards anyone with a computer and a blog.
On the other hand, this could work the other way; Google is quite plainly not influenced directly by a central authority, but the ideas of its individual contributors are. Especially considering that it gives more credence to more popular viewpoints, and the more popular ideas will invariably be the ones which are easier to swallow, i.e. trending close to the ruling class’s ideas. Thus it gives the impression of being an independent source of truth, but in actuality, it’s just a more subtle form of distortion.
Chris Lightfoot 02.12.04 at 9:17 am
Bob– read the paper. They go beyond “random dictator” and develop a maximum-entropy based scheme which works in n-candidate elections.
loren 02.12.04 at 3:53 pm
James: “Arrow’s impossibility theorem does not apply to “any form of aggregate decision making.” It applies only to situations in which the people in a group have different preferences, not different opinions.”
Minor quibble, or perhaps merely a clarification: Arrow’s results _apply_ to any algorithm, rule, or process for mapping several individual rankings of alternative social states onto a collective ranking of those states. The question is whether or not that application raises troubling results, given your aims or intuitions.
For instance, even if everyone has the same underlying preference (finding the correct answer to some shared problem), we can still interpret their opinions about various proposed solutions to that problem as preference orderings. In such a case, their preferences will almost certainly be single-peaked, and simple majority rule will yield a transitive collective ranking. Many people will not be troubled by an application of Arrow’s framework in such settings, because it doesn’t yield results that are as troubling as, say, majority rule in settings of conflicting preferences (to be sure, we might still be troubled by simple majority rule, for reasons that have nothing directly to do with Arrow, but put that aside for now).
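The contrast between single-peaked and conflicting preferences can be checked by hand on three voters. A minimal sketch, assuming strict rankings and an odd number of voters so that no pairwise contest ties; both profiles and the helper names are invented for illustration.

```python
def majority_prefers(ballots, a, b):
    """True if a strict majority of ballots rank a above b."""
    wins = sum(1 for ranking in ballots if ranking.index(a) < ranking.index(b))
    return wins * 2 > len(ballots)


def majority_relation(ballots):
    """All ordered pairs (a, b) where a beats b by pairwise majority."""
    candidates = sorted(ballots[0])
    return {
        (a, b)
        for a in candidates
        for b in candidates
        if a != b and majority_prefers(ballots, a, b)
    }


def is_transitive(relation):
    """Check that a beats b and b beats c together imply a beats c."""
    return all(
        (a, c) in relation
        for (a, b) in relation
        for (b2, c) in relation
        if b == b2 and a != c
    )


# Single-peaked on the axis left < centre < right: each voter has one peak
# and likes options less the further they sit from it.
single_peaked = [
    ("left", "centre", "right"),
    ("centre", "right", "left"),
    ("right", "centre", "left"),
]

# The classic cyclic profile: no common axis fits all three rankings.
conflicting = [
    ("a", "b", "c"),
    ("b", "c", "a"),
    ("c", "a", "b"),
]

single_peaked_ok = is_transitive(majority_relation(single_peaked))
conflicting_ok = is_transitive(majority_relation(conflicting))
```

In the single-peaked profile, centre beats both alternatives and the majority ranking is an ordinary ordering. In the conflicting profile, a beats b, b beats c, and c beats a, so majority rule yields a cycle rather than a ranking.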
Nor is there anything especially uncanny about the tendency of simple aggregation procedures to “find the truth” in such cases, for instance, when there are two plausible options and several implausible choices … binomial distribution, law of large numbers … “Regis, I’d like to poll the audience.”
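The binomial arithmetic behind “poll the audience” is short enough to write down. A minimal sketch under simplifying assumptions that are not in the comment itself: only two options, an odd number of voters, and every voter independently right with the same probability @p@. The function name @majority_correct@ is made up.

```python
from math import comb


def majority_correct(n, p):
    """Probability that more than half of n independent voters are right.

    Assumes two options, n odd, and each voter right with probability p.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Voters who are individually right 60% of the time, in growing crowds.
single_voter = majority_correct(1, 0.6)
small_crowd = majority_correct(11, 0.6)
large_crowd = majority_correct(101, 0.6)
```

With @p@ above one half, the majority is right more often as the crowd grows; with @p@ at exactly one half, the crowd does no better than a coin flip. The independence assumption is the one the Asch worry targets: voters who copy each other are not independent draws.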
I’m also not sure that I’d generally interpret Google rankings as reliable estimators of people’s judgements about which pages are most likely to have the most accurate information. That might be the case for some pages, but not for others.
There’s also the concern that Neil raises: we might simply be discovering not opinions about the truth of the matter, but opinions about what the prevailing opinion is … democracy a la Solomon Asch (“no, I’m sure the group is right: that small line really is the same length as the others — I’d like to change my vote, please”).
James Surowiecki 02.12.04 at 6:51 pm
Loren, I always thought it was called an “impossibility theorem” because it showed how it was impossible to arrive at a collective decision-making rule that satisfies a set of very reasonable assumptions. But if preferences are single-peaked, then in fact it’s not impossible. We may still be troubled by the result, but if we want to know what the “collective choice” of the group is, in the case of single-peaked preferences myriad forms of aggregate decision-making are not, as Henry asserts, inherently flawed.
I’d agree that Google rankings represent people’s judgments about which pages are most likely to contain the best information better in some cases than in others. But I do think the proof of the pudding is in the eating, and I remain impressed by how effective Google is.
Having said that, I think the problem of people following what they perceive as the collective wisdom rather than their intuition is a real concern (whether the Asch experiment represents people “learning” from each other or simply succumbing to the desire not to rock the boat is a different question). The irony is that the more people pay attention to what the “group” thinks, the dumber the group’s judgment becomes.
Jason 02.12.04 at 9:08 pm
The “irrelevance of ignored alternatives” assumption of the impossibility theorem is the one that I think doesn’t really apply here.
It’s a reasonable thing to want, but given that you can’t have it, it’s not really much of a sacrifice to give it up.
I was just at a Ken Arrow talk last week, and he was talking about his collective choice functions, and there was a little debate about these very topics afterwards. Funny timing.
Jason 02.12.04 at 9:21 pm
Chris, Maximum Entropy would violate the irrelevance hypothesis of the theorem (at least any formulation of “entropy” I can think of).
BTW Bob, I’ve just fired you an email.
Chris Lightfoot 02.12.04 at 9:53 pm
You mean the “independence of irrelevant alternatives” condition? As I understand it, the scheme they propose does satisfy that (though it relaxes the clarity of voting condition to apply to pair-preferences only). See pp. 17-28 of the report.
(I should say that I’m not sufficiently familiar with the work to discuss it in great detail; but it seems an interesting idea and relevant to the discussion.)