Some thoughts, related to Michael’s ‘going pro’ post and Kieran’s recent post on impact factor. To what extent is the whole internet afflicted with the Matthew Effect? “For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away.” If you want to be a bit more specific, to what degree are search results afflicted by it?
Let me illustrate with a couple cases I’ve personally noted, which I suspect are representative. I just wrote a book about Plato [update: now optimized!], so naturally I’m curious what comes up if you Google Plato. Predictably: Wikipedia. The Stanford Encyclopedia of Philosophy. (I’m going to ignore erroneous results, due to ambiguity: computer systems named Plato, famous drivers named Plato, former child star actresses who committed suicide named Plato.) You get somewhat arbitrary Google book results. Why, in particular, is an edition of the Theaetetus, edited by Robin Waterfield #6? You also get a number of pages that, not to put too fine a point on it, look to have been designed along 1996-1999 lines. Because that’s surely when they were originally posted. This page, for example, is #2, right after Wikipedia, beating out even the SEP. Now, that’s nuts. There’s nothing wrong with the page, as far as it goes. But it’s clearly a beneficiary of the Matthew Effect. Google users are brought to this page – in droves, I’ll wager – because it was posted by an early-adopter of the interwebs thingummy. A similar example is this page, coming in at #6. This one is a much more serious project, by someone who is clearly competent to write about Plato, and who moreover has worked pretty hard to maintain and build-up this site. (Not that I’m implying the author of that other page was not competent. Just that the content hardly explains the #2 ranking.) That second site posts public stats, which are interesting: “870 000 visits in 2008 (an average of 2 374 visits per day).” I wouldn’t be surprised if the author of this site is, in a way, the world’s most influential Plato scholar, due to the fact that he had the good luck to start posting in 1996. Out of the top 10 hits for Plato (ignoring erroneous hits) we get, by my count: 2 that clearly deserve to be in the top 10 – Wikipedia and the SEP; 3 Google Books titles that are perfectly respectable but pretty random – i.e. none of the three is one of the first titles you would mention to someone asking ‘where should I start, to find out about Plato?’; 3 personally-maintained sites that are clearly here because they are late-1990’s Matthew Effect beneficiaries; a pretty good animated video of the Cave Parable on YouTube; and a link to the Plato page of the MIT Classics Archive, which – despite the academic imprimatur – is a late 1990’s affair. Another Matthew case. (The last time I visited, a lot of the links were broken. But maybe someone has fixed that.) The content is Jowett translations; that is, old stuff.
What are we missing? The Perseus Project, for one. I was surprised to see no Amazon links cracking the top 10. (Not that I think that’s so important, but I’m surprised.) How did we do? So-so. Partly the problem is that you should enter more intelligent search parameters. But part of the problem is runaway Matthew Effect. I suspect that the three random book hits could be explained by the Matthew Effect, in some way. Someone must have linked to these books. And these titles, rather than some others, lucked into a high slot. It’s interesting that Google doesn’t do better. (Not that I have any bright ideas.)
Second case: last year I posted this X-Mas card set on Flickr. (I’m making more this year!) Anyway, long story short, one of the images got Stumbled, as a result of which, eventually, two rather similar images diverged dramatically in their traffic. This one has been viewed 2,000 times. This one has been seen 12,000 times and has thereby accounted for 10% of the traffic my Flickr account has ever received. (I actually like the first image better.) I was curious whether it would just go and go like that forever, but recently it’s stopped. My Stumblejuice ran dry. (The part of me that values justice is glad to see this. The part of me that likes getting free stuff for no good reason is a bit dismayed.) Anyway, I don’t really understand how ranking sites like Stumble and Delicious and Digg and so forth work because I don’t use them myself. But it strikes me that all this stuff clutters things up worse, Matthew-wise. [UPDATE: clarification. I don’t mean the one pic got a huge spike that then disappeared. I’ve gotten those, too. Rather, you get a steady, slightly higher rate of traffic – in my case, 25-50 hits a day for months and months and months. But all that adds up.]
What could search engines do to combat the Matthew Effect better, algorithmically? Obviously if anyone knew, then Google would know, and presumably Google would then do it. (Or would they?)
{ 33 comments }
ogmb 09.19.09 at 8:55 am
My Flickr stats (spike by Ffffound.com). I get that semi-frequently: one of my pictures gets picked up by a website with an enormous reach (compared to my own). The immediate effect subsides pretty quickly (very few outside viewers stay to look at the rest of my pictures), but it generally leaves a smattering of new contacts who stick around, so there is a bit of a build-up over time. Of course once your picture gets many outside viewers, it will also move up in the internal Flickr search rankings and get more views that way.
I contemplated this topic in a now abandoned research project on the emergence of experts. In essence, what does it tell you that someone is an expert to a small group of people, but most of those are in turn experts to others? Or expressed as the Velvet Underground phenomenon: “The Velvet Underground never sold many records, but it seems like every one of the group’s fans went out and started a band.” Google could potentially trace this hierarchy of experts to the source, but I don’t think they have much of an incentive to. That’s why if you look for some song lyrics you get five pages of lyrics porn sites before you get the original artist or fan site where all the porn sites swiped the lyrics from.
John Holbo 09.19.09 at 9:55 am
Sorry, I should have been clearer about the Stumble effect, in my case. It wasn’t a big spike. I’ve gotten those, too. Rather, it was a steady 25-50 visitors a day for one pic, via Stumble, more than the other pic was getting. For months and months and months.
Chris Bertram 09.19.09 at 10:18 am
_What could search engines do to combat the Matthew Effect better, algorithmically? Obviously if anyone knew, then Google would know, and presumably Google would then do it. (Or would they?)_
Dunno, but I wonder if Flickr Explore has part of the answer to that question. My top listed photos (presumably a combination of page views, favorites, comments, notes and links weighted by friends, contacts, strangers etc) don’t at all correspond to my most viewed and old stuff seems to fade in rank.
mollymooly 09.19.09 at 10:19 am
Are the book rankings consistent between Google-web, -scholar, and -books; and should they be?
yabonn 09.19.09 at 10:19 am
Tried the same search with another engine (Bing). It seemed to me there were fewer Matthews in the results. I can’t seem to get completely rid of the regional settings with that thing, though, so I can’t be sure it’s not noise.
Chris Dornan 09.19.09 at 10:50 am
I would like Google to continue to do what it does and rank on links, though they could weight more recent links, making it easier for recent content to break in.
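As a rough illustration of the "weight more recent links" idea, here is a minimal sketch. It assumes a simple exponential decay with a hypothetical `half_life_days` parameter, and the two pages and their link dates are invented for the example. Nothing here reflects how Google actually scores links.

```python
from datetime import date


def link_score(link_dates, today, half_life_days=None):
    """Sum the weights of a page's inbound links.

    With half_life_days=None every link counts 1.0 (plain link counting).
    Otherwise a link's weight halves every half_life_days (an assumed decay rule).
    """
    score = 0.0
    for linked_on in link_dates:
        if half_life_days is None:
            score += 1.0
        else:
            age_days = (today - linked_on).days
            score += 0.5 ** (age_days / half_life_days)
    return score


today = date(2009, 9, 19)

# Hypothetical data: an early page with many old links, a new page with a few fresh ones.
old_page = [date(1997, 1, 1)] * 40 + [date(2003, 1, 1)] * 10
new_page = [date(2009, 1, 1)] * 8

plain = (link_score(old_page, today), link_score(new_page, today))
decayed = (link_score(old_page, today, 730), link_score(new_page, today, 730))

print("plain counts:  old=%.2f new=%.2f" % plain)
print("with decay:    old=%.2f new=%.2f" % decayed)
```

With plain counting the old page wins easily. With a two-year half-life the eight recent links outweigh the fifty old ones. The half-life is the whole design question: too short and established, still-useful pages vanish; too long and nothing changes.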
But how to get folks to link to the right stuff? Those commanding the Google juice should be linking more, and better directory services to complement Google. SEP entries really should take their external links sections much more seriously, or better connect them to a complementary philosophical link-directory.
How about adding a links section to Reason and Persuasion? Seriously! It would be nice to know where the Plato action is to be found.
Dave Weeden 09.19.09 at 11:04 am
Well some of what you’re seeing is down to (perhaps unintentional) search engine optimisation. The second page you linked to – the one at #6 – has ‘plato’ twice in the url, ‘plato’ is the first word in the title, the title appears again near the top of the page, and there’s a little picture with the filename ‘plato.gif’. Assuming that search engines don’t know anything about Greek philosophy but have a sort of common sense approach – it’s a page about Plato! Hence its high placing. Google killed off some of the keyword spamming practices that were rampant in the late 1990s, but keywords still make a difference. Being linked to isn’t enough: too many links have text like ‘here’. You need incoming links that say ‘plato’ so Google knows that’s why others are linking to that page.
A bit of messing with word order can still make a difference.
belle le triste 09.19.09 at 11:12 am
Search engines know EVERYTHING about Plato, it’s Heraclitus and Aristotle they’re weak on. (Defend this claim. Do not write on both sides of the paper.)
Eszter Hargittai 09.19.09 at 11:18 am
Since Google and every other search engine is proprietary about their search algorithm, it’s very hard to know what matters. I’ve often wondered how much they pay attention to which search results people click on and how quickly – if at all – they return from them (presumably a signal that the page was or was not relevant, at least to some extent). Depending on how much they pay attention to such click rate (and to what extent, if at all, they track returns to the results list), the Matthew effect could very much be in play not just at the level of linking, but at the level of clicking as well.
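To make the click-level Matthew effect concrete, here is a toy model. Everything in it is an assumption for illustration: the page names, the quality scores, the position-bias numbers, and the rule that pages are ranked purely by accumulated clicks. It uses expected clicks rather than random draws, so it is deterministic.

```python
def simulate(quality, initial_clicks, position_bias, rounds=200):
    """Rank pages by accumulated clicks, then add expected clicks each round.

    Expected clicks for a page = (attention its rank position gets) * (its quality).
    Returns the final ranking, best first.
    """
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        ranking = sorted(clicks, key=clicks.get, reverse=True)
        for position, page in enumerate(ranking):
            clicks[page] += position_bias[position] * quality[page]
    return sorted(clicks, key=clicks.get, reverse=True)


# Hypothetical pages: the newest page is the best, but starts with no clicks.
quality = {"old_page": 0.5, "mid_page": 0.6, "new_page": 0.9}
head_start = {"old_page": 10.0, "mid_page": 5.0, "new_page": 0.0}

# Most attention goes to the top slot (assumed numbers).
biased = simulate(quality, head_start, position_bias=(0.6, 0.25, 0.15))

# Control: every slot gets equal attention, so quality alone drives clicks.
unbiased = simulate(quality, head_start, position_bias=(1 / 3, 1 / 3, 1 / 3))

print("with position bias:   ", biased)
print("without position bias:", unbiased)
```

In this sketch the early page keeps the top slot indefinitely once position bias is in play, even though it is the weakest of the three. With equal attention per slot the best page overtakes the others. Real engines presumably use many more signals, so this only shows that the feedback loop is possible, not that it happens.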
There are famous cases of what many would consider inappropriate results showing up on top of the results lists. There have been some academic pieces written on related issues that cover some examples. (I’m somewhat purposefully not linking to these here directly. Some of this relates to some of my research and I’d prefer not to influence relevant rankings.) See, for example, this piece in an issue of the Journal of Computer-Mediated Communication I edited a couple of years ago.
Regarding your site in particular, I think there are a few things you could do at your end to help its popularity. For one thing, I would include “Plato” in the title tag of the page. It’s not there right now and I could see how that might matter. It’s also not up on top of the page at all, which may also matter (this is because you are using a graphic as your header). Finally, don’t be shy about posting some targeted links on sites like CT for “Plato”. You can do this very organically as you discuss your book, I’m not suggesting anything drastic. In this post above, for example, not once do you link “Plato” to your own site, which is precisely what you should be doing. Sites like CT have strong Google juice and tend to come up high on search engine results. You might’ve even considered titling your post as “Plato, Matthew Effect & Search Results” since that way the post would have a higher likelihood of coming up on search for plato and then once people are here, they may link to your book page (again, especially if you had a “Plato” link to your book) plus presumably the link juice would accumulate. But at the least, I would include a link to your book site with the term “Plato” in this post (and any future ones you may decide to write on the topic:).
Eszter Hargittai 09.19.09 at 11:24 am
I was writing my comment as Dave Weeden posted his. We came up with similar conclusions. He did it by looking at some of the features of the sites with good rankings, I did it by looking at your site and what I consider missing strategies on it. I like the point about image names. I have no idea if they matter, but it seems reasonable to think that they might. Accordingly, you could rename your “ornamentwidemargins4.png” to “plato4.png” or something and the same with the other images on the site.
John Holbo 09.19.09 at 12:01 pm
“Accordingly, you could rename your “ornamentwidemargins4.png” to “plato4.png” or something and the same with the other images on the site.”
Ha! That one hadn’t occurred to me at all. And I didn’t know about how ‘here’ links are more opaque for the likes of Google, although that makes perfect sense. I haven’t really made a concerted effort to SEO my Plato stuff, although I’ve made fairly free with CT as a platform for talking it up. With my next post, I’ll do better with SEO!
Salient 09.19.09 at 2:35 pm
It’s interesting that Google doesn’t do better. (Not that I have any bright ideas.)
What Chris said, but — google already has implemented an optional “advanced search” feature that lets you restrict your search based on Date: (how recent the page is). This lets you dodge most of the Matthew effect pages, at the expense of also losing everything else.
Adding in a 5-year time span would be moderately helpful, and adding in a feature that let you search for pages updated within the past year instead of indexed for the very first time within the past year would be immensely helpful.
Salient 09.19.09 at 2:44 pm
And I didn’t know about how ‘here’ links are more opaque for the likes of Google, although that makes perfect sense.
Searching for “here” on google was rather fun. One gets:
* Download sites for file viewers and codecs (Adobe, RealPlayer, etc)
* Links to official “contact your congressperson” pages
* Only a few map sites
Seeing what shows up for a google search for here might actually be a good way to determine, to a first approximation, what websites are most popular (in the sense of most frequently linked to, not most visited).
Glen Tomkins 09.19.09 at 3:55 pm
Maybe it’s a Battle of the Books effect
Perhaps the search algorithms for topics that the algorithm designers believe don’t change over time are set to respect older material, while they have algorithms that give more prominence to new material for topics that are thought to be subject to progressively better or more correct information.
For example, if you’re searching for information about medical treatments, you want the very latest studies presented to you first. But if you want to know about Plato, well, you see how some algorithm designers might assume that people who want to know about Plato would at least set a lesser value on getting the latest information, and actually might be looking for the dustiest tomes available. In the sort of battle between ancient and modern learning that Swift envisioned, after all, our side is imagined as being literally combatant partisans of the very oldest stuff out there. According to this theory, you should look upon those high-ranked sites that you found on Plato as the “classics” in the field. What’s your beef, you don’t respect the classics?!
Not that it’s very likely that the folks at Google read their Swift, but it is possible that they bother to correct for the Matthew effect only for scientific or technical topics where they think that the last word is the best word, and just don’t bother for everything else. And it may be that they rely on humans applying a sanity test to clean up their most searched topics after the sort of thing you found that the algorithm by itself did for Plato, but they don’t bother with human review for low hit searches like Plato.
Henri Vieuxtemps 09.19.09 at 4:04 pm
What about those “Promote” and “Remove” buttons? At least (presumably) you can teach your google what results you do and don’t want to see.
ben 09.19.09 at 4:27 pm
The perseus project is practically unusable 50% or more of the time, so that might explain some part of its not being very high. Though it is the second main link if you search for a latin dictionary—no quotation marks, even!
Ted Lemon 09.19.09 at 5:56 pm
Why when you link to your Plato book is the text in the link “a book” instead of “a book about Plato?” How do you expect search engines to index it properly if the descriptive text for the link doesn’t say what the link is a link to?
Kenny Easwaran 09.19.09 at 8:43 pm
That’s interesting – I got a very different list of results from Salient on a search for “here”. Second was an LGBT television channel called HereTV. There were one or two other organizations whose name was basically “here”. Google Maps came up above Adobe and Media Player installation links. No links for contacting Congresspeople came up.
christian h. 09.19.09 at 9:51 pm
Not to offend the resident philosophers’ sensibilities, but any anomalies – like the “Matthew effect” – will loom larger if you search for a topic that isn’t very popular. If the total number of sites dedicated to it, and of links to such sites, is small then everything is obviously amplified.
yoyo 09.19.09 at 10:09 pm
if even a small fraction of google users downloaded a gear/widget/thingie that monitored for quick clicking of ‘back’ once you got onto a page, if you bookmarked a page, etc, and reported this, it could be good feedback on which sites were actually good. you’d need penalties for sites upranked by spammers though.
overall this problem seems about the same as trying to stamp out any silly meme that spreads through society. if you can’t get rid of people’s respect/talking about/sharing of astrology or whatever, how is a search engine going to distinguish good and bad writings on Plato?
Also, having wiki on top of results seems bad. or, there should be GooglePlus, which avoids things like links to wiki and mayo and whatever; if I search google for Plato, it’s because i want something different to what wiki says or even what “site:crookedtimber.com Plato” says. Maybe more accounting for quality, as opposed to volume/notoriety measures.
John Quiggin 09.20.09 at 12:25 am
SEO is easy, as you can see by Googling “romantically linked with Angelina Jolie”.
Witt 09.20.09 at 12:29 am
Kenny and whoever mentioned regional issues are on to something important — there isn’t a Google results page for a given topic; there are millions, perhaps tens of millions. I don’t even stay logged into my Gmail when I’m searching Google, and even I get remarkably different results when I search the same phrase from home versus work.
This phenomenon already leads to some semi-amusing, semi-frustrating disputes when, say, a group of people is trying to decide on a restaurant. “It’s the first hit that comes up on Google” is often not true for all users, and if the people involved don’t know what’s going on they may start to feel like they’re being gaslighted.
It also comes up in library work, particularly when a parent and child have done some preliminary online research at home and then come into the library to do more. Particularly for people who feel “tricked” if they aren’t getting the same page of results, it can be frustrating and confusing. (They want to know why it isn’t like old-time phone listings — that is, why aren’t they seeing the exact same listings in the library’s Google results as they would from their home computer, just as in the old days, the library’s phone books would have been identical to the local phone book they had at home.)
jholbo 09.20.09 at 2:26 am
“Not to offend the resident philosophers’ sensibilities, but any anomalies – like the “Matthew effect” – will loom larger if you search for a topic that isn’t very popular.”
But that’s why I chose Plato. There are actually a lot of sites and pages dedicated to this famous philosopher. I could have picked ‘intensional semantics’ if I wanted an unpopular topic.
andrew 09.20.09 at 4:28 am
Where I am, searching “plato holbo waring” returns Neil the Ethical Werewolf recommending the book just above the Amazon page. Searching just “plato holbo” returns this top result.
John Holbo 09.20.09 at 5:57 am
Hey thanks andrew, I didn’t actually even remember that that old version was still up. It’s years old. Gotta update.
John Holbo 09.20.09 at 5:59 am
It’s a good example of word order making a difference. ‘Plato holbo’ produces the irrelevant old link. ‘Holbo Plato’ does not.
Cryptic Ned 09.20.09 at 6:45 am
While “Halbo ploto” produces Susana Juarez’s Blog, “Plobo halto” gives us a poorly digitized Latin primer from 1884.
Lichen 09.20.09 at 7:24 am
For the “Plato” lover in you.
http://www.craigslist.org/about/best/chi/748263604.html
Patrick 09.21.09 at 2:55 am
Maybe a stochastic element to search results? Randomly putting lower ranked results into the top, so that they’ll get some views.
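A minimal sketch of what such a stochastic element might look like, assuming a fixed per-slot swap probability (here called `epsilon`, a hypothetical parameter) and a made-up ranking of thirty pages. This is one possible reading of the suggestion, not a description of any real engine.

```python
import random


def top_with_exploration(ranking, k=10, epsilon=0.1, rng=None):
    """Return k results, occasionally swapping in a lower-ranked page.

    For each of the top k slots, with probability epsilon the page in that slot
    is swapped with a randomly chosen page from further down the ranking.
    """
    if rng is None:
        rng = random.Random()
    top = list(ranking[:k])
    tail = list(ranking[k:])
    for slot in range(len(top)):
        if tail and rng.random() < epsilon:
            pick = rng.randrange(len(tail))
            # Swap rather than overwrite, so the displaced page is demoted, not lost.
            top[slot], tail[pick] = tail[pick], top[slot]
    return top


# Hypothetical ranking: page1 is ranked best, page30 worst.
ranking = ["page%d" % i for i in range(1, 31)]

print(top_with_exploration(ranking, k=10, epsilon=0.1, rng=random.Random(2009)))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when testing. With `epsilon=0` the function returns the plain top ten. Larger values give buried pages more exposure, at the cost of showing users more results the ranking considers worse.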
There is more to this than the Matthew Effect, though. Personalized results can have the same effect of decreasing the diversity of your results. In extreme, all you’ll see is what Google’s model of you wants to see, which is necessarily different than you.
p.s. I still want to see your response to McArdle’s response:
http://meganmcardle.theatlantic.com/archives/2009/09/the_governments_role_in_rd.php
John Holbo 09.21.09 at 7:53 am
Hi Patrick, I gave a short response to that post in the Cosmopolitanism post. (I could elevate it to true post status, but perhaps people have had enough of that go-round.) Did you see this comment by me?
https://crookedtimber.org/2009/09/17/a-citizen-of-where-exactly/#comment-288731
Henri Vieuxtemps 09.21.09 at 8:25 am
In extreme, all you’ll see is what Google’s model of you wants to see, which is necessarily different than you.
But do you really know what you really want to see? Perhaps a good search engine should return three sets of results: one for your ideal ego, one for your ego-ideal, and the third one for your superego?
John Holbo 09.21.09 at 11:15 pm
There is also the id search, of course.
Patrick 09.23.09 at 5:51 am
I didn’t see that post, thank you for the link. I’ve certainly enjoyed the exchange, but I can understand why you might be getting bored with the discussion.