The English language version of Wikipedia had its one-millionth article on 8 March and passed 1.1 million 50 days later, which gives an implied doubling time of about a year. The doubling time seems fairly stable: the 500,000 mark was reached in March 2005, and 250,000 in April 2004.
A straightforward extrapolation gives a billion articles in 2016 (and a trillion in 2026). I planned to write something about this, but it seems much more appropriate to leave it to the collective wisdom of the blogosphere.
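For anyone who wants to check the arithmetic, here is a minimal Python sketch; the one-million, 1.1-million and 50-day figures are taken from the paragraphs above, and the rest is just compounding:

```python
import math

# Figures taken from the post: 1.0 million articles, then 1.1 million about 50 days later.
growth_factor = 1.1
days_elapsed = 50

# Implied doubling time: solve growth_factor ** (t / days_elapsed) == 2 for t.
doubling_days = days_elapsed * math.log(2) / math.log(growth_factor)
print(f"implied doubling time: {doubling_days:.0f} days")  # roughly 364 days, i.e. about a year

# Naive extrapolation: one doubling per year, starting from 1 million articles in 2006.
articles, year = 1_000_000, 2006
while articles < 10**12:
    articles *= 2
    year += 1
    if 10**9 <= articles < 2 * 10**9:
        print(f"passes one billion articles around {year}")   # 2016
print(f"passes one trillion articles around {year}")          # 2026
```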
On a vaguely related point, thanks to commenters Matt Austern and others on my scale post who pointed me to Powers of Ten. I really like this kind of thing, so feel free to nominate more of the same.
Update over the fold
Lots of fun in the comments, here and at CT, and now I’ll try to be at least semi-serious. If a trend can’t be sustained it won’t be. Or, if you prefer, exponential curves eventually become logistic. So, the real point is to work out the constraints on Wikipedia’s growth, and take a guess at what the endpoint of this process will look like.
It’s a pretty safe bet that the current phase of rapid growth will take Wikipedia to 10 million articles, if not within the 3.5 years implied by the recent exponential trend. There’s no shortage of topics, plenty of room for growth in the number of contributors, and no obvious problem handling such an expansion within the current software and scheme of social organization. The result would be a general reference system that would be better for that purpose than anything that’s been seen previously. Among other things, such a system would replace Google for many purposes (though not the ones that make most of Google’s money).
Going beyond that to 100 million articles would imply some radical changes. As far as content is concerned, something on this scale would compete across the board with specialist reference works like national dictionaries of biography, the Palgrave Dictionary of Economics and so on. An obvious way to approach this goal would be to subsume a number of existing projects, as was done with the 1911 Britannica, and the gazetteer entries on US towns. But that would certainly require new organizational and licensing arrangements, and probably a more complex architecture than exists at present. More importantly, it’s hard to see something on this scale functioning without a substantial number of full-time paid staff. On this scale, Wikipedia coverage of current events would also be directly competitive with the mainstream media.
Another factor-of-ten increase and Wikipedia would be comparable in size to the Internet as a whole (the visible Internet currently has about a billion pages). As Joel Turnipseed pointed out in comments, the obvious analogy is with Borges’ map, which was on the same scale as the country it described. Extrapolations to this point and beyond are fun, but best left to SF.
My best guess is that sometime around 10 million articles, the growth rate will slow, eventually becoming linear. At that point, either some other project will take off, or Wikipedia itself will transform into something radically different.
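To put a rough timeline on that guess, here is a minimal sketch, assuming (purely for illustration) logistic growth with the same one-year doubling time, starting from 1 million articles in 2006, under ceilings of 10 million and 100 million articles:

```python
import math

# Illustrative assumptions only: logistic growth with a one-year doubling time in its
# early phase, starting from 1 million articles in 2006, capped at a guessed ceiling K.
N0 = 1_000_000
r = math.log(2)  # continuous growth rate matching a one-year doubling time

def peak_year(K):
    # A logistic curve grows fastest at its inflection point, where N = K / 2.
    return 2006 + math.log(K / N0 - 1) / r

for K in (10_000_000, 100_000_000):
    print(f"ceiling of {K:,} articles: new articles per year peak around {peak_year(K):.0f}")
```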
{ 30 comments }
arthur von bladet 04.28.06 at 4:10 pm
“Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us about this script for Hamlet they’ve worked out.”
Henry (not the famous one) 04.28.06 at 4:22 pm
[Insert joke about monkeys, typewriters and Hamlet here.]
–a recovering wikipedian
Kenny Easwaran 04.28.06 at 4:30 pm
Has the number of articles really been increasing approximately monotonically? I could imagine there being periods of reorganization where many articles get collected together, in which the number of articles could significantly decrease.
Kenny Easwaran 04.28.06 at 4:32 pm
Note that the extrapolation also suggests 1000 articles in 1996, and the first article in 1986. I suppose the period of doubling must have started around 2002 or 2003.
joel turnipseed 04.28.06 at 4:39 pm
Also in 2026, the completion of Borges’ map…
John Quiggin 04.28.06 at 4:51 pm
“Borges’ map…”
I actually had a go at this one, Joel, and you’re almost exactly right.
Wikipedia should have an article for every accessible Internet page some time around 2020 (bearing in mind that the net itself is still growing), and the invisible bits of the net are probably an order of magnitude more.
As with the map, at some point in this process, Wikipedia would effectively subsume the Internet.
inigo jones 04.28.06 at 4:56 pm
Is this intended as a reductio?
John Quiggin 04.28.06 at 4:57 pm
Appropriately, #1 and #2 were affected by a moderation timewarp.
Keith 04.28.06 at 5:15 pm
Does this estimate account for deletion and combining of old and related articles?
John Quiggin 04.28.06 at 5:25 pm
“Is this intended as a reductio?”
Not exactly. Clifford Stoll tried a reductio on the Internet back in his 1995 book Silicon Snake Oil, when he said
Of course, it’s not literally true that everyone is connected, but still …
On the other hand, there are those razor blade extrapolations.
sz 04.28.06 at 7:52 pm
For more on Wikipedia growth models, see:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%27s_growth
neil 04.28.06 at 8:09 pm
It’ll give out before it gets to a billion. I wonder what happens then?
Rich Crew 04.28.06 at 8:23 pm
A physicist friend of mine recalls a lecture in which the speaker, pointing out the ever-increasing publication rate of Physical Review Letters, was led to define the “shelf speed” of a journal. At its current rate of acceleration, the shelf speed of that particular journal should exceed the speed of light some time before 2020.
Omri 04.28.06 at 11:30 pm
Bacterial colonization of a petri dish looks exponential too, at first.
Tom Ames 04.28.06 at 11:45 pm
Why do people have such a hard time understanding that the logistic curve (a much better model for growth in an environment of finite resources) looks exponential in its early phases?
If you plot the number of bunny rabbits in a field vs. time, at first it would seem exponential. But does anyone really expect that a few years will result in a field packed with bunnies?
The parameters of a logistic process are hard to predict if all you have is the early portion of the curve. The exponential parameter, on the other hand, can be fit rather easily. This is a really bad reason to pretend that the behavior is truly exponential. But from Moore’s law to human population projections to the idiotic “singularity” argument, otherwise intelligent people (*cough* wikipedia) keep doing just that, leading to impossible (if “straightforward”) extrapolations.
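A minimal sketch of the problem, with a carrying capacity of one billion assumed purely for illustration: an exponential fitted to the first few years of a logistic curve recovers the growth rate almost exactly, but its long-run forecast is off by three orders of magnitude.

```python
import math

# Purely illustrative: logistic growth starting at 1 million with a one-year doubling
# time early on, and an assumed (hypothetical) carrying capacity of one billion.
K, N0, r = 1_000_000_000, 1_000_000, math.log(2)

def logistic(t):
    return K / (1 + (K / N0 - 1) * math.exp(-r * t))

# "Observe" only the first four years, as an early-stage curve-fitter would.
ts = [0, 1, 2, 3]
ys = [math.log(logistic(t)) for t in ts]

# Ordinary least-squares fit of log N = a + b*t, i.e. an exponential fit to the early data.
tbar, ybar = sum(ts) / len(ts), sum(ys) / len(ys)
b = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / sum((t - tbar) ** 2 for t in ts)
a = ybar - b * tbar

print(f"fitted exponential rate: {b:.3f} (true early rate {r:.3f})")       # ~0.691 vs 0.693
print(f"exponential forecast, 20 years out: {math.exp(a + 20 * b):,.0f}")  # about a trillion
print(f"logistic value, 20 years out:       {logistic(20):,.0f}")          # just under a billion
```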
John Quiggin 04.29.06 at 12:55 am
Pointing out that logistic curves look exponential at first is easy. The fun part (note the posting category) is to work out the upper bound.
Ajax 04.29.06 at 4:44 am
Given how many Wikipedia articles are straight copies from the 1911 edition of the Encyclopedia Britannica, perhaps the real constraint on the former’s growth is the size of the latter.
~~~~ 04.29.06 at 4:46 am
The millionth article was created on March 1!
French Swede the Rootless Vegetable 04.29.06 at 4:47 am
‘I really like this kind of thing, so feel free to nominate more of the same.’
Since people have mentioned Borges, I wonder when Wikipedia will be as comprehensive as the Library of Babel?
~~~~ 04.29.06 at 5:19 am
Ajax – merging the 14,707 topics from the 1911 Britannica that did not have articles on Wikipedia was completed on February 27.
Mike in Arkansas 04.29.06 at 7:37 am
While I have occasionally used Wikipedia, linked to Wikipedia articles, and even edited a couple of articles, until reading this post I had not really considered what Wikipedia is and how it came to be. Wikipedia has an interesting article on itself that explains a lot of this.
stuart 04.29.06 at 10:09 am
The EB 1911 edition had approximately 40,000 entries (a number which Wikipedia currently adds in a couple of weeks), so I doubt that could ever have been a limiting factor, or even a particularly large boost to the project except very early on. The main advantage of EB 1911 in Wikipedia is covering now-obscure figures from the 19th century who would otherwise likely have no article at all.
Ajax 04.30.06 at 3:12 pm
On the contrary, Stuart, the Wikipedia entries lifted from EB 1911 are not confined to articles about obscure people in the 19th century, but seem to include lots of less obscure people from earlier centuries. If one is interested in tech subjects Wikipedia is fine, but not many Silicon Valley geek contributors to Wikipedia know all that much about Renaissance English Catholic recusant poets. For information about British people, the Oxford Dictionary of National Biography is generally a much better source (up-to-date, extensive, written by experts, checked, references included) than Wikipedia.
Tom T. 04.30.06 at 8:19 pm
By the way, Britannica responded quite sharply to the Nature article that concluded that Britannica made as many errors as Wikipedia.
John Quiggin 04.30.06 at 9:32 pm
Sharply, but unconvincingly. All they did was argue the toss on the calls that went against them. They didn’t make a case for systematic error in the Nature study.
Sharon 05.01.06 at 3:48 am
Comparisons to ODNB hardly seem appropriate. Wikipedia didn’t cost 25 million quid to produce, and it won’t set you back 200 UK pounds (or 300 USD) a year for a personal subscription.
On the other hand, you should read some of the complaints about errors in the new ODNB…
Ajax 05.01.06 at 6:57 am
Open source production is fine if the topic you are seeking is something which interests the open-source producing community. If not, and if you want something for free, then you have to endure Wikipedia’s wholesale copy of the out-of-date 1911 EB when looking, for instance, for information about Elizabethan English poets. Comparing Wikipedia to ODNB, you get what you pay for.
abb1 05.01.06 at 7:38 am
So, why aren’t the non-English wikipedias growing as dramatically as the English version? According to this graph it looks like the German version may have already leveled off.
dipnut 05.01.06 at 2:46 pm
I had noticed it was getting harder and harder to read them all.
Hamilton Lovecraft 05.02.06 at 10:06 am
“Among other things, such a system would replace Google for many purposes”
Wikipedia has already replaced Google as my first go-to for answering questions of the form “what the heck is X, exactly?”, but Google was never particularly good at that question, instead directing me to pages which answer the question “what does some loon with no sense of design have to say about X?”
I don’t see Wikipedia being able to cut much further into Google’s turf than that, though.