Wandering around the blogosphere, I came across this rather interesting page. It seems to be a little outdated, but it provides an approximate count of the relative importance of different languages in the blogosphere. English comes first, unsurprisingly, then French. Portuguese is third, and Farsi fourth. This may seem a little surprising to those who aren’t familiar with the proliferation of Portuguese and Farsi blogs - both linguistic communities have also made substantial inroads into social network services like Orkut.com. This leads to an interesting sociological question - why have these communities, and not other linguistic communities of similar size, reached takeoff in the blogosphere? Equally interesting is the lack of any Arabic-language blogs on the list. This may be a result of how the authors have seeded their survey or parsed their results - but it may also quite possibly reflect reality. As far as I know, there are fewer than 70 Iraqi blogs (many of which are in English). I’m not aware of any substantial blogging communities in other Arabic-speaking countries - but I’m happy to be enlightened if I’m wrong. The root causes may perhaps include cultural factors - but I would bet that restrictions on Internet access and poor technological infrastructure also play a very important role.
Most of the Portuguese blogs, I imagine, are located in Brazil; one of Blogger’s first deals was with a Brazilian company.
Interesting that so many Arab blogs are in English.
http://hammorabi.blogspot.com/archives/2004_08_01_hammorabi_archive.html#109243633379101445 is a good example, though it is an interesting English.
I think it may have to do with the availability of native language typefaces and programs and a desire to reach outside of one’s own culture.
The French feel no need. Most English-language blogs by Arabs are intended to reach beyond Arabs.
Arabs aren’t making big inroads into the net for several reasons. Arabic dialects are not mutually intelligible, and Modern Standard Arabic sucks. Connectivity isn’t great, but censorship is, and Arabic social mores also play in. You might not speak candidly on a blog about much of anything if you’re afraid that a loose comment will cost your cousin his promotion in the civil service.
Wouldn’t network effects have something to do with it? If you’re an Iraqi educated enough to start a blog, and you’re interested in blogs, you probably have decent English. There are a lot of good reasons for you to blog in the same language as most other blogs out there, particularly blogs by citizens of the countries currently occupying Iraq.
Although they give us all sorts of details about their methodology, I still feel like we don’t know enough to try to make too much of the findings. They talk about a “certainty level” attached to the sites they crawl as to whether they are really blogs or not, but it’s not fully clear how that certainty level may (or may not) be included in calculations of general blog stats and classifications (e.g. language groupings). I wonder, if we randomly picked cases from their list, how many we’d end up classifying as blogs (setting aside the issue that it’s nearly impossible to come up with a common agreement about what constitutes a blog).
German and French rank quite high, but is that at least in part thanks to having blog lists like swissblogs.com as starting points?
Also, it seems one may want to take into consideration the number of Internet users who speak these languages or at least number of users from countries where the various languages are commonly used to get a per capita type of score. (Sure, sheer numbers are interesting and relevant as well, but if you want to explain why one language is more represented than another, supply of possible bloggers would be relevant.) Of course, having done research in this area, I know how difficult it is to come up with such figures, but I still think it’s important to be conscious about the issue.
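To make the per capita idea concrete, here is a minimal sketch in Python. The language names and all figures are invented placeholders, not numbers from the survey or from any real source; the only point is that the ranking by raw blog count can flip once you divide by the supply of possible bloggers.

```python
def blogs_per_million_users(blog_count: int, users_millions: float) -> float:
    """Blogs per million Internet users who speak the language."""
    if users_millions <= 0:
        raise ValueError("users_millions must be positive")
    return blog_count / users_millions


# Hypothetical placeholder data: (number of blogs found, Internet users in millions).
sample = {
    "language_a": (50_000, 200.0),
    "language_b": (20_000, 10.0),
}

# Rank once by sheer numbers and once by the per capita score.
by_raw_count = sorted(sample, key=lambda lang: sample[lang][0], reverse=True)
by_per_capita = sorted(
    sample,
    key=lambda lang: blogs_per_million_users(*sample[lang]),
    reverse=True,
)
```

With these made-up inputs, language_a leads on raw count while language_b leads per capita, which is the kind of difference I have in mind.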
I think digamma makes a good point in that depending on your intended audience, you may decide to blog in a language other than your own so language of a blog does not equal primary language of blogger (not that anyone was arguing this, I guess;).
Also, regarding network effects, since following links was part of the methodology to find other blogs (very reasonable, of course), you’re likely going to find more of the same language blogs since those in language x will likely link to other blogs in language x. This was the point I was making above regarding the possible effects of something like swissblogs.com.
I realize the people behind the project have done a lot to circumvent relying solely on links from blogs they already know about, but this like-links-to-like issue could still have implications for what blogs are found and included in the stats in the end.
The Farsi end of the blogosphere is no doubt being fed by the huge dissident population in Iran.
Are most of the Portuguese-language blogs in Portugal or in South America? What percentage of French-language blogs are outside France?
The methodology is broken, I believe. They rely on a Perl script called “textcat”, which I believe has not been updated since 1998. The textcat site states that some languages are only supported in certain encodings. The ensuing confusion between the language and the character encoding is shown clearly by the inclusion of Chinese twice: once as “Chinese-gb2312” and once as “Chinese-big5”.
A better methodology would be to take notice of the character encoding markup (which the poll does not do), translate it to a common Unicode base (which should represent everything!), and then do the trigraph analysis there. You could do it with Perl, but Python is more international-friendly.
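Roughly what I mean, as a minimal Python sketch and not a description of what the survey or textcat actually does. The function names are my own, the fallback to UTF-8 is an assumption, and I use cosine similarity over character trigram counts as a simple stand-in for whatever distance measure a real classifier would use. Real profiles would be built from large samples, not one sentence each.

```python
from collections import Counter
from typing import Dict, Optional


def decode_page(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode bytes to Unicode using the declared charset (assumed fallback: UTF-8)."""
    try:
        return raw.decode(declared_charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or bytes that do not match it.
        return raw.decode("utf-8", errors="replace")


def trigram_profile(text: str) -> Counter:
    """Count character trigrams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


def guess_language(text: str, profiles: Dict[str, Counter]) -> str:
    """Return the profile name whose trigram counts are closest to the text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: similarity(target, profiles[lang]))


# Toy profiles built from one sentence each, purely for illustration.
profiles = {
    "english": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "portuguese": trigram_profile("o rato roeu a roupa do rei de roma e a rainha"),
}

# The same Chinese text in two legacy encodings becomes one Unicode string,
# so the language would be counted once rather than once per encoding.
gb = decode_page("中文".encode("gb2312"), "gb2312")
big5 = decode_page("中文".encode("big5"), "big5")
```

Because the analysis runs on Unicode text after decoding, the encoding stops leaking into the language label.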
At least the methodology is not as broken as the map. Frankly, I never knew there were so many people blogging from the Somalian continental shelf. Now I do.
Alas, it looks like their data is completely wrong. The low scores for Japanese and Chinese should throw most readers into doubt.
According to Global Reach statistics (in millions of speakers):
- English: 230
- Non-English: 403
- incl. Chinese: 40
- incl. Japanese: 61
- incl. Spanish: 41
These results are far more logical: broad and cheap access to the Internet was delivered far earlier in Japan than in France. The demographic potential of China is huge, while French represents part of Switzerland, part of France (digital divide), and Quebec.
Those stats come from the following WIPO report. You can find them in other UNCTAD reports.
http://ecommerce.wipo.int/survey/html/1.html
I think Eszter and Digamma’s concerns are to be taken seriously - the concerns about Unicode are real (although I think it’s fair enough to report Chinese twice, if two different forms of character coding are involved). I also suspect that Chinese has experienced an enormous surge since the survey was conducted. However, I simply don’t see how the WIPO survey proves or disproves anything - it’s not concerned with bloggers. Also, I think that the findings for high representation of Farsi and Portuguese are very likely robust. If you look at the Technorati top 100, they are pretty well the only foreign languages represented. Further, as I mentioned in the post, Orkut.com has been disproportionately colonized by Portuguese and Farsi speakers. This suggests to me that, as one of the commenters noted, uptake of blogging is largely a community thing - if there is a substantial community of individuals out there who you can speak to, it becomes a self-perpetuating phenomenon driven by positive feedback. And it is interesting that, say, Portuguese bloggers have a bigger presence in the blogosphere than German-language bloggers. Some of it is clearly due to early innovations in Unicode - Hossein Derakhshan’s early Farsi adaptation of Unicode played a huge role in getting Iranians to start blogging.
The discussion of Arabic-language blogging in the comments is also quite interesting. I frankly suspect that cultural factors are less important than poor access to infrastructure and the risks of government punishment for either (a) ‘political’ blogging, or (b) ‘personal’ blogging by people with active sex lives outside of marriage etc.; these are the key variables. So are government filters which prevent access to Blogspot, TypePad and other popular blogging tools.
This is anecdotal but intriguing. I have been told that one reason for the prevalence of Portuguese on the internet is the strong contact maintained between former Portuguese colonies. And it may be that a lot of Farsi blogging is within the US as the community develops its interconnections.
The WIPO stats do not focus on blogs, indeed - but don’t you find it strange that Asian languages score so badly?
The difference in alphabets is hiding many blogs from Technorati, and many pages from Google. The Internet was built around the Latin alphabet, and although Unicode and new algorithms are meant to help all languages coexist, there is still a gap between Cyrillic, Hellenic, Asian and Western encodings.
The point about the Portuguese colonies is a nice and curious anecdote (although I’m not sure Angola and Mozambique blog a lot, plus the digital divide is terrible in Brazil).
I’m just trying to point out that the survey discussed here builds on technical mistakes that can be at least spotted, at best proved, with a bit of logic. The Asian community (which is extremely walled off, on the Web as IRL) is underestimated; I do not (cannot) believe the numerical proxies given by the survey.
Speaking of which…
Babel is currently expanding to be able to offer content and threaded discussions in over 400 languages on the site, which means building 400 brand new blog sections, each of which will be hosted by Authors, Editors and Managers in their own native language for our progressive academic journal and eventual online university. The main idea is simply to take the English-language prototype of Babel and have different language versions of the site with the same goals and aspirations as the original English-speaking version.
For more information, go here:
http://towerofbabel.com/map/authors.pl
Malcolm Lawrence
Editor-in-Chief
Babel: The multilingual, multicultural
online journal and community of arts and ideas.
http://www.towerofbabel.com
——————————————————————————————
Babel: Where the vodka is strong but the meat is rotten.
——————————————————————————————
More evidence for the irrelevance of the WIPO stats in this context: Japan doesn’t have widespread cheap web access in the sense that a US or UK observer would understand. The popular medium is via mobile handsets, with landline access a minority sport.
This makes the lack of blogging unsurprising - blogs aren’t a good small-screen crap-keypad medium. While I’ve made one or two blog posts via my cellphone, this is more for curiosity than ease of use.
Nobody has mentioned Russian yet, but I am told (I don’t savvy the lingo) that all the Russkophones are on LiveJournal, and that this is usually beneath the radar for these kinds of undertakings.
Japan’s “minority sport” : http://www.stat.go.jp/english/data/handbook/c08cont.htm
Check Figs 8.6/8.7.