Yesterday, one of the biggest events in the history of the Internet took place: non-Latin top-level domains went live in the DNS root zone. In plain English, you can now type the whole of a domain name in Arabic script. Not just the left of the dot (as in dot org) but the right of it, too. The three new top-level domains are السعودية. (“Al-Saudiah”), امارات. (“Emarat”) and مصر. (“Misr”). They are country code names in Arabic for Saudi Arabia, the United Arab Emirates and Egypt.
How did this happen? Years of collaboration and cooperation between countless technical, policy and linguistic experts around the world, endless patience and a fair amount of justified and motivating impatience for people to be able to use their own scripts and thus languages to access the Internet.
As Tina Dam, who leads ICANN work on internationalising domain names, puts it, credit goes to the “registries and governments that have worked actively locally; the IDNA protocol authors; the policy makers; application developers” such as browser makers, who had to figure out how to make the URL field read from right to left, and many, many more.
As my old IANA colleague, Kim Davies, says, the hard work and collaboration required to get this far is just the beginning. The people behind these new domains now need to work with their own communities to populate them. Browsers like Firefox don’t seem to have caught on yet, though they’ve had plenty of warning. And many more script and language groups are lining up behind to get their own characters into the root. Word is the Russians want Cyrillic in next (Medvedev got his game face on when he heard the Bulgarians might get there first).
It’s hard to express what a big deal this is, and what a great day yesterday was for the Internet. The changes will be mostly invisible to most of us who speak English or type in languages that just use ASCII. While there have always been workarounds at the browser or ISP level that make it seem to many users in, say, China, that they’re typing everything in their own characters, not everyone has these workarounds. From now on, people who use other scripts will be able to access the world wide web on their own terms.
Is there any downside to this true internationalization of the domain name system? Sure. Every script that gets used has to go through years of mapping and testing characters, an enormous effort for each community, which means it will take years to get to everyone. In the shorter term, there may be an increased risk of phishing domains that look like Latin script, e.g. citibank.com, but actually contain a non-Latin character and bring you somewhere altogether less savoury.
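To make the look-alike risk concrete, here is a minimal sketch in Python. It assumes a hypothetical helper, `scripts_in`, that guesses each letter's script from the first word of its Unicode character name. That shortcut is purely illustrative and is not how browsers or registries actually screen names.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Roughly guess which scripts a domain label uses (hypothetical helper)."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Heuristic: Unicode character names begin with the script,
            # e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts

genuine = "citibank"
spoofed = "citib\u0430nk"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin "a"

print(scripts_in(genuine))
print(scripts_in(spoofed))
print(len(scripts_in(spoofed)) > 1)  # mixed scripts: worth a second look
```

On screen the two names are all but identical, yet the second mixes two scripts, which is the kind of signal a browser can use to decide whether to show the name as typed.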
There is also the fear that this will lead to the balkanization of the Internet. Will it, for example, make it even easier for the Chinese to maintain and strengthen their firewall? But anyone who tries to play the ‘gotcha’ card on these issues should first familiarise themselves with the years of expert discussions and decision-making that have gone into this development. Internationalisation is not perfect, and foreseeing problems isn’t the same as solving them. But anyone who complains in English that now they can’t type in every character of every URL from their own machine should probably sit on that for a moment and think how it sounds to someone speaking, say, Amharic or Thai.
A point worth noting: these new domains are new versions of country codes like .UK or .AU and are run by country code managers. The wider process to open up generic top-level domains like ‘.arab’ in Arabic or in plain old Latin script still has quite a way to go, but should ultimately include non-ASCII characters.
Yesterday was a great day for celebration. So why isn’t the world celebrating? Because hardly anyone knows yet. Apart from a 100+ word announcement on the ICANN home page, leading to an announcement elsewhere and a couple of good blog posts by Tina Dam and Kim Davies, it’s as if this huge event hasn’t even happened. I found out via my grapevine on Twitter. Stories are just beginning to appear in the mainstream media, but not remotely on a scale that would indicate that getting the good news out to the world is being prioritised by ICANN’s leadership.
Where is the press campaign, news release, front page video, assault on social media and general drum-beating, trumpet-blaring celebration of the biggest thing on the Internet since email was invented? How can it be nearly two days since a fundamental change to how the world accesses the Internet and this news is still mostly getting around via tweets amongst insiders like me?
Wake up, ICANN! Get your communications act together. Just because this isn’t big news in America doesn’t mean it’s not big news to the world. If I were the Arab countries involved in this leap forward, I’d be confused and perhaps a little peeved.
When ICANN’s biggest news story of last year kicked off – the relinquishing by the US Department of Commerce of a legal claim of authority over how ICANN operates, in favour of a more international approach – the story was launched on the front page of the website with videos, press release, and endless quotes from big cheeses around the world saying how important it all was. It takes a lot of ground work to pull that off, but it was a priority, so it got done in time. An aggressive international press campaign put the story into mainstream media around the world. It was a big deal that looked and sounded like a big deal.
This time round, not-quite-silence but not a whole lot of parades and bunting either. It’s not as if internationalised domain names (IDN) came as a surprise. The process was approved by the Board in October, first applications came in November, and last week the Board gave the final go-ahead to put the names in the root. I can’t help thinking that if IDN was a big domestic US issue, it wouldn’t be getting the mañana treatment from ICANN’s leadership.
To be clear, I don’t think this is being buried, just not prioritized as its global significance demands. For example, I see a couple of press stories on the media page as of late Thursday night, US east coast time, but no press release, no interview with the CEO, no 2-pager for the technically challenged, no pull-quotes, no examples of how ordinary Internet users will be affected, no high-flown rhetoric about how this makes the Internet truly international (for some of that, try Kieren McCarthy’s blogpost), no explanations of, to quote Vice President Joe Biden, what a big effing deal this is.
How could this happen? Perhaps it’s because ICANN’s leadership has become so narrowly US-focused. There is no longer a non-American on the dwindling executive team, and the possession of a less parochial sensibility is no longer an asset. The exodus of experienced, knowledgeable executives with deep ties to the international Internet community is beginning to show. ICANN is soon to lose its universally respected COO, Doug Brent. He will follow the head of international outreach, Theresa Swinehart, whose departure leaves the organisation with one less known, respected and deeply connected member of the global Internet community.
The open rebellion by country code managers and seething dissent of many government representatives during ICANN’s recent meeting in Kenya show an ICANN leadership that is out of touch with the vast majority of the global Internet community, and doesn’t seem to care. This needs to be fixed before it is too late.
The Internet changed yesterday thanks in part to the organisational and operational credibility of ICANN, its relationships around the world, and the open and collaborative model of technical decision-making pioneered by the Internet Engineering Task Force. A human web built and sustains the world wide web. It took years to build up. It could only take months to damage beyond repair.
Full disclosure: I used to work at ICANN as part of the communications team before I was made redundant last year following a change of leadership.
{ 28 comments }
Ginger Yellow 05.07.10 at 3:13 am
Zimmer covered it in detail at Language Log, and I first heard about it on the front page of the BBC News website.
Substance McGravitas 05.07.10 at 3:22 am
Thanks for the post. This is great for local populations. I’m personally kind of sad, because international browsing was made much easier by looking at English URLs and fiddling with them accordingly, but I’ll have to learn a little more and that’s not a bad thing. Or…maybe Google Translate will make me some pretend URLs to go with the pretend page content…
Vance Maverick 05.07.10 at 5:27 am
I too saw this on Language Log first. When that post went up, the DNS entries hadn’t propagated to my office yet — now their link works (though as they say, it gets translated automatically to an ASCII form).
des von bladet 05.07.10 at 5:46 am
I saw the story on the BBC news front page, like Ginger. One might reasonably imagine it is bigger news outside the latinosphere?
The BBC story (like this post) neglects to answer the only question I have about this, namely, is it UTF-8 under the hood or what?
Scott Martens 05.07.10 at 5:54 am
I noticed. The eternally incomplete and very frustrating unicode revolution moves one step forward today.
Ezroo 05.07.10 at 6:40 am
I’m still waiting for international standards that prevent any web page’s apostrophes from showing up as a-circumflex, euro symbol, trademark symbol. No one ever intentionally puts those three together in a row and yet I come across some site or another littered with them several times a week.
JulesLt 05.07.10 at 6:58 am
Agreed – I saw it on the BBC front page – on the same day as the UK election – it was the number one Technology story. Would have been nice to see some Google Art for it too, perhaps?
Alan Levin 05.07.10 at 7:07 am
No ways you could be made redundant :) you were an awesome asset to ICANN and clearly the communications area was made redundant and not you.
Stuart 05.07.10 at 9:27 am
I have heard quite a lot about this over the last year or so. The one thing that intrigued me, but that I never saw covered (although I would presume some group somewhere had to consider it at length), is the possible issue of domain name spoofing, although clearly this isn’t so much an issue with Arabic characters as with Cyrillic and the many European languages with diacritics.
Nick Barnes 05.07.10 at 9:40 am
Anti-spoofing is the browser’s job. See, for instance, http://www.mozilla.org/projects/security/tld-idn-policy-list.html
So Firefox, for instance, will not display decoded IDNs in .com. Right now, because these TLDs are so new, it also won’t display decoded IDNs in them. It has configuration parameters to fix this, and a future release will have those set appropriately. In the meantime, you can set them by hand (go to about:config and create new parameters called things like network.IDN.whitelist.xn--wgbh1c — for Egypt, مصر — set to true).
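For anyone wondering where a name like xn--wgbh1c comes from, here is a minimal sketch in Python. It assumes the standard library's built-in `idna` codec, which follows the older IDNA 2003 rules, so results for some other names may differ from what current registries and browsers produce.

```python
label = "مصر"  # the Arabic-script name for Egypt's top-level domain

# The ASCII-compatible form is what actually gets looked up in DNS.
ascii_form = label.encode("idna")
print(ascii_form)  # b'xn--wgbh1c'

# Decoding goes the other way, back to the Arabic script.
print(ascii_form.decode("idna") == label)
```

The "xn--" prefix marks the label as an encoded internationalised name; the rest is the Punycode encoding of the original characters.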
Gareth Rees 05.07.10 at 10:24 am
This is great for local populations
Perhaps you didn’t mean it that way, but it seems a bit patronizing to dismiss billions of readers of Arabic, Chinese, Devanagari, Thai, etc as “local populations”.
Fr. 05.07.10 at 12:34 pm
That’s the same interpretation trap as when we use “regional integration” to speak of supranational phenomena. The term ‘local’ has a different, neutral meaning for Internet stuff. Localization, for instance, is positively connoted (making the Web or applications and the like available to different languages is perceived as a positive effort).
Pete 05.07.10 at 12:52 pm
I would expect the news of this to be mostly circulated in the non-ASCII press..
alex 05.07.10 at 12:56 pm
Ah, the paradox of language – does it unite, or divide; is it for communication, or identification? Shibboleth, sibboleth…
a.y.mous 05.07.10 at 1:10 pm
This is good. But I don’t think this will do much good. As has been stated, you can now use different scripts. And as mentioned, Arabic, Chinese, Devanagari, Thai all have different scripts. But IME, non Latin Script data entry is woefully inadequate and completely non standardised, not to mention the pain of figuring out the correct keyboard to purchase. And not just the 3 for Japanese. There are 3 for almost all other scripts as well.
Unicode is a read only system.
Substance McGravitas 05.07.10 at 1:16 pm
Yes, poorly written.
Gareth Rees 05.07.10 at 1:26 pm
But IME, non Latin Script data entry
Pun intended?
(IME = “Input Method Editor” as well as “in my experience”)
Moby Hick 05.07.10 at 1:35 pm
Meh. Call me when I can use Wingdings.
a.y.mous 05.07.10 at 1:35 pm
Very much :-)
A few years ago, I had quite a comfortable revenue model when I had the opportunity to script a “Font Changer” in MS Word for a University dept. which used to get documents submitted in 6 different keyboard layouts (well, 4, but three fonts had subtle differences) across 20-odd fonts. It was a revelation to me.
novakant 05.07.10 at 5:12 pm
I want Umlauts!
Substance McGravitas 05.07.10 at 5:30 pm
URLs from Sri Lanka are going to be lovely.
ben 05.07.10 at 8:26 pm
Browsers like Firefox don’t seem to have caught on yet, though they’ve had plenty of warning.
On purpose, as Nick points out in #7 above.
Aidan Kehoe 05.08.10 at 10:28 am
Des, it’s not UTF-8. See DJB for an explanation of the stupidities of it; the IDNA he’s criticising in 2002 still has exactly the same stupid problems eight years later. But it’s what we have.
Aidan Kehoe 05.08.10 at 11:15 am
Novakant, Umlauts work: http://www.süddeutsche.de/
Tangentially, the plural of Umlaut in German is, unfortunately, Umlaute, not Umläute. What a shame.
Phio 05.09.10 at 1:02 am
Hello,
I have been using IDNs for over 3 years now. (usually in .com and .net)
I thought I would reply to a few issues re: comments.
Firstly, IDNs before the dot have been working since 2001 and almost all language scripts are included.
As far as keyboards and IMEs go, these have also been used for the last 10 years with great success in most non-Latin countries (most countries have their own keyboards). Google has inserted a virtual keyboard in the local language for many countries: see http://google.ae or http://google.co.th and notice the keyboard next to the search bar; click it and you get a nice built-in virtual keyboard. Next, on the question of spoofing: there will always be those who attempt it. This problem has existed for a while, and Cyrillic characters have been available for several years for .com and .net. Any spoofing domains were registered quite a while ago, but the new IDN.IDNcctlds have extra security methods built in, both at the registrar and with new delegation rules. Because of the extra security, we are hoping that Mozilla will change their whitelisting policies to support IDNs as other browsers have.
I feel this is a very good thing for the internet, and if it didn’t happen (IDN.IDN), we would have seen a splintering of the internet. All of us knew that if ICANN didn’t hurry, China would have probably created their own internet. So this is a good thing for the world. You can take a look at my blog at http://www.IDNBusiness.com and see some of the testing I’ve done with various IDNs in creating websites.
Regards,
Phio
novakant 05.09.10 at 2:45 am
Yeah, but it is converted to “sueddeutsche”, and I anglicized the plural of Umlaut, because otherwise non-German speakers might not know what I was talking about, also just for fun.
novakant 05.09.10 at 2:47 am
I like your “Umläute” (like “Häute”) though.
des von bladet 05.09.10 at 6:23 pm
Thanks, Aidan. I look forward to having non-punycode cleanliness become yet another source of (usually minor) irritation on the Intertubes.
Comments on this entry are closed.