Punycode with PHP

July 20th, 2008

When the Domain Name System (DNS) was designed, nobody expected that characters beyond the default set [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs), among them .info, .jp, .kr, .lt, .pl and .se, started to expand their character sets. By now you can register IDN domains under all the large TLDs (cnobi / dcnobi) and many country-code TLDs.

But what is an IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The conversion is implemented on the client side; the server only ever sees standard ASCII strings for the domain name. To encode the domains, a simple algorithm called Punycode is used. The string "löl", for example, is encoded as "xn--ll-fka". Characteristically, every encoded label begins with "xn--".
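Because the "xn--" prefix is applied to each dot-separated label rather than to the whole name, spotting an encoded IDN comes down to a prefix check per label. The following is only a minimal sketch: the helper names `is_ace_label` and `ace_labels` are made up for illustration and belong to no library, and `example.org` is just a placeholder domain.

```php
<?php
// Hypothetical helpers for illustration; not part of libidn or any extension.

// True if a single label starts with the "xn--" prefix (case-insensitive).
function is_ace_label(string $label): bool {
    return strncasecmp($label, 'xn--', 4) === 0;
}

// Returns the labels of a domain name that carry the "xn--" prefix.
function ace_labels(string $domain): array {
    $labels = explode('.', $domain);
    return array_values(array_filter($labels, 'is_ace_label'));
}

// Placeholder domain: only the first label is an encoded one.
print_r(ace_labels('xn--ll-fka.example.org'));
```

The check only looks at the prefix; it does not verify that the rest of the label is valid Punycode.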

Some readers might be tempted to simply write a script that converts a Unicode string into its Punycode equivalent. Anyone who wants to do that can; Punycode is defined in RFC 3492. But it is much easier with a library, and as with almost everything, something ready-made already exists: the GNU libidn library does this job for us. I wrote a simple wrapper to use libidn from PHP.
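For the curious, here is roughly what the do-it-yourself route could look like. This is a sketch of the encoding procedure from RFC 3492 in plain PHP (7.0 or later, because of `intdiv`), not the code of the extension. All function names (`utf8_code_points`, `punycode_adapt`, `punycode_encode_label`, `to_ace`) are invented for this example. It also leaves out the nameprep step (case folding and normalization) and the overflow checks, so it is not a complete IDNA implementation.

```php
<?php
// Sketch of the RFC 3492 encoder. All function names are hypothetical.
// Assumptions: UTF-8 input, no nameprep, no overflow checks.

// Decode a UTF-8 string into a list of Unicode code points.
function utf8_code_points(string $s): array {
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    $points = [];
    foreach ($chars as $ch) {
        $b = array_values(unpack('C*', $ch));
        switch (count($b)) {
            case 1:  $points[] = $b[0]; break;
            case 2:  $points[] = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F); break;
            case 3:  $points[] = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6)
                               | ($b[2] & 0x3F); break;
            default: $points[] = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                               | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
    }
    return $points;
}

// Bias adaptation from RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700).
function punycode_adapt(int $delta, int $numPoints, bool $first): int {
    $delta = $first ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > 455) { // ((base - tmin) * tmax) / 2
        $delta = intdiv($delta, 35);
        $k += 36;
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Encode one label; the result does NOT include the "xn--" prefix.
function punycode_encode_label(string $label): string {
    $digits = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $input = utf8_code_points($label);
    $n = 128;
    $delta = 0;
    $bias = 72;
    $output = '';
    foreach ($input as $cp) {          // copy the ASCII characters first
        if ($cp < 128) {
            $output .= chr($cp);
        }
    }
    $b = strlen($output);
    $h = $b;                           // number of code points handled so far
    if ($b > 0) {
        $output .= '-';
    }
    while ($h < count($input)) {
        $m = PHP_INT_MAX;              // smallest code point not yet handled
        foreach ($input as $cp) {
            if ($cp >= $n && $cp < $m) {
                $m = $cp;
            }
        }
        $delta += ($m - $n) * ($h + 1);
        $n = $m;
        foreach ($input as $cp) {
            if ($cp < $n) {
                $delta++;
            }
            if ($cp === $n) {          // emit delta as a variable-length integer
                $q = $delta;
                for ($k = 36; ; $k += 36) {
                    $t = $k <= $bias ? 1 : ($k >= $bias + 26 ? 26 : $k - $bias);
                    if ($q < $t) {
                        break;
                    }
                    $output .= $digits[$t + ($q - $t) % (36 - $t)];
                    $q = intdiv($q - $t, 36 - $t);
                }
                $output .= $digits[$q];
                $bias = punycode_adapt($delta, $h + 1, $h === $b);
                $delta = 0;
                $h++;
            }
        }
        $delta++;
        $n++;
    }
    return $output;
}

// Encode only the labels that contain non-ASCII bytes and add the prefix.
function to_ace(string $domain): string {
    $labels = explode('.', $domain);
    foreach ($labels as $i => $label) {
        if (preg_match('/[^\x00-\x7F]/', $label)) {
            $labels[$i] = 'xn--' . punycode_encode_label($label);
        }
    }
    return implode('.', $labels);
}

echo to_ace('löl'), "\n"; // the article's example: xn--ll-fka
```

Even this stripped-down version takes close to a hundred lines, and it still skips the string preparation that a real conversion needs. That is the main argument for leaving the job to libidn.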

Go to PHP IDNA Project Page

The PHP extension adds two new functions:

<?php
echo idna_toASCII('xàrg.örg')."\n";
echo idna_toUnicode('xn--xrg-9ka.xn--rg-eka');
?>

The output looks like this:

xn--xrg-9ka.xn--rg-eka
xàrg.örg
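If a script has to run on hosts where the extension may not be loaded, a small guard avoids a fatal "undefined function" error. This is only a sketch: the wrapper name `to_ascii_if_available` is made up, and the call assumes nothing about `idna_toASCII` beyond the one-argument form shown above.

```php
<?php
// Hypothetical wrapper; only idna_toASCII() itself comes from the extension.
function to_ascii_if_available(string $domain) {
    if (!function_exists('idna_toASCII')) {
        return null; // extension not loaded on this host
    }
    return idna_toASCII($domain); // passed through unchanged
}

$ace = to_ascii_if_available('xàrg.örg');
echo $ace === null ? "idna extension not loaded\n" : $ace . "\n";
```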

The idea and the syntax are mainly based on the Java version.
