Punycode with PHP
When the Domain Name System (DNS) was designed, nobody anticipated that characters beyond the default [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs), among them .info, .jp, .kr, .lt, .pl and .se, started to expand their character sets. By now you can register IDN domains under all large TLDs (cnobi / dcnobi) and many country-code TLDs.
But what is an IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The implementation is on the client side: the server only ever sees standard ASCII strings for the domain name. To encode the domains, a simple algorithm called Punycode is used. The label "löl", for example, is encoded as "xn--ll-fka": the Punycode output "ll-fka" preceded by the prefix "xn--". That prefix is the telltale sign, since every encoded label begins with it.
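To make the encoding less abstract, here is a minimal sketch of the RFC 3492 encoding procedure in plain PHP. The function names (utf8_to_codepoints, punycode_adapt, punycode_encode, domain_to_ascii) are made up for this illustration and are not part of libidn or of any PHP extension. The sketch assumes valid UTF-8 input and PHP 7 or later, and it deliberately skips the Nameprep step (case folding and normalization) that a complete IDNA implementation performs before encoding.

```php
<?php
// Illustrative sketch only: hypothetical helper names, no Nameprep, no length checks.

// Split a valid UTF-8 string into an array of Unicode code points.
function utf8_to_codepoints(string $s): array
{
    $points = [];
    // The /u flag makes PCRE split on whole UTF-8 characters.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $b = array_values(unpack('C*', $ch));
        switch (count($b)) {
            case 1:  $points[] = $b[0]; break;
            case 2:  $points[] = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F); break;
            case 3:  $points[] = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F); break;
            default: $points[] = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                               | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
    }
    return $points;
}

// Bias adaptation from RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700).
function punycode_adapt(int $delta, int $numPoints, bool $first): int
{
    $delta = $first ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > 455) {            // ((base - tmin) * tmax) / 2
        $delta = intdiv($delta, 35);  // base - tmin
        $k += 36;
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Encode one label to Punycode (without the "xn--" prefix).
function punycode_encode(string $label): string
{
    $digits = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $input  = utf8_to_codepoints($label);
    $n = 128; $delta = 0; $bias = 72;

    // Copy the basic (ASCII) code points first, then the delimiter.
    $output = '';
    foreach ($input as $cp) {
        if ($cp < 0x80) { $output .= chr($cp); }
    }
    $b = $h = strlen($output);
    if ($b > 0) { $output .= '-'; }

    $total = count($input);
    while ($h < $total) {
        // Find the smallest code point that has not been handled yet.
        $m = PHP_INT_MAX;
        foreach ($input as $cp) {
            if ($cp >= $n && $cp < $m) { $m = $cp; }
        }
        $delta += ($m - $n) * ($h + 1);
        $n = $m;
        foreach ($input as $cp) {
            if ($cp < $n) { $delta++; }
            if ($cp === $n) {
                // Write delta as a variable-length base-36 integer.
                $q = $delta;
                for ($k = 36; ; $k += 36) {
                    $t = $k <= $bias ? 1 : ($k >= $bias + 26 ? 26 : $k - $bias);
                    if ($q < $t) { break; }
                    $output .= $digits[$t + ($q - $t) % (36 - $t)];
                    $q = intdiv($q - $t, 36 - $t);
                }
                $output .= $digits[$q];
                $bias  = punycode_adapt($delta, $h + 1, $h === $b);
                $delta = 0;
                $h++;
            }
        }
        $delta++;
        $n++;
    }
    return $output;
}

// Encode each label of a domain; pure-ASCII labels are left untouched.
function domain_to_ascii(string $domain): string
{
    $labels = explode('.', $domain);
    foreach ($labels as $i => $label) {
        if (preg_match('/[^\x00-\x7F]/', $label)) {
            $labels[$i] = 'xn--' . punycode_encode($label);
        }
    }
    return implode('.', $labels);
}
```

With these helpers, punycode_encode('löl') returns "ll-fka" and domain_to_ascii('löl') returns "xn--ll-fka". Treat it as a way to see what the algorithm does rather than as something to put into production.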
You might be tempted to simply write a script that converts a Unicode string into its Punycode equivalent. Anyone who wants to can do so: Punycode is defined in RFC 3492. But it is much easier with a library, and as with most things, something ready-made already exists. The GNU libidn library does this job for us. I wrote a simple wrapper to use libidn with PHP.
The PHP extension adds two new functions:
<?php
echo idna_toASCII('xàrg.örg')."\n";
echo idna_toUnicode('xn--xrg-9ka.xn--rg-eka');
?>
The output looks like this:
xn--xrg-9ka.xn--rg-eka
xàrg.örg
The idea and the syntax are mainly based on the Java version.
November 17th, 2008
November 1st, 2008