Punycode with PHP

Robert Eisele

When the Domain Name System (DNS) was designed, nobody expected that characters beyond the default set [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs), among them .info, .jp, .kr, .lt, .pl and .se, started to expand their character sets. By now you can register IDN domains under all the large TLDs (cnobi / dcnobi) and many country-code TLDs.

But what is an IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The conversion has to be done on the client side; the server only ever receives standard ASCII strings for the domain name. To encode the names, a simple algorithm called Punycode is used. The string "löl", for example, becomes "xn--ll-fka". Characteristically, an encoded label always begins with the prefix "xn--".
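To make the encoding step more concrete, here is a minimal pure-PHP sketch of the Punycode encoder described in RFC 3492. The function names (utf8_to_code_points, punycode_adapt, punycode_encode_label, punycode_encode_domain) are made up for this illustration. The sketch assumes PHP 7 or later and valid UTF-8 input, and it skips the Nameprep normalization and overflow checks that a real IDNA implementation performs, so treat it as a study aid rather than a replacement for a library.

```php
<?php

// Hypothetical helpers for illustration only; not part of any library.

// Split a UTF-8 string into an array of Unicode code points.
function utf8_to_code_points(string $s): array
{
    $points = [];
    // The /u modifier makes PCRE split on characters instead of bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $len = strlen($char);
        // Drop the length marker bits from the lead byte.
        $cp = $len === 1 ? ord($char) : ord($char[0]) & (0xFF >> ($len + 1));
        for ($i = 1; $i < $len; $i++) {
            $cp = ($cp << 6) | (ord($char[$i]) & 0x3F);
        }
        $points[] = $cp;
    }
    return $points;
}

// Bias adaptation, RFC 3492 section 6.1
// (base 36, tmin 1, tmax 26, skew 38, damp 700).
function punycode_adapt(int $delta, int $numPoints, bool $firstTime): int
{
    $delta = intdiv($delta, $firstTime ? 700 : 2);
    $delta += intdiv($delta, $numPoints);
    for ($k = 0; $delta > 455; $k += 36) { // 455 = ((36 - 1) * 26) / 2
        $delta = intdiv($delta, 35);
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Encode a single label (no dots). ASCII-only labels are returned unchanged.
function punycode_encode_label(string $label): string
{
    $digits = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $points = utf8_to_code_points($label);

    // Basic (ASCII) code points are copied to the output as they are.
    $output = '';
    foreach ($points as $cp) {
        if ($cp < 128) {
            $output .= chr($cp);
        }
    }
    $basicCount = strlen($output);
    if ($basicCount === count($points)) {
        return $label;
    }
    if ($basicCount > 0) {
        $output .= '-'; // delimiter between the basic part and the deltas
    }

    $n = 128;
    $delta = 0;
    $bias = 72;
    $handled = $basicCount;
    while ($handled < count($points)) {
        // Find the smallest code point that has not been handled yet.
        $m = PHP_INT_MAX;
        foreach ($points as $cp) {
            if ($cp >= $n && $cp < $m) {
                $m = $cp;
            }
        }
        $delta += ($m - $n) * ($handled + 1);
        $n = $m;
        foreach ($points as $cp) {
            if ($cp < $n) {
                $delta++;
            } elseif ($cp === $n) {
                // Write delta as a variable-length base-36 integer.
                $q = $delta;
                for ($k = 36; ; $k += 36) {
                    $t = max(1, min(26, $k - $bias));
                    if ($q < $t) {
                        break;
                    }
                    $output .= $digits[$t + ($q - $t) % (36 - $t)];
                    $q = intdiv($q - $t, 36 - $t);
                }
                $output .= $digits[$q];
                $bias = punycode_adapt($delta, $handled + 1, $handled === $basicCount);
                $delta = 0;
                $handled++;
            }
        }
        $delta++;
        $n++;
    }
    return 'xn--' . $output;
}

// Encode every label of a dotted domain name separately.
function punycode_encode_domain(string $domain): string
{
    return implode('.', array_map('punycode_encode_label', explode('.', $domain)));
}

echo punycode_encode_domain('löl'), "\n"; // prints xn--ll-fka
```

Each label is encoded on its own, which is why a name such as "bücher.example" would only change in its first part.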

Of course you can write a simple script that converts a Unicode string into its Punycode equivalent; the algorithm is defined in RFC 3492. But it is much easier with a library, and as with almost anything, something prefabricated already exists: the GNU libidn library does this job for us. I wrote a simple wrapper to use libidn from PHP.

Go to PHP IDNA Project Page

The PHP extension adds two new functions:

<?php
echo idna_toASCII('xärg.örg'), "\n";
echo idna_toUnicode('xn--xrg-qla.xn--rg-eka');
?>

The output looks like this:

xn--xrg-qla.xn--rg-eka
xärg.örg
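For readers who want to see what the decoding direction involves, the following is a rough pure-PHP sketch of the Punycode decoder from RFC 3492. All function names here (code_point_to_utf8, punycode_adapt_bias, punycode_decode_label, punycode_decode_domain) are hypothetical and chosen for this example; they are not the functions of the extension. The sketch assumes PHP 7 or later and leaves out the overflow checks and the IDNA validation steps that libidn performs.

```php
<?php

// Hypothetical helpers for illustration only; not part of the extension.

// Turn one Unicode code point into its UTF-8 byte sequence.
function code_point_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Bias adaptation, RFC 3492 section 6.1
// (base 36, tmin 1, tmax 26, skew 38, damp 700).
function punycode_adapt_bias(int $delta, int $numPoints, bool $firstTime): int
{
    $delta = intdiv($delta, $firstTime ? 700 : 2);
    $delta += intdiv($delta, $numPoints);
    for ($k = 0; $delta > 455; $k += 36) { // 455 = ((36 - 1) * 26) / 2
        $delta = intdiv($delta, 35);
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Decode a single label. Labels without the "xn--" prefix pass through.
function punycode_decode_label(string $label): string
{
    if (strncasecmp($label, 'xn--', 4) !== 0) {
        return $label;
    }
    $digits = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $encoded = substr($label, 4);
    $len = strlen($encoded);

    // Everything before the last '-' is the literal ASCII part.
    $pos = (int) strrpos($encoded, '-');
    $points = $pos > 0 ? array_map('ord', str_split(substr($encoded, 0, $pos))) : [];
    $in = $pos > 0 ? $pos + 1 : 0;

    $n = 128;
    $i = 0;
    $bias = 72;
    while ($in < $len) {
        // Read one variable-length base-36 integer into $i.
        $oldI = $i;
        $w = 1;
        for ($k = 36; ; $k += 36) {
            if ($in >= $len) {
                throw new InvalidArgumentException('Truncated Punycode input');
            }
            $digit = strpos($digits, strtolower($encoded[$in++]));
            if ($digit === false) {
                throw new InvalidArgumentException('Invalid Punycode digit');
            }
            $i += $digit * $w;
            $t = max(1, min(26, $k - $bias));
            if ($digit < $t) {
                break;
            }
            $w *= 36 - $t;
        }
        $count = count($points) + 1;
        $bias = punycode_adapt_bias($i - $oldI, $count, $oldI === 0);
        // $i encodes both the code point increase and the insert position.
        $n += intdiv($i, $count);
        $i %= $count;
        array_splice($points, $i, 0, [$n]);
        $i++;
    }
    return implode('', array_map('code_point_to_utf8', $points));
}

// Decode every label of a dotted domain name separately.
function punycode_decode_domain(string $domain): string
{
    return implode('.', array_map('punycode_decode_label', explode('.', $domain)));
}

echo punycode_decode_domain('xn--ll-fka.example'), "\n"; // prints löl.example
```

The decoder reads one number per non-ASCII character; that single number carries both which character to insert and where to insert it, which is what keeps the encoded form so short.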