Robert Eisele
Systems Engineer, Architect and DBA

Punycode with PHP

When the Domain Name System (DNS) was designed, nobody anticipated that characters beyond the default set [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs) - .info, .jp, .kr, .lt, .pl, .se, ... - started to expand their character sets. By now you can register IDN domains with all large TLDs (cnobi / dcnobi, i.e. .com, .net, .org, .biz and .info, plus .de) and many country-code TLDs.

But what is an IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The conversion has to be implemented on the client side; the server only ever receives standard ASCII strings for the domain name. To encode the domains, a simple algorithm called Punycode is used. The string "löl", for example, becomes "xn--ll-fka". Characteristically, an encoded string always begins with "xn--".
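To make the client-side rule concrete, here is a minimal sketch of how a client might decide whether a name still needs converting or is already in its ASCII form. The function names is_ace_label() and needs_idn_conversion() are hypothetical helpers made up for this illustration; they are not part of libidn or of any PHP extension, and the sketch assumes UTF-8 input.

```php
<?php
// Hypothetical helpers for illustration only; not part of any IDNA library.

// True if the label already carries the "xn--" prefix, i.e. it is in its
// ASCII (Punycode) form. The prefix is compared case-insensitively.
function is_ace_label(string $label): bool
{
    return strncasecmp($label, 'xn--', 4) === 0;
}

// True if the domain contains at least one byte outside the ASCII range,
// which means the client has to convert it before asking the DNS.
function needs_idn_conversion(string $domain): bool
{
    return preg_match('/[^\x00-\x7F]/', $domain) === 1;
}

var_dump(is_ace_label('xn--ll-fka'));      // bool(true)
var_dump(needs_idn_conversion('löl'));     // bool(true)
var_dump(needs_idn_conversion('example')); // bool(false)
```

A real client would additionally normalize the name (nameprep) before encoding it; that step is left out here.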

Of course, you could write a simple script that converts a Unicode string into its Punycode equivalent; Punycode is defined in RFC 3492. But it is much easier with a library, and as with almost anything, something prefabricated already exists: the GNU libidn library does this job for us. I wrote a simple wrapper to use libidn with PHP.
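For the curious, such a "simple script" could look roughly like the following sketch of the encoding procedure from RFC 3492 (section 6.3). It is not how libidn or the wrapper works internally. All puny_* names are hypothetical, and the sketch assumes PHP 7.4+ with the mbstring extension and UTF-8 input. It also skips the nameprep step (case folding, normalization) and the overflow checks that a complete implementation needs.

```php
<?php
// Sketch of the RFC 3492 encoder. Assumptions: PHP 7.4+, mbstring, UTF-8.
// The puny_* names are hypothetical and belong to no existing library.

const PUNY_BASE = 36;
const PUNY_TMIN = 1;
const PUNY_TMAX = 26;
const PUNY_SKEW = 38;
const PUNY_DAMP = 700;
const PUNY_INITIAL_BIAS = 72;
const PUNY_INITIAL_N = 128;

// Map a digit value 0..35 to its character: a-z for 0..25, 0-9 for 26..35.
function puny_digit(int $d): string
{
    return chr($d < 26 ? $d + 97 : $d + 22);
}

// Bias adaptation function (RFC 3492, section 6.1).
function puny_adapt(int $delta, int $numPoints, bool $first): int
{
    $delta = $first ? intdiv($delta, PUNY_DAMP) : intdiv($delta, 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > intdiv((PUNY_BASE - PUNY_TMIN) * PUNY_TMAX, 2)) {
        $delta = intdiv($delta, PUNY_BASE - PUNY_TMIN);
        $k += PUNY_BASE;
    }
    return $k + intdiv((PUNY_BASE - PUNY_TMIN + 1) * $delta, $delta + PUNY_SKEW);
}

// Encode a single label (no dots) to Punycode, without the "xn--" prefix.
function puny_encode(string $label): string
{
    $cps = array_map(fn($c) => mb_ord($c, 'UTF-8'), mb_str_split($label, 1, 'UTF-8'));

    // Copy the basic (ASCII) code points first, in their original order.
    $out = '';
    foreach ($cps as $cp) {
        if ($cp < 0x80) {
            $out .= chr($cp);
        }
    }
    $basic = strlen($out);
    $h = $basic; // number of code points handled so far
    if ($basic > 0) {
        $out .= '-';
    }

    $n = PUNY_INITIAL_N;
    $delta = 0;
    $bias = PUNY_INITIAL_BIAS;
    $total = count($cps);

    while ($h < $total) {
        // Find the smallest code point that has not been handled yet.
        $m = PHP_INT_MAX;
        foreach ($cps as $cp) {
            if ($cp >= $n && $cp < $m) {
                $m = $cp;
            }
        }
        $delta += ($m - $n) * ($h + 1);
        $n = $m;

        foreach ($cps as $cp) {
            if ($cp < $n) {
                $delta++;
            }
            if ($cp === $n) {
                // Write delta as a generalized variable-length integer.
                $q = $delta;
                for ($k = PUNY_BASE; ; $k += PUNY_BASE) {
                    $t = $k <= $bias ? PUNY_TMIN
                        : ($k >= $bias + PUNY_TMAX ? PUNY_TMAX : $k - $bias);
                    if ($q < $t) {
                        break;
                    }
                    $out .= puny_digit($t + ($q - $t) % (PUNY_BASE - $t));
                    $q = intdiv($q - $t, PUNY_BASE - $t);
                }
                $out .= puny_digit($q);
                $bias = puny_adapt($delta, $h + 1, $h === $basic);
                $delta = 0;
                $h++;
            }
        }
        $delta++;
        $n++;
    }
    return $out;
}

// Encode every non-ASCII label of a domain and add the "xn--" prefix.
function puny_domain_to_ascii(string $domain): string
{
    $labels = explode('.', $domain);
    foreach ($labels as $i => $label) {
        if (preg_match('/[^\x00-\x7F]/', $label)) {
            $labels[$i] = 'xn--' . puny_encode($label);
        }
    }
    return implode('.', $labels);
}

echo puny_domain_to_ascii('löl'), "\n"; // xn--ll-fka
```

Labels that are already pure ASCII pass through unchanged, so puny_domain_to_ascii('bücher.example') yields "xn--bcher-kva.example".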

Go to PHP IDNA Project Page

The PHP extension adds two new functions:

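<?php

// Encode a Unicode domain name into its ASCII (Punycode) form
echo idna_toASCII('xärg.örg'), "\n";

// Decode the ASCII form back into Unicode
echo idna_toUnicode('xn--xrg-qla.xn--rg-eka');

?>

The output looks like this:

xn--xrg-qla.xn--rg-eka
xärg.örg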

The idea and the syntax are mainly based on the Java version.
