When the Domain Name System (DNS) was designed, nobody anticipated that characters beyond the default set [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs), among them .info, .jp, .kr, .lt, .pl and .se, started to expand their character sets. By now you can register IDN domains under all the large TLDs (cnobi / dcnobi) and many country-code TLDs.
But what is IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The conversion has to happen on the client side; the server only ever sees standard ASCII strings for the domain name. To encode the domains, a simple algorithm called Punycode is used. The string "löl" becomes "xn--ll-fka". Characteristically, the encoded string always begins with "xn--".
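To make the client-side conversion concrete, here is a minimal sketch. It uses Python rather than PHP, only because Python's standard library ships an "idna" codec (implementing the 2003 version of IDNA), so nothing needs to be installed. The helper names `to_ascii` and `to_unicode` are made up for this illustration and are not the functions of the PHP extension described in this article.

```python
# Illustration only: Python's built-in "idna" codec, not the PHP extension.
# The helper names below are hypothetical.

def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to the ASCII form that is sent to DNS."""
    # The codec works label by label (split on dots) and leaves
    # pure-ASCII labels untouched.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII ("xn--...") domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("löl"))            # xn--ll-fka
    print(to_ascii("löl.example"))    # xn--ll-fka.example
    print(to_unicode("xn--ll-fka"))   # löl
```

Only the label that contains non-ASCII characters gets the "xn--" prefix; a plain ASCII label such as "example" passes through unchanged.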
Of course you can build a simple script which converts a Unicode string into its Punycode equivalent. Punycode is defined in RFC 3492. But it is much easier with a library, and as for almost anything, something prefabricated already exists: the GNU libidn library does this task for us. I wrote a simple wrapper to use libidn with PHP.
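For the curious, the "simple script" route could look roughly like the following sketch of the encoding procedure from RFC 3492. It is written in Python for brevity and is not the code of libidn or of the PHP wrapper. It encodes a single label only and skips the Nameprep normalization and the "xn--" prefix that a full IDNA conversion adds. The function names are my own choice for this example.

```python
# Sketch of the Punycode encoder from RFC 3492 (encoding only, one label).
# Function names are hypothetical; constants are the RFC's parameter values.
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation as given in the RFC."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def punycode_encode(label: str) -> str:
    """Encode one label; the caller still has to prepend 'xn--'."""
    basic = [c for c in label if ord(c) < INITIAL_N]
    output = "".join(basic)
    handled = len(basic)          # code points already encoded
    if basic:
        output += "-"             # delimiter after the copied ASCII part
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while handled < len(label):
        # Next-smallest code point that has not been encoded yet.
        m = min(ord(c) for c in label if ord(c) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for c in label:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = min(max(k - bias, TMIN), TMAX)
                    if q < t:
                        break
                    output += DIGITS[t + (q - t) % (BASE - t)]
                    q = (q - t) // (BASE - t)
                    k += BASE
                output += DIGITS[q]
                bias = adapt(delta, handled + 1, handled == len(basic))
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return output


if __name__ == "__main__":
    print("xn--" + punycode_encode("löl"))   # xn--ll-fka
```

Even this short version shows why a library is the more comfortable choice: decoding, normalization and length checks are all still missing.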
The PHP extension adds two new functions:
<?php
echo idna_toASCII('xärg.örg'), "\n";
echo idna_toUnicode('xn--xrg-9ka.xn--rg-eka');
?>
The output looks like this:
The idea and the syntax are mainly based on the Java version.