Punycode with PHP

July 20th, 2008

When the Domain Name System (DNS) was designed, nobody anticipated that characters beyond the default set [a-z0-9-] would eventually be needed. In 2003, the first top-level domains (TLDs), among them .info, .jp, .kr, .lt, .pl and .se, started to expand their character sets. By now you can register IDN domains under all large TLDs (cnobi / dcnobi) and many country-code TLDs.

But what is IDN? An internationalized domain name (IDN) is a domain name that contains non-ASCII characters. The implementation is on the client side: the server only ever receives standard ASCII strings for the domain name. To encode such domains, a simple algorithm called Punycode is used. The string "löl" is encoded as "xn--ll-fka". Characteristically, the encoded string always begins with "xn--".
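Because every encoded label starts with "xn--", an IDN can be recognized with a simple prefix check on each dot-separated label. The following is only a minimal sketch; the function name has_ace_label is made up for this illustration and is not part of any library or of the extension described here.

```php
<?php
// Hypothetical helper: reports whether any label of a domain name
// carries the "xn--" prefix that marks a Punycode-encoded label.
function has_ace_label(string $domain): bool
{
    foreach (explode('.', $domain) as $label) {
        // The prefix is compared case-insensitively, since domain
        // names are case-insensitive.
        if (strncasecmp($label, 'xn--', 4) === 0) {
            return true;
        }
    }
    return false;
}

var_dump(has_ace_label('xn--ll-fka.example')); // bool(true)
var_dump(has_ace_label('www.example.com'));    // bool(false)
```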

You might be tempted to simply write a script that converts a Unicode string into its Punycode equivalent. Anyone who wants to do that can; Punycode is defined in RFC 3492. But it is much easier with a library, and as with most things, something ready-made already exists: the GNU libidn library does this job for us. I wrote a simple wrapper to use libidn with PHP.
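For the curious, here is roughly what such a hand-written script could look like. It is a sketch of the encoding procedure from RFC 3492, not the code of libidn or of the wrapper. The function names (punycode_adapt, punycode_digit, punycode_encode_label, to_ascii_sketch) are made up for this illustration. It assumes UTF-8 input, PHP 7.4 or later and the mbstring extension, and it skips everything else IDNA requires (normalization, case folding, length checks, overflow handling).

```php
<?php
// Bias adaptation from RFC 3492 (base=36, tmin=1, tmax=26, skew=38, damp=700).
function punycode_adapt(int $delta, int $numPoints, bool $firstTime): int
{
    $delta = $firstTime ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > 455) {            // ((base - tmin) * tmax) / 2
        $delta = intdiv($delta, 35);  // base - tmin
        $k += 36;
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Maps a digit value 0..35 to 'a'..'z', '0'..'9'.
function punycode_digit(int $d): string
{
    return $d < 26 ? chr(ord('a') + $d) : chr(ord('0') + $d - 26);
}

// Encodes one label (without the "xn--" prefix).
function punycode_encode_label(string $label): string
{
    $codePoints = array_map(
        fn(string $c): int => mb_ord($c, 'UTF-8'),
        mb_str_split($label, 1, 'UTF-8')
    );

    // Copy the basic (ASCII) code points first, followed by a delimiter.
    $output = '';
    foreach ($codePoints as $cp) {
        if ($cp < 128) {
            $output .= chr($cp);
        }
    }
    $basicCount = strlen($output);
    $handled = $basicCount;
    if ($basicCount > 0) {
        $output .= '-';
    }

    $n = 128;
    $delta = 0;
    $bias = 72;
    $total = count($codePoints);
    while ($handled < $total) {
        // Find the smallest code point not yet handled.
        $m = PHP_INT_MAX;
        foreach ($codePoints as $cp) {
            if ($cp >= $n && $cp < $m) {
                $m = $cp;
            }
        }
        $delta += ($m - $n) * ($handled + 1);
        $n = $m;
        foreach ($codePoints as $cp) {
            if ($cp < $n) {
                $delta++;
            }
            if ($cp === $n) {
                // Write delta as a variable-length base-36 integer.
                $q = $delta;
                for ($k = 36; ; $k += 36) {
                    $t = $k <= $bias ? 1 : ($k >= $bias + 26 ? 26 : $k - $bias);
                    if ($q < $t) {
                        break;
                    }
                    $output .= punycode_digit($t + ($q - $t) % (36 - $t));
                    $q = intdiv($q - $t, 36 - $t);
                }
                $output .= punycode_digit($q);
                $bias = punycode_adapt($delta, $handled + 1, $handled === $basicCount);
                $delta = 0;
                $handled++;
            }
        }
        $delta++;
        $n++;
    }
    return $output;
}

// Encodes each non-ASCII label of a domain and adds the "xn--" prefix.
function to_ascii_sketch(string $domain): string
{
    $labels = explode('.', $domain);
    foreach ($labels as $i => $label) {
        if (preg_match('/[^\x00-\x7F]/', $label)) {
            $labels[$i] = 'xn--' . punycode_encode_label($label);
        }
    }
    return implode('.', $labels);
}

echo to_ascii_sketch('löl'), "\n"; // xn--ll-fka
```

Even this stripped-down version takes some care to get right, which is a good argument for leaving the job to a tested library such as libidn.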

Go to PHP IDNA Project Page

The PHP extension adds two new functions:

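<?php

// Convert a Unicode domain name to its ASCII (Punycode) form
echo idna_toASCII('xàrg.örg')."\n";

// Convert the ASCII form back to Unicode
echo idna_toUnicode('xn--xrg-9ka.xn--rg-eka');

?>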

The output looks like this:

xn--xrg-9ka.xn--rg-eka
xàrg.örg

The idea and the syntax are mainly based on the Java version.

 
