making bbPress (and WordPress) work better!

Punycode to Unicode Converter in simple PHP

Today I needed a simple way to convert Punycode internationalized domain names to Unicode for proper display in UTF-8. I was hoping for some easy iconv magic, but no such luck; PHP can’t even do part of it directly.

After Googling for a bit, I was only able to find one existing class that did this in pure PHP, but it was well over 100k in size, which was disturbing for my simple needs.

So I whittled it down to 50 lines or so and made some tweaks:

http://pastebin.com/raw.php?i=M2GzkvFf&punycode_to_unicode.php (download)

A single function can now handle “multi-part” domains that have Punycode in the sub-domain, the domain and/or the TLD.

For example, all the test names listed here work:
http://idn.icann.org/#The_example.test_names

So xn--r8jz45g.xn--zckzah is properly converted to 例え.テスト

It also works with mixed domains where only some labels are Punycode, e.g. xn--54b7fta0cc.idn.icann.org
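To illustrate the per-label idea, here is a minimal sketch of Punycode decoding as described in RFC 3492. It is not the code from the download above: the function names (cp_to_utf8, puny_adapt, puny_decode_label, puny_host_to_unicode) are made up for this example, and it skips the overflow and validity checks a production decoder would need.

```php
<?php
// Illustrative sketch only; all function names here are hypothetical.

// Encode one Unicode code point as a UTF-8 byte string.
function cp_to_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Bias adaptation (RFC 3492, section 6.1): base 36, tmin 1, tmax 26, skew 38, damp 700.
function puny_adapt($delta, $numPoints, $first)
{
    $delta = $first ? (int) ($delta / 700) : $delta >> 1;
    $delta += (int) ($delta / $numPoints);
    $k = 0;
    while ($delta > 455) {               // ((36 - 1) * 26) / 2
        $delta = (int) ($delta / 35);    // base - tmin
        $k += 36;
    }
    return $k + (int) ((36 * $delta) / ($delta + 38));
}

// Decode one label WITHOUT its "xn--" prefix. Returns false on malformed input.
function puny_decode_label($input)
{
    $n = 128;
    $i = 0;
    $bias = 72;
    $out = array();
    $len = strlen($input);

    // Everything before the last "-" is literal ASCII.
    $b = (int) strrpos($input, '-');
    for ($j = 0; $j < $b; $j++) {
        $out[] = ord($input[$j]);
    }
    $in = $b > 0 ? $b + 1 : 0;

    while ($in < $len) {
        $oldi = $i;
        $w = 1;
        for ($k = 36; ; $k += 36) {
            if ($in >= $len) {
                return false;            // input ended in the middle of a number
            }
            $c = ord($input[$in++]);
            if ($c >= 0x30 && $c <= 0x39) {
                $digit = $c - 22;        // '0'..'9' map to 26..35
            } elseif ($c >= 0x41 && $c <= 0x5A) {
                $digit = $c - 0x41;      // 'A'..'Z' map to 0..25
            } elseif ($c >= 0x61 && $c <= 0x7A) {
                $digit = $c - 0x61;      // 'a'..'z' map to 0..25
            } else {
                return false;
            }
            $i += $digit * $w;
            $t = $k <= $bias ? 1 : ($k >= $bias + 26 ? 26 : $k - $bias);
            if ($digit < $t) {
                break;
            }
            $w *= 36 - $t;
        }
        $count = count($out) + 1;
        $bias = puny_adapt($i - $oldi, $count, $oldi === 0);
        $n += (int) ($i / $count);
        $i %= $count;
        array_splice($out, $i, 0, array($n));   // insert code point $n at position $i
        $i++;
    }
    return implode('', array_map('cp_to_utf8', $out));
}

// Decode every "xn--" label of a host; other labels are left untouched.
function puny_host_to_unicode($host)
{
    $labels = explode('.', $host);
    foreach ($labels as $idx => $label) {
        if (strncasecmp($label, 'xn--', 4) === 0) {
            $decoded = puny_decode_label(substr($label, 4));
            if ($decoded !== false) {
                $labels[$idx] = $decoded;
            }
        }
    }
    return implode('.', $labels);
}

echo puny_host_to_unicode('xn--r8jz45g.xn--zckzah'), "\n"; // 例え.テスト
```

Splitting on dots and decoding only the labels that start with xn-- is what lets a mixed host pass through with its plain ASCII labels unchanged.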

(You can only pass it the host part of the URL. Do not pass it the full URL with http or slashes, or it will fail. Use PHP’s parse_url to get just the host.)
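Pulling out just the host could look something like this. It is only a sketch: host_from_url is a hypothetical helper name, not something from the download above.

```php
<?php
// Hypothetical helper: return only the host part of a URL, or null if there is none.
function host_from_url($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null when the URL has no host and false when it is badly malformed.
    return is_string($host) ? $host : null;
}

echo host_from_url('http://xn--r8jz45g.xn--zckzah/some/path?x=1'), "\n"; // xn--r8jz45g.xn--zckzah
```

Note that parse_url treats a string without a scheme, such as a bare host name, as a path, so this helper returns null for it. If your input may already be a bare host, check for that case before calling it.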

Note that this does not do any sanitizing or other thorough checks or fixes. If you need that functionality (e.g. for raw user input from unknown sources), you’ll probably need the original full class over here:

http://phlymail.com/en/downloads/idna/download/

10 responses

  1. UTF-8 error, I thank you for giving information.

    August 31, 2010 at 10:11 pm

  2. Thanks for the info.

    February 19, 2011 at 5:34 pm

  3. This functionality was included in PHP 5.3, see:
    http://www.php.net/manual/en/function.idn-to-utf8.php
    http://www.php.net/manual/en/function.idn-to-ascii.php
    So this script is still very useful for those in hosted environments who can’t upgrade to PHP 5.3

    May 2, 2011 at 1:06 pm

  4. warper

    Only wanted to thank you for this small and effective piece of code 🙂

    It saves me from installing all that 100k of scripts and code.

    Regards
    warper

    May 15, 2011 at 4:54 pm

  5. Hi!
    It would be great if you wrote a UTF-8 to Punycode function 🙂
    Thanks!

    July 28, 2011 at 5:35 pm

  6. thanx for this method of conversion

    January 19, 2013 at 2:51 am

  7. I’ve loved reading each word. Any individual who believes that writing is a lost art, ought to follow
    this blog.

    September 9, 2013 at 5:28 am

  8. To quote Benjamin Franklin: being ignorant isn’t to be ashamed of.

    Being unwilling to learn is. I wish to take this opportunity and thank you for enabling me to learn.

    September 9, 2013 at 5:05 pm

  9. I adore this specific blog. It is like a secret place where I always come across something that interests me.

    September 9, 2013 at 5:07 pm

  10. dev

    big thanks!

    February 22, 2015 at 4:19 am
