making bbPress (and WordPress) work better!

Posts tagged “unicode”

How to fix: Can’t see Unicode (UTF8) in Notepad++ on Windows XP

This is a little late to help most people, since they have moved on from Windows XP to newer versions. However, there are still some die-hards who will keep running it into 2019 with the simple POSReady registry tweak.

If you have a full Unicode font like Symbola installed on Windows XP, you may still not see the proper characters in applications like Notepad++, and instead get double empty boxes in their place.


Punycode to Unicode Converter in simple PHP

Today I needed a simple way to convert Punycode internationalized domain names to Unicode for proper display in UTF-8. I was hoping for some easy iconv magic but had no such luck; PHP can’t even do part of it directly.

After Googling for a bit, I was only able to find one existing class that did this in pure PHP, but it was well over 100 KB in size, which was disturbing for my simple needs.

So I whittled it down to 50 lines or so and made some tweaks:

http://pastebin.com/raw.php?i=M2GzkvFf&punycode_to_unicode.php (download)

A single function can now handle “multi-part” domains that have Punycode in the sub-domain, domain and/or TLD.

That is, all the examples here work:
http://idn.icann.org/#The_example.test_names

So xn--r8jz45g.xn--zckzah is properly converted to 例え.テスト

It also works with mixed domains, e.g. xn--54b7fta0cc.idn.icann.org
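
For readers who want to see the idea without following the link, here is a minimal sketch of label-by-label decoding. It is not the code from the pastebin above; it is my own reading of the decoding procedure in RFC 3492, and the function names (codepoint_to_utf8, punycode_decode_label, punycode_host_to_unicode) are made up for illustration. It assumes PHP 7 or newer, and it skips the overflow and range checks the RFC asks for.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Decode the part of a label that follows "xn--" (RFC 3492 parameters).
function punycode_decode_label(string $input): string
{
    $base = 36; $tMin = 1; $tMax = 26; $skew = 38; $damp = 700;
    $n = 128; $bias = 72; $i = 0;
    $output = [];

    // Everything before the last "-" is plain ASCII, copied as-is.
    $pos = strrpos($input, '-');
    if ($pos !== false && $pos > 0) {
        foreach (str_split(substr($input, 0, $pos)) as $ch) {
            $output[] = ord($ch);
        }
        $input = substr($input, $pos + 1);
    }

    $len = strlen($input);
    $idx = 0;
    while ($idx < $len) {
        $oldI = $i;
        $w = 1;
        // Read one variable-length integer (a "delta").
        for ($k = $base; ; $k += $base) {
            if ($idx >= $len) {
                throw new InvalidArgumentException('Truncated Punycode input');
            }
            $c = ord($input[$idx++]);
            if ($c >= 0x30 && $c <= 0x39) {
                $digit = $c - 22;            // '0'..'9' map to 26..35
            } elseif ($c >= 0x41 && $c <= 0x5A) {
                $digit = $c - 0x41;          // 'A'..'Z' map to 0..25
            } elseif ($c >= 0x61 && $c <= 0x7A) {
                $digit = $c - 0x61;          // 'a'..'z' map to 0..25
            } else {
                throw new InvalidArgumentException('Invalid Punycode digit');
            }
            $i += $digit * $w;
            $t = $k <= $bias ? $tMin : ($k >= $bias + $tMax ? $tMax : $k - $bias);
            if ($digit < $t) {
                break;
            }
            $w *= $base - $t;
        }

        // Bias adaptation.
        $count = count($output) + 1;
        $delta = $i - $oldI;
        $delta = $oldI === 0 ? intdiv($delta, $damp) : $delta >> 1;
        $delta += intdiv($delta, $count);
        $steps = 0;
        while ($delta > intdiv(($base - $tMin) * $tMax, 2)) {
            $delta = intdiv($delta, $base - $tMin);
            $steps += $base;
        }
        $bias = $steps + intdiv(($base - $tMin + 1) * $delta, $delta + $skew);

        // The delta encodes both the code point and where to insert it.
        $n += intdiv($i, $count);
        $i %= $count;
        array_splice($output, $i, 0, [$n]);
        $i++;
    }

    return implode('', array_map('codepoint_to_utf8', $output));
}

// Decode every "xn--" label of a host name and leave other labels alone.
function punycode_host_to_unicode(string $host): string
{
    $labels = explode('.', $host);
    foreach ($labels as $key => $label) {
        if (strncasecmp($label, 'xn--', 4) === 0) {
            $labels[$key] = punycode_decode_label(substr($label, 4));
        }
    }
    return implode('.', $labels);
}

echo punycode_host_to_unicode('xn--r8jz45g.xn--zckzah'), "\n"; // 例え.テスト
```

Splitting on the dots first is what makes mixed domains work: plain labels such as idn, icann and org never reach the decoder, so only the labels that carry the xn-- prefix are touched.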

(Pass it only the host part of the URL. Do not pass the full URL with http or slashes, or it will fail; use PHP’s parse_url to get just the host.)
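
As a rough illustration of that last point, this is one way to pull out just the host with parse_url before converting. The wrapper name extract_host is my own invention for this sketch, not something from the converter script.

```php
<?php
// Hypothetical wrapper: return only the host of a URL, or null if
// parse_url() cannot find one.
function extract_host(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? $host : null;
}

$host = extract_host('http://xn--r8jz45g.xn--zckzah/some/path?x=1');
// $host now holds "xn--r8jz45g.xn--zckzah", which is safe to hand
// to the converter.
```

Note that parse_url needs the scheme to recognise a host: a bare string like xn--r8jz45g.xn--zckzah is treated as a path, so this wrapper returns null for it. If you already have a bare host, pass it to the converter as it is.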

Note that this does not do any sanitizing or other thorough checks or fixes. If you need that functionality (e.g. for raw user input from unknown sources), you’ll probably need the original full class over here:

http://phlymail.com/en/downloads/idna/download/