How to fix: Can’t see Unicode (UTF8) in Notepad++ on Windows XP
This is a little late to help most people, since they have moved on from Windows XP to newer versions. However, there are still some die-hards sticking with it until 2019 using the simple POSReady registry tweak.
If you have a full Unicode font such as Symbola installed on Windows XP, you may still not see the proper characters in applications like Notepad++, and instead get pairs of empty boxes in their place.
Punycode to Unicode Converter in simple PHP
Today I needed a simple way to convert Punycode internationalized domain names to Unicode for proper display in UTF-8. I was hoping for some easy iconv magic, but no such luck; PHP can’t even do part of it directly.
After Googling for a bit, I was only able to find one existing class that did this in pure PHP, but it was well over 100 KB in size, which was disturbing for my simple needs.
So I whittled it down to 50 lines or so and made some tweaks:
http://pastebin.com/raw.php?i=M2GzkvFf&punycode_to_unicode.php (download)
It can now handle, in a single function, “multi-part” domains that have Punycode in the subdomain, domain and/or TLD.
For example, all of the test names listed here work:
http://idn.icann.org/#The_example.test_names
So xn--r8jz45g.xn--zckzah
is properly converted to 例え.テスト
It also works with mixed domains, e.g. xn--54b7fta0cc.idn.icann.org
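For readers curious how this kind of per-label conversion works, here is a minimal sketch of a Punycode decoder following the algorithm in RFC 3492. It is not the code from the pastebin link above: the function names (`codepoint_to_utf8`, `punycode_adapt`, `punycode_decode_label`, `idn_host_to_unicode`) are made up for illustration, it assumes PHP 7 or later (for `intdiv` and type declarations), and like the original it does no sanitizing or overflow checking.

```php
<?php
// Illustrative sketch only; all function names here are hypothetical.

// Encode a single Unicode code point as a UTF-8 byte string.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Bias adaptation from RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700).
function punycode_adapt(int $delta, int $numPoints, bool $firstTime): int
{
    $delta = $firstTime ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $numPoints);
    $k = 0;
    while ($delta > 455) { // ((base - tmin) * tmax) / 2
        $delta = intdiv($delta, 35); // base - tmin
        $k += 36;
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Decode one label with the "xn--" prefix already removed.
function punycode_decode_label(string $input): string
{
    $n = 128;
    $i = 0;
    $bias = 72;
    $output = []; // list of code points

    // Everything before the last '-' is plain ASCII and is copied as-is.
    $b = (int) strrpos($input, '-');
    for ($j = 0; $j < $b; $j++) {
        $output[] = ord($input[$j]);
    }

    $len = strlen($input);
    $in = $b > 0 ? $b + 1 : 0;
    while ($in < $len) {
        $oldi = $i;
        $w = 1;
        for ($k = 36; ; $k += 36) {
            if ($in >= $len) {
                throw new InvalidArgumentException('Truncated Punycode input');
            }
            $c = ord($input[$in++]);
            if ($c >= 0x30 && $c <= 0x39) {
                $digit = $c - 22;          // '0'..'9' map to 26..35
            } elseif ($c >= 0x41 && $c <= 0x5A) {
                $digit = $c - 0x41;        // 'A'..'Z' map to 0..25
            } elseif ($c >= 0x61 && $c <= 0x7A) {
                $digit = $c - 0x61;        // 'a'..'z' map to 0..25
            } else {
                throw new InvalidArgumentException('Invalid Punycode digit');
            }
            $i += $digit * $w;
            $t = $k <= $bias ? 1 : ($k >= $bias + 26 ? 26 : $k - $bias);
            if ($digit < $t) {
                break;
            }
            $w *= 36 - $t;
        }
        $count = count($output) + 1;
        $bias = punycode_adapt($i - $oldi, $count, $oldi === 0);
        $n += intdiv($i, $count);
        $i %= $count;
        array_splice($output, $i, 0, [$n]); // insert code point $n at position $i
        $i++;
    }

    return implode('', array_map('codepoint_to_utf8', $output));
}

// Convert each "xn--" label of a host name; other labels pass through untouched.
function idn_host_to_unicode(string $host): string
{
    $labels = explode('.', $host);
    foreach ($labels as $idx => $label) {
        if (strncasecmp($label, 'xn--', 4) === 0) {
            $labels[$idx] = punycode_decode_label(substr($label, 4));
        }
    }
    return implode('.', $labels);
}

echo idn_host_to_unicode('xn--r8jz45g.xn--zckzah'), "\n"; // prints 例え.テスト
```

Splitting on dots and decoding only the labels that start with `xn--` is what lets a mixed host such as xn--54b7fta0cc.idn.icann.org keep its plain ASCII parts unchanged.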
(Only pass the function the host part of the URL. Do not pass it the full URL with http or slashes, or it will fail; use PHP’s parse_url to get just the host.)
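As a rough sketch of that step, the host can be pulled out with `parse_url` and the `PHP_URL_HOST` component flag before handing it to a converter. The wrapper name `extract_host` is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: return just the host of a full URL, or null if none is found.
function extract_host(string $url): ?string
{
    // parse_url() returns null when the component is missing and
    // false when the URL is seriously malformed.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

var_dump(extract_host('http://xn--r8jz45g.xn--zckzah/some/path?x=1'));
// string(22) "xn--r8jz45g.xn--zckzah"
```

Note that `parse_url` treats a string with no scheme, such as a bare `example.com`, as a path rather than a host, so this helper returns null for it. Input that is already a bare host can be passed to the converter directly.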
Note that this does not do any sanitizing or other thorough checks or fixes. If you need that functionality (e.g. for raw user input from unknown sources), you’ll probably need the original full class over here: