Strange Behaviour with Unicode Characters in Windows
- by open_sourse
Ok, I do not know if this is a programming question, but it certainly is a technical one so I am asking it here. I was working on some internationalization stuff in my PHP code, and in order to ensure that my generated HTML shows up Unicode correctly based on the encoding and stuff I decided to add some Chinese text to my PHP page, which then echoes it into the browser to complete my test case.
So I went into google and typed "Chinese", copied the first Chinese text that the search returned (which was ??/??). I then copied it into Notepad++ which is my editor, and to my surprise showed up as boxes similar to [][]/[][]. So I thought the encoding in Notepad++ was messed up and I changed the encoding to UTF-8 and UCS, neither worked. I did it fresh in a newly encoded file, still I got the boxes. The same content when I paste into Google and StackOverFlow (like I did in this posting) shows up correct Chinese!
I even opened up Windows Clipboard Viewer and the content is represented in the Clipboard as boxes! I tried pasting it into Windows Explorer address bar and using to rename a file to, but I still get boxes. But it shows up correctly when pasted into my Chrome Browser address bar!
Is this a Windows issue? Since I am able to paste it correctly in SO, the data in memory should be encoded correctly right? But if that is the case why does it show up as boxes in the Clipboard Viewer?
I am confused here...By the way I am using Windows XP with SP3.
(I am asking this question here, even if it is not programmatic, because it is preventing me from running my programming test cases..)