Edit: Summary
Apparently the intended character to display in this case is an "en-dash".
This page has a table halfway down showing that, for the –, some software converts the correct hex code 2013 to 0096 (see the first row of the table).
This answer on Stack Overflow explains that this is a mix-up between Windows-1252 and UTF-8.
This blog article reinforces this:
Character 150 (0x96) is the unicode
character "START OF GUARDED AREA" in
the non-displayed C1 control character
range, but in the Windows-1252
encoding it's mapped to the
displayable character 0x2013 "en-dash"
(a short dash).
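To check the mapping described in that quote, here is a minimal Python sketch. It assumes nothing about the Amazon page itself; it only decodes the single byte 0x96 two ways, using Python's built-in `cp1252` and `latin-1` codecs.

```python
import unicodedata

raw = b"\x96"  # the single byte in question

# Windows-1252 maps byte 0x96 to U+2013 (en dash).
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 (latin-1) maps every byte straight to the code point with the
# same number, so 0x96 becomes the C1 control character U+0096.
as_latin1 = raw.decode("latin-1")

print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))      # 0x2013 EN DASH
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))  # 0x96 Cc
```

The `Cc` category means "control character", which would explain why Firefox has no glyph for it and falls back to the box showing the hex digits.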
Others have struggled with this when producing content: this answer on Stack Overflow shows how to replace 0x0096 with 0x2013.
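For anyone producing content, that kind of replacement could be sketched in Python roughly as follows. The function name `fix_c1_controls` is my own, hypothetical, and not taken from the linked answer; it reinterprets any C1 control character (U+0080 to U+009F) as the Windows-1252 byte with the same number.

```python
def fix_c1_controls(text: str) -> str:
    """Replace C1 control characters with their Windows-1252 equivalents.

    Hypothetical helper for illustration, assuming the C1 characters got
    there because Windows-1252 bytes were read as ISO-8859-1.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes (e.g. 0x81) are undefined in Python's cp1252
                # codec; leave those characters unchanged.
                pass
        out.append(ch)
    return "".join(out)


heading = "Elastic \x96 Amazon EC2"
print(fix_c1_controls(heading) == "Elastic \u2013 Amazon EC2")  # True
```

This only helps on the authoring side; it does nothing for a reader viewing someone else's page.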
Google must realize this, because as stated in my original question below, Google's cached version of the Amazon page has – so it seems they are automatically correcting these mistakes on pages they cache.
I have tried setting my encoding to Windows-1252 but that does not help.
So now I guess my question is, how can I tell Firefox to ignore unprintable characters like these?
Original content below:
(Firefox 3.6.13 on Windows XP)
Every once in a while I notice an odd character on certain web pages when browsing the web. It is an outline of a box with a 4-digit number inside.
An example of a page that has these characters is:
http://aws.amazon.com/ec2/#highlights
After each section heading (Elastic, Completely Controlled, ...) I see a box with the number "0096" inside. I looked at the cached version on Google, and Google has – in its place, so I'm guessing I should be seeing a dash there instead of the box with the numbers in it.
I have tried changing the character encoding in Firefox but haven't been able to find one that shows these characters correctly.
Is there a way to allow Firefox to view these characters?
Thanks in advance!
Edit - adding a screen shot of the "special" characters:
Edit #2 - tried in Ubuntu - new screenshots
I logged into my Ubuntu desktop and browsed to the Amazon page in Chrome and Firefox. Chrome completely ignores the character, even if I inspect the element or view the page source. Firefox on Ubuntu displays the character exactly like Firefox on my Windows XP box. I copied the character and played around with it at the command line. Here is a screenshot of the results:
It looks like I can paste the character into this post as well: ``
It is definitely not isolated to Windows XP. I tried setting the character encoding for my terminal to Windows-1252 (from Dennis' comment below), but then it just displays this character as a question mark.
I pulled the webpage down with wget and with curl, and both outputs show this character as: <96>
It makes me wonder whether this character renders correctly for anyone. It appears WebKit just ignores it, my IE6 ignores it, and Firefox displays the box with the numbers in it. I would have to imagine the design team at Amazon can see it correctly?
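A rough way to tell how the 0x96 byte sits in a downloaded copy is sketched below in Python. The function name `diagnose` and its return labels are made up for illustration, and the sample byte strings are invented stand-ins rather than the real page contents.

```python
def diagnose(data: bytes) -> str:
    """Classify how byte 0x96 appears in raw page bytes (illustrative only)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: a bare 0x96 here suggests a single-byte
        # encoding such as Windows-1252.
        return "bare-0x96" if b"\x96" in data else "invalid-utf8"
    # Valid UTF-8: U+0096 would be stored as the two bytes C2 96.
    return "utf8-u0096" if "\x96" in text else "clean"


print(diagnose(b"Elastic \x96 Amazon EC2"))      # bare-0x96
print(diagnose(b"Elastic \xc2\x96 Amazon EC2"))  # utf8-u0096
```

With a real download, the bytes could be read with `open(path, "rb").read()` and passed to the same function.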
It's not a huge deal to get these characters displaying correctly, but it would be nice to know if there is a solution to this.