URL-encoded UTF-8 string gets mangled into Latin-1
Posted by Ken Mayer on Stack Overflow
Published on 2010-04-01T02:12:52Z
I'm parsing my nginx logs and want to pull some details out of the HTTP_REFERER string, for example the search query that was used to find the web site. One user typed in "México", which gets encoded in the log as "query=M%E9xico".
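A quick way to see what %E9 stands for is to compare the bytes of "é" in two encodings. This sketch (percent_encode_bytes is a hypothetical helper written only for illustration) shows that %E9 is the single ISO-8859-1 (Latin-1) byte for "é", whereas a UTF-8 referrer would have carried %C3%A9.

```ruby
# Raw bytes of "é" (U+00E9) in two encodings.
latin1_bytes = "é".encode("ISO-8859-1").bytes # one byte:  [233]
utf8_bytes   = "é".bytes                      # two bytes: [195, 169]

# Hypothetical helper: percent-encode each byte the way a URL carries it.
def percent_encode_bytes(bytes)
  bytes.map { |b| format("%%%02X", b) }.join
end

puts percent_encode_bytes(latin1_bytes) # prints %E9
puts percent_encode_bytes(utf8_bytes)   # prints %C3%A9
```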
Passing this through Rack::Utils.parse_query('query=M%E9xico') returns the hash {"query" => "M?xico"}.
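The same bytes can be reproduced without Rack. This sketch swaps in Ruby's standard-library URI.decode_www_form_component as a stand-in for parse_query (an assumption: Rack's own decoder may label the result with a different encoding, but the decoded bytes are the same). The "?" in the hash is just how the invalid byte 0xE9 gets displayed.

```ruby
require "uri"

# Stand-in for Rack::Utils.parse_query on the single value: the stdlib decoder
# turns %E9 back into the raw byte 0xE9 and labels the result UTF-8.
value = URI.decode_www_form_component("M%E9xico")

p value                 # => "M\xE9xico"
p value.encoding        # => #<Encoding:UTF-8>
p value.valid_encoding? # => false (0xE9 opens a UTF-8 sequence that never completes)
```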
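When you try to stuff "M?xico" into Postgres (but not the more forgiving SQLite), it chokes because the string isn't valid UTF-8. Looking at http://rack.rubyforge.org/doc/Rack/Utils.html#M000324, unescape packs the hex digits of each %XX escape into a raw byte, so the result is just the bytes from the URL with no character-set conversion.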
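Since the escape is packed into a raw byte, one way out is to tell Ruby which character set that byte belongs to and then transcode. A minimal sketch, assuming the referrer really was Latin-1 (0xE9 is "é" in ISO-8859-1); the string is rebuilt by hand with Array#pack here rather than through Rack.

```ruby
# Packing the hex digits "E9" yields one raw byte, 0xE9, with no charset attached.
byte = ["E9"].pack("H*")
p byte.bytes # => [233]

# Rebuild the decoded value from raw bytes, as an unescape that packs hex would.
raw = "M".b + byte + "xico".b

# Label the bytes as Latin-1, then transcode: encode rewrites 0xE9 as the
# UTF-8 pair 0xC3 0xA9.
utf8 = raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts utf8              # prints México
p utf8.bytes.length    # => 7
p utf8.valid_encoding? # => true
```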
How can I convert the string back to UTF-8, or can I get parse_query to return UTF-8 in the first place?
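Logs usually mix both kinds of referrer, so one option is to try UTF-8 first and fall back to Latin-1 only when the bytes are not valid UTF-8. This is a heuristic sketch, not Rack's API: parse_query_utf8 and decode_query_value are hypothetical helpers built on Ruby's URI.decode_www_form_component, and the Latin-1 fallback is an assumption (if the referrers came from Windows browsers, Windows-1252 may be the better guess; it agrees with Latin-1 for "é" but differs in the 0x80 to 0x9F range). Malformed escapes such as "%ZZ" make decode_www_form_component raise ArgumentError.

```ruby
require "uri"

# Hypothetical helper: decode one percent-encoded component to valid UTF-8.
# Assumption: bytes that are not valid UTF-8 were sent as ISO-8859-1 (Latin-1).
def decode_query_value(escaped)
  raw = URI.decode_www_form_component(escaped) # labelled UTF-8, may be invalid
  return raw if raw.valid_encoding?

  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Hypothetical helper: parse a whole query string into a Hash of UTF-8 strings.
def parse_query_utf8(query)
  query.split("&").each_with_object({}) do |pair, params|
    next if pair.empty? # tolerate "a=1&&b=2"

    key, value = pair.split("=", 2)
    params[decode_query_value(key)] = decode_query_value(value.to_s)
  end
end

p parse_query_utf8("query=M%E9xico")    # Latin-1 referrer
p parse_query_utf8("query=M%C3%A9xico") # UTF-8 referrer
```

Trying UTF-8 first matters: decoding a valid UTF-8 sequence such as %C3%A9 as Latin-1 would produce the two characters "Ã©" instead of "é".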
© Stack Overflow or respective owner