URL-encoded UTF-8 string gets mangled into Latin-1
- by Ken Mayer
I'm parsing my nginx logs, and I want to extract some details from the HTTP_REFERER string, for example, the query string used to find the web site. One user typed in "México", which was encoded in the log as "query=M%E9xico".
Passing this through Rack::Utils.parse_query('query=M%E9xico') gives me a hash, {"query" => "M?xico"}.
When I try to stuff "M?xico" into Postgres (but not the more forgiving SQLite), it pukes because the string isn't proper UTF-8. According to http://rack.rubyforge.org/doc/Rack/Utils.html#M000324, unescape simply packs the hex string into bytes.
How can I convert the string back to UTF-8, or can I get parse_query to return UTF-8 in the first place?
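For what it's worth, %E9 is the single Latin-1 (ISO-8859-1) byte for "é"; a UTF-8 client would have sent %C3%A9. So the bytes in the log were probably never UTF-8 to begin with. Below is a minimal sketch of one possible workaround, assuming that any value which is not valid UTF-8 is really Latin-1. That fallback is a guess about the client, not something the log guarantees. The helper name to_utf8 is hypothetical, and URI.decode_www_form_component from the standard library stands in for Rack's unescape so the snippet runs without the rack gem.

```ruby
require 'uri'

# Hypothetical helper: return str as valid UTF-8.
# If the bytes are already valid UTF-8 they are kept as-is; otherwise they
# are ASSUMED to be in the fallback encoding (Latin-1 by default) and
# transcoded to UTF-8.
def to_utf8(str, fallback = Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# Stand-in for the value Rack hands back: the raw byte 0xE9 tagged as UTF-8.
raw = URI.decode_www_form_component('M%E9xico')
p raw.valid_encoding?   # false

fixed = to_utf8(raw)
p fixed.valid_encoding? # true
puts fixed              # prints México
```

The same helper could presumably be applied to each string value in the hash returned by Rack::Utils.parse_query before it goes to Postgres. Values that arrive as proper UTF-8 (such as M%C3%A9xico) pass through unchanged.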