Handling Character Encoding in URI on Tomcat

Posted by ZZ Coder on Stack Overflow
Published on 2009-08-05T12:55:35Z

On the web site I am trying to help with, a user can type a URL containing non-ASCII characters into the browser, such as the following one with Chinese characters:

  http://localhost:8080?a=测试

On the server, we get

  GET /?a=%E6%B5%8B%E8%AF%95 HTTP/1.1

As you can see, the value is UTF-8 encoded and then URL encoded (percent-encoded). We can handle this correctly by setting the encoding to UTF-8 in Tomcat.
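To make the two steps concrete, here is a minimal sketch in plain Java (no Tomcat involved) that UTF-8 encodes and then percent-encodes the two characters U+6D4B U+8BD5, which is what the bytes in the request line above decode to. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class for illustration; not part of Tomcat.
class Utf8QueryRoundTrip {

    // Convert the value to UTF-8 bytes, then percent-encode those bytes.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Reverse direction: percent-decode, then read the bytes as UTF-8.
    static String decodeUtf8(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u6D4B\u8BD5"; // the two Chinese characters from the request
        String encoded = encodeUtf8(original);
        System.out.println(encoded);                               // %E6%B5%8B%E8%AF%95
        System.out.println(decodeUtf8(encoded).equals(original));  // true
    }
}
```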

However, certain browsers sometimes send the value in Latin-1 (ISO-8859-1) instead. For example,

  http://localhost:8080?a=ß

turns into

  GET /?a=%DF HTTP/1.1
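The sketch below shows why a UTF-8-only setup breaks on this request. It assumes nothing beyond the JDK's `URLEncoder` and `URLDecoder`; the class name is hypothetical. The single byte `DF` is a complete character in Latin-1 but only the start of a two-byte sequence in UTF-8, so decoding it as UTF-8 cannot give back the original character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it only illustrates the mismatch, it does not fix it.
class EncodingMismatch {

    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String eszett = "\u00DF"; // the character from the example URL

        // The same character produces different bytes depending on the charset.
        System.out.println(encode(eszett, "ISO-8859-1")); // %DF
        System.out.println(encode(eszett, "UTF-8"));      // %C3%9F

        // Reading the Latin-1 byte as UTF-8 does not recover the character.
        System.out.println(decode("%DF", "UTF-8").equals(eszett));      // false
        System.out.println(decode("%DF", "ISO-8859-1").equals(eszett)); // true
    }
}
```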

Is there any way to handle this correctly in Tomcat? It looks like the server has to do some intelligent guessing. We don't expect to handle Latin-1 correctly 100% of the time, but anything is better than what we are doing now, which is assuming everything is UTF-8.
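One possible form of that guessing, sketched here as an assumption rather than anything Tomcat provides: percent-decode to raw bytes, try a strict UTF-8 decode, and fall back to Latin-1 only if the bytes are not valid UTF-8. The class `GuessingDecoder` and its methods are hypothetical, and the sketch assumes Java 6 or later. It could be applied to the raw query string (for example, the value of `HttpServletRequest.getQueryString()`, which the container does not decode) instead of relying on the container's decoded parameters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: guesses between UTF-8 and Latin-1 for one encoded value.
class GuessingDecoder {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Turn a percent-encoded value into raw bytes without assuming a charset.
    // Assumes the input is ASCII, as it is in an HTTP request line.
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length() + 0 && isHex(s, i + 1) && isHex(s, i + 2)) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+') {
                out.write(' '); // form encoding uses '+' for a space
            } else {
                out.write(c);   // literal character, kept as a single byte
            }
        }
        return out.toByteArray();
    }

    private static boolean isHex(String s, int index) {
        return index < s.length() && Character.digit(s.charAt(index), 16) >= 0;
    }

    // Prefer UTF-8; if the bytes are not well-formed UTF-8, assume Latin-1.
    static String decode(String encoded) {
        byte[] raw = percentDecode(encoded);
        CharsetDecoder strict = UTF8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid Latin-1, so this fallback never fails.
            return new String(raw, LATIN1);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%E6%B5%8B%E8%AF%95").equals("\u6D4B\u8BD5")); // true
        System.out.println(decode("%DF").equals("\u00DF"));                      // true
    }
}
```

This is a heuristic, not a guarantee: a Latin-1 value whose bytes happen to form valid UTF-8 (for example `%C3%A9`) will be read as UTF-8. Such sequences are uncommon in real Latin-1 text, which is why trying UTF-8 first tends to work better than assuming one charset for everything.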

The server is Tomcat 5.5. The supported browsers are IE 6+, Firefox 2+ and Safari on iPhone.

