Unicode characters in URLs

Posted by Pekka on Stack Overflow See other posts from Stack Overflow or by Pekka
Published on 2010-04-30T07:07:54Z Indexed on 2010/04/30 7:47 UTC
Read the original article Hit count: 467

Filed under:
|
|
|

In 2010, would you serve URLs containing UTF-8 characters in a large web portal?

Unicode characters are forbidden as per the RFC on URLs (see here). They would have to be percent encoded to be standards compliant.

My main point, though, is serving the unencoded characters for the sole purpose of having nice-looking URLs, so percent encoding is out.

All major browsers seem to be parsing those URLs okay no matter what the RFC says. My general impression, though, is that it gets very shaky when leaving the domain of web browsers:

  • URLs getting copy+pasted into text files, E-Mails, even Web sites with a different encoding
  • HTTP Client libraries
  • Exotic browsers, RSS readers

Is my impression correct that trouble is to be expected here, and thus it's not a practical solution (yet) if you're serving a non-technical audience and it's important that all your links work properly even if quoted and passed on?

Is there some magic way of serving nice-looking URLs in HTML

http://www.example.com/düsseldorf?neighbourhood=Lörick

that can be copy+pasted with the special characters intact, but work correctly when re-used in older clients?

© Stack Overflow or respective owner

Related posts about url

Related posts about html