Unicode characters in URLs
- by Pekka
In 2010, would you serve URLs containing UTF-8 characters in a large web portal?
Non-ASCII characters are not allowed in URLs as per the RFC on URLs (see here). They would have to be percent-encoded to be standards compliant.
My main point, though, is about serving the unencoded characters purely for the sake of nice-looking URLs, so percent-encoding is out.
All major browsers seem to be parsing those URLs okay no matter what the RFC says. My general impression, though, is that it gets very shaky when leaving the domain of web browsers:
URLs getting copy+pasted into text files, e-mails, even websites with a different encoding
HTTP Client libraries
Exotic browsers, RSS readers
Is my impression correct that trouble is to be expected here, and thus it's not a practical solution (yet) if you're serving a non-technical audience and it's important that all your links work properly even if quoted and passed on?
Is there some magic way of serving nice-looking URLs in HTML
http://www.example.com/düsseldorf?neighbourhood=Lörick
that can be copy+pasted with the special characters intact, but work correctly when re-used in older clients?
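For reference, here is roughly what the standards-compliant form of the example URL above would look like. This is only a sketch in Python, assuming the path and query are encoded as UTF-8 bytes; the helper name `to_percent_encoded` is made up for illustration, and the host name is left untouched (a non-ASCII host would need separate handling).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_percent_encoded(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query (UTF-8 assumed).

    Hypothetical helper for illustration; the host part is passed through as-is.
    """
    parts = urlsplit(url)
    # Keep "/" in the path and "=" / "&" in the query; "%" stays safe so that
    # already-encoded sequences are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


pretty = "http://www.example.com/düsseldorf?neighbourhood=Lörick"
encoded = to_percent_encoded(pretty)

print(encoded)
# Decoding the percent-encoded form as UTF-8 gives back the readable URL.
print(unquote(encoded) == pretty)
```

The encoded form is what an older client should be able to handle safely; the open question is whether the readable form survives copy+paste outside of browsers.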