How to handle URLs with diacritic characters
- by user359650
I am wondering how to handle URLs which correspond to strings containing diacritic (á, u, ´...). I believe what we're seeing mostly are URLs where diacritic characters where converted to their closest ASCII equivalent, for instance Rånades på Skyttis i Ö-vik converted to ranades-pa-skyttis-i-o-vik.
However depending on the corresponding language, such conversion might be incorrect. For instance in German, ü should be converted to ue and not just u, as seen with the below URL representing the Bayern München string as bayern-muenchen:
http://www.bundesliga.de/en/liga/clubs/fc-bayern-muenchen/index.php
However what I've also noticed, is that browsers can render non-ASCII characters when they are percent-encoded in the URL, which is the approach Wikipedia has chosen, for instance http://de.wikipedia.org/wiki/FC_Bayern_M%C3%BCnchen which is rendered as:
Therefore I'm considering the following approach for creating URL slugs:
-(1) convert strings while replacing non-ASCII characters to their recommended ASCII representation: Bayern München - bayern-muenchen
-(2) also convert strings to percent encoding: Bayern München - bayern_m%C3%BCnchen
-create a 301 redirect from version (1) to version (2)
Version (1) URLs could be used for marketing purposes (e.g. mywebsite.com/bayern-muenchen) but the URLs that would end being displayed in the browser bar would be version (2) URLs (e.g. mywebsite.com/bayern-münchen).
Can you foresee particular problems with this approach? (Wikipedia is not doing it and I wonder why, apart from the fact that they don't need to market their URLs)