Normalizing (webdav) unicode paths
- by Evert
Hi guys,
I'm working on a WebDAV implementation for PHP. To make Windows and other operating systems work together, I need to jump through some character-encoding hoops.
Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8.
My first approach was to ignore this altogether, but I quickly ran into issues when returning URLs. I then figured it's probably best to normalize all URLs.
Take ü (u with diaeresis) as an example. OS X sends it over the wire as:
u%CC%88 (a plain "u" followed by U+0308 COMBINING DIAERESIS, UTF-8 encoded)
Windows sends it as:
%FC (Latin-1)
But when I run utf8_encode on the decoded %FC byte, I get:
%C3%BC (the UTF-8 encoding of U+00FC, the precomposed ü)
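To make the comparison concrete, here is a small check in Python (used only for illustration; the post is about PHP). It decodes the three wire forms above and shows that the OS X form and the utf8_encode form are different code point sequences, yet become equal after Unicode NFC normalization, which is the standard "canonically equivalent" relationship between a precomposed character and its decomposed spelling.

```python
import unicodedata
from urllib.parse import unquote

# The three wire forms from the post
mac = unquote("u%CC%88")                  # UTF-8, decomposed: 'u' + U+0308
win = unquote("%FC", encoding="latin-1")  # Latin-1 byte 0xFC -> U+00FC
utf8 = unquote("%C3%BC")                  # UTF-8, precomposed U+00FC

print([hex(ord(c)) for c in mac])   # ['0x75', '0x308']
print([hex(ord(c)) for c in utf8])  # ['0xfc']
print(win == utf8)                  # True: same code point once decoded
print(mac == utf8)                  # False: different code point sequences
print(unicodedata.normalize("NFC", mac) == utf8)  # True: canonically equivalent
```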
Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Leaving the path untouched seems to work OK for Windows: it somehow understands that it's a Unicode character, but updating the same file throws an error for no apparent reason.
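One possible approach, sketched here in Python under stated assumptions rather than as a definitive answer: percent-decode the path to raw bytes, try UTF-8 first, fall back to Latin-1 if the bytes are not valid UTF-8 (a heuristic assumption about which clients send what), normalize to NFC, then re-encode as UTF-8 percent-escapes. The function name normalize_path is hypothetical. In PHP, the intl extension's Normalizer::normalize() with Normalizer::FORM_C provides the same NFC step.

```python
import unicodedata
from urllib.parse import quote, unquote_to_bytes


def normalize_path(raw: str) -> str:
    """Map a percent-encoded request path to one canonical form (hypothetical helper).

    Assumption (heuristic): bytes that are not valid UTF-8 came from a
    client that sent Latin-1, so we fall back to that encoding.
    """
    data = unquote_to_bytes(raw)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("latin-1")  # never fails: every byte maps to a code point
    # Compose 'u' + U+0308 into U+00FC so both spellings compare equal
    text = unicodedata.normalize("NFC", text)
    # Re-encode as UTF-8 percent-escapes, leaving '/' separators intact
    return quote(text, safe="/")


for raw in ("/files/u%CC%88.txt", "/files/%FC.txt", "/files/%C3%BC.txt"):
    print(raw, "->", normalize_path(raw))
```

All three inputs map to /files/%C3%BC.txt. The fallback is only a guess: a Latin-1 byte sequence that happens to be valid UTF-8 would be misread, so a real server may want to key the decision on the client instead.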
I'd be happy to provide more information.