Handling UTF-8 with BOM in HTTP
Posted by Alois Mahdal on Server Fault
Published on 2012-04-15T09:16:53Z
Indexed on 2012/04/15 11:33 UTC
Say I have a script that at some point serves a plain text file as content (right after "\n\n"). These files are provided by users, but I can expect them to be UTF-8, so I hard-wire Content-Type: text/plain; charset=UTF-8.
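To make the setup concrete, here is a minimal sketch of what such a script might emit. It assumes a CGI-style response where a blank line separates the headers from the content; the function name `build_response` is hypothetical and only serves the illustration.

```python
def build_response(body: bytes) -> bytes:
    """Prepend the hard-wired header and the blank line to the file content."""
    # The content type is fixed because the files are expected to be UTF-8.
    header = b"Content-Type: text/plain; charset=UTF-8\n\n"
    # The user's file follows right after the "\n\n", byte for byte.
    return header + body
```

Note that whatever bytes the file starts with, a BOM included, end up directly after the blank line.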
But while I can teach users to save everything in UTF-8, I can't be sure that the files will come without a BOM ("\xEF\xBB\xBF"): at least on Windows, common plain text editors do not distinguish this very clearly, and not all of them use the same default.
So what about files created on Windows, which may or may not start with a BOM? Should (or will) the server or the UA get rid of this debris for me? Or is it my task to prepare clean UTF-8, i.e. open each file and check whether a BOM needs to be removed?
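If the cleanup does turn out to be the script's job, the check itself is small. The following is only a sketch under that assumption, not a statement about what servers or UAs do; the function name `strip_utf8_bom` is made up for the example, and it works on the raw bytes of the file.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # Files saved without a BOM pass through unchanged.
    return data
```

For example, `strip_utf8_bom(b"\xef\xbb\xbfhello")` gives `b"hello"`, and input without a BOM is returned as is. When the file is read as text rather than bytes, Python's "utf-8-sig" codec skips a leading BOM on decoding in the same way.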
© Server Fault or respective owner