Handling UTF-8 with BOM in HTTP

Posted by Alois Mahdal on Server Fault
Published on 2012-04-15T09:16:53Z Indexed on 2012/04/15 11:33 UTC

Filed under: http | utf-8 | unicode

Say I have a script which at some point serves a plain text file as the response body (right after the "\n\n" that ends the headers). These files are provided by users, but I can expect them to be UTF-8, so I hard-wire Content-Type: text/plain; charset=UTF-8.

But while I can teach users to save everything in UTF-8, I can't be sure that the files will come without a BOM ("\xEF\xBB\xBF"): at least on Windows, common plain text editors do not clearly distinguish UTF-8 with and without BOM, and they do not all use the same default.

So what about these files created on Windows, which may or may not start with a BOM? Should (or will) the server or the UA get rid of this debris for me? Or is it my task to prepare clean UTF-8, i.e. open each file and check whether a BOM needs to be removed?
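
For reference, the "open each file and check" option could look roughly like the sketch below. It is only an illustration in Python, not a statement about what servers or UAs do: the function names strip_utf8_bom and read_without_bom are made up for this example, and it assumes the script reads the whole file as bytes before writing the body.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_without_bom(path: str) -> bytes:
    """Hypothetical helper: read a user file as bytes and drop a leading BOM."""
    with open(path, "rb") as f:
        return strip_utf8_bom(f.read())
```

Only a BOM at the very start of the data is removed; the same three bytes elsewhere in the file are left alone, and files without a BOM pass through unchanged.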

© Server Fault or respective owner
