UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Posted by Cameron on Stack Overflow
Published on 2010-03-16T16:53:14Z Indexed on 2010/03/16 23:31 UTC

Filed under: python | utf-8 | bom | file | web-development

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.

When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:

Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.

Chrome seems to treat the BOM as content and generate an <html> tag automatically when it sees it, so the real <html> tag that follows is reported as an error.
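To illustrate why the BOM reaches the browser: the plain 'utf-8' codec does not remove a BOM, it decodes it as the character U+FEFF at the start of the text. A small sketch, where `raw_template` is a made-up stand-in for a template as stored in the DB:

```python
import codecs

# Hypothetical stand-in for a template stored in the DB: BOM bytes + markup.
raw_template = codecs.BOM_UTF8 + '<html></html>'.encode('utf-8')

decoded = raw_template.decode('utf-8')

# The plain 'utf-8' codec treats the bytes EF BB BF as an ordinary
# character, so the decoded text begins with U+FEFF.
starts_with_bom_char = decoded.startswith(u'\ufeff')
```

Here `starts_with_bom_char` is true, and that leading character is what ends up at the start of the response body.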

So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?
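One possible approach, offered as a sketch and not as the definitive answer: Python 2.5 added the 'utf-8-sig' codec, which skips a leading BOM when decoding and otherwise behaves like plain UTF-8, so it also covers templates that have no BOM. The helper name `decode_template` and the sample data are hypothetical.

```python
import codecs

def decode_template(raw):
    """Decode UTF-8 bytes to text, dropping a leading BOM if one is present.

    `decode_template` is a hypothetical helper name, not a library function.
    """
    # 'utf-8-sig' skips an initial EF BB BF and otherwise decodes as UTF-8.
    return raw.decode('utf-8-sig')

# Sample data standing in for templates fetched from the DB.
with_bom = codecs.BOM_UTF8 + '<html></html>'.encode('utf-8')
without_bom = '<html></html>'.encode('utf-8')

# Both inputs decode to the same text, with no U+FEFF at the start.
same_result = decode_template(with_bom) == decode_template(without_bom)
```

Because the codec tolerates a missing BOM, the same call can be used whether or not future templates are saved with one.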

For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').
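For files that are sent as raw bytes, one option that avoids depending on browser behavior is to strip the BOM at the byte level before sending. A minimal sketch; `strip_utf8_bom` is a hypothetical helper name and the CSS content is made up for illustration.

```python
import codecs

def strip_utf8_bom(raw):
    """Return the bytes unchanged, except that a leading UTF-8 BOM is removed."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

# Made-up CSS payload with a BOM, as it might come out of the DB.
css_bytes = codecs.BOM_UTF8 + 'body { margin: 0; }'.encode('utf-8')
clean_css = strip_utf8_bom(css_bytes)
```

The function leaves BOM-less input untouched, so it can be applied to every file without first checking how it was saved.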

Note: I am using Python 2.5.

Thanks!

© Stack Overflow or respective owner
