UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)
Posted by Cameron on Stack Overflow
Published on 2010-03-16T16:53:14Z
Indexed on 2010/03/16 23:31 UTC
First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.
When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:
Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.
Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error.
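To illustrate the symptom, here is a minimal sketch showing that the plain 'utf-8' codec does not strip the BOM: it survives decoding as the character U+FEFF at the start of the string. The sample template bytes are made up for illustration and only stand in for what might come out of the DB.

```python
import codecs

# Hypothetical template bytes as they might be stored in the DB: BOM + markup.
raw = codecs.BOM_UTF8 + u'<html><body>hi</body></html>'.encode('utf-8')

decoded = raw.decode('utf-8')

# The plain 'utf-8' codec keeps the BOM; it shows up as U+FEFF.
print(repr(decoded[0]))
print(decoded.startswith(u'\ufeff'))
```

If that leading character is written into the response body, the browser receives it ahead of the real markup.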
So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?
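A minimal sketch of one possible approach, assuming the stored template is a byte string that may or may not start with the UTF-8 BOM. The helper name decode_template is hypothetical and not part of any library. It checks for codecs.BOM_UTF8 and slices it off before decoding; the 'utf-8-sig' codec (available since Python 2.5) should give the same result.

```python
import codecs

def decode_template(raw):
    """Hypothetical helper: decode UTF-8 bytes, dropping a leading BOM if present."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode('utf-8')

# Made-up sample data: the same markup with and without a BOM.
with_bom = codecs.BOM_UTF8 + u'<html></html>'.encode('utf-8')
without_bom = u'<html></html>'.encode('utf-8')

html_a = decode_template(with_bom)
html_b = decode_template(without_bom)

# Alternative: the 'utf-8-sig' codec skips a leading BOM when one is there.
html_c = with_bom.decode('utf-8-sig')
html_d = without_bom.decode('utf-8-sig')
```

Both variants leave input without a BOM unchanged, so they can be applied unconditionally.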
For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').
Note: I am using Python 2.5.
Thanks!
© Stack Overflow or respective owner