UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Posted by Cameron on Stack Overflow
Published on 2010-03-16T16:53:14Z

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.

When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:
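To illustrate the behaviour described above, here is a minimal sketch. The `raw_template` value is a hypothetical stand-in for the bytes fetched from the DB, not the actual application code. It suggests that the plain 'utf-8' codec keeps the BOM as a leading U+FEFF character in the decoded text, which is presumably what ends up at the start of the response body.

```python
import codecs

# Hypothetical stand-in for a template stored in the DB as raw bytes,
# BOM included (the real data and schema are not shown in this post).
raw_template = codecs.BOM_UTF8 + u"<html></html>".encode("utf-8")

# The plain 'utf-8' codec does not treat the BOM specially.
text = raw_template.decode("utf-8")

# The decoded text therefore starts with U+FEFF, ahead of the <html> tag.
has_bom = text.startswith(u"\ufeff")
```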

Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.

Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error.

So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?
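For reference, one possible approach is sketched below. It is not necessarily the best one, and the helper names `strip_utf8_bom` and `decode_template` are made up for illustration. It relies on `codecs.BOM_UTF8` and on the 'utf-8-sig' codec, which as far as I know was added in Python 2.5 and skips a leading BOM only when one is present.

```python
import codecs

def strip_utf8_bom(data):
    """Return the byte string without a leading UTF-8 BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def decode_template(data):
    """Decode UTF-8 bytes to text; 'utf-8-sig' drops a leading BOM if present."""
    return data.decode("utf-8-sig")
```

The first helper keeps the data as bytes, which may suit files served without decoding; the second fits the template path, where the data is decoded anyway. Both leave input without a BOM unchanged.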

For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').

Note: I am using Python 2.5.

Thanks!

© Stack Overflow or respective owner
