Lost in UTF-8 hell. (Django and Python)
- by user140314
I am working through the Django RSS reader project here.
The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes.
I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is:
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")
I am using markdown 2.0, django 1.1, and python 2.4.
What is the magic sequence of encoding and decoding that I need to do to make this work?
Thanks.