Search Results

Search found 28 results on 2 pages for 'unicodeencodeerror'.

Page 1/2 | 1 2  | Next Page >

  • UnicodeEncodeError when uploading files in Django admin

    - by Samuel Linde
    Note: I asked this question on StackOverflow, but I realize this might be a more proper place to ask this kind of question. I'm trying to upload a file called 'Testaråäö.txt' via the Django admin app. I'm running Django 1.3.1 with Gunicorn 0.13.4 and Nginx 0.7.6.7 on a Debian 6 server. Database is PostgreSQL 8.4.9. Other Unicode data is saved to the database with no problem, so I guess the problem must be with the filesystem somehow. I've set http { charset utf-8; } in my nginx.conf. LC_ALL and LANG is set to 'sv_SE.UTF-8'. Running 'locale' verifies this. I even tried setting LC_ALL and LANG in my nginx init script just to make sure locale is set properly. Here's the traceback: Traceback (most recent call last): File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/handlers/base.py", line 111, in get_response response = callback(request, *callback_args, **callback_kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 307, in wrapper return self.admin_site.admin_view(view)(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 93, in _wrapped_view response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/views/decorators/cache.py", line 79, in _wrapped_view_func response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/sites.py", line 197, in inner return view(request, *args, **kwargs) File "/srv/django/letebo/app/cms/admin.py", line 81, in change_view return super(PageAdmin, self).change_view(request, obj_id) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 28, in _wrapper return bound_func(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 93, in _wrapped_view response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 24, in bound_func return func(self, *args2, **kwargs2) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/transaction.py", line 217, in inner res = func(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 985, in change_view self.save_formset(request, form, formset, change=True) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 677, in save_formset formset.save() File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 482, in save return self.save_existing_objects(commit) + self.save_new_objects(commit) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 613, in save_new_objects self.new_objects.append(self.save_new(form, commit=commit)) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 717, in save_new obj.save() File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 460, in save self.save_base(using=using, force_insert=force_insert, force_update=force_update) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 504, in save_base self.save_base(cls=parent, origin=org, using=using) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 543, in save_base for f in meta.local_fields if not isinstance(f, AutoField)] File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/fields/files.py", line 255, in pre_save file.save(file.name, file, save=False) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/fields/files.py", line 92, in save self.name = self.storage.save(name, content) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 48, in save name = self.get_available_name(name) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 74, in get_available_name while self.exists(name): File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 218, in exists return os.path.exists(self.path(name)) File "/srv/.virtualenvs/letebo/lib/python2.6/genericpath.py", line 18, in exists st = os.stat(path) UnicodeEncodeError: 'ascii' codec can't encode characters in position 52-54: ordinal not in range(128) I tried running Gunicorn with debugging turned on, and the file uploads without any problem at all. I suppose this must mean that the issue is with Nginx. Still beats me where to look, though. Here are the raw response headers from Gunicorn and Nginx, if it makes any sense: Gunicorn: HTTP/1.1 302 FOUND Server: gunicorn/0.13.4 Date: Thu, 09 Feb 2012 14:50:27 GMT Connection: close Transfer-Encoding: chunked Expires: Thu, 09 Feb 2012 14:50:27 GMT Vary: Cookie Last-Modified: Thu, 09 Feb 2012 14:50:27 GMT Location: http://my-server.se:8000/admin/cms/page/15/ Cache-Control: max-age=0 Content-Type: text/html; charset=utf-8 Set-Cookie: messages="yada yada yada"; Path=/ Nginx: HTTP/1.1 500 INTERNAL SERVER ERROR Server: nginx/0.7.67 Date: Thu, 09 Feb 2012 14:50:57 GMT Content-Type: text/html; charset=utf-8 Transfer-Encoding: chunked Connection: close Vary: Cookie 500 UPDATE: Both locale.getpreferredencoding() and sys.getfilesystemencoding() outputs 'UTF-8'. locale.getdefaultlocale() outputs ('sv_SE', 'UTF8'). This seem correct to me, so I'm still not sure why I keep getting these errors.

    Read the article

  • UnicodeEncodeError: 'ascii' codec can't encode character [...]

    - by user1461135
    I have read the HOWTO on Unicode from the official docs and a full, very detailed article as well. Still I don't get it why it throws me this error. Here is what I attempt: I open an XML file that contains chars out of ASCII range (but inside allowed XML range). I do that with cfg = codecs.open(filename, encoding='utf-8, mode='r') which runs fine. Looking at the string with repr() also shows me a unicode string. Now I go ahead and read that with parseString(cfg.read().encode('utf-8'). Of course, my XML file starts with this: <?xml version="1.0" encoding="utf-8"?>. Although I suppose it is not relevant, I also defined utf-8 for my python script, but since I am not writing unicode characters directly in it, this should not apply here. Same for the following line: from __future__ import unicode_literals which also is right at the beginning. Next thing I pass the generated Object to my own class where I read tags into variables like this: xmldata.getElementsByTagName(tagName)[0].firstChild.data and assign it to a variable in my class. Now what perfectly works are those commands (obj is an instance of the class): for element in obj: print element And this command does work as well: print obj.__repr__() I defined __iter__() to just yield every variable while __repr__() uses the typical printf stuff: "%s" % self.varname Both commands print perfectly and can output the unicode character. What does not work is this: print obj And now I am stuck because this throws the dreaded UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 47: So what am I missing? What am I doing wrong? I am looking for a general solution, I always want to handle strings as unicode, just to avoid any possible errors and write a compatible program. Edit: I also defined this: def __str__(self): return self.__repr__() def __unicode__(self): return self.__repr__() From documentation I got that this

    Read the article

  • gae error when i login.

    - by zjm1126
    i am using http://code.google.com/p/gaema/source/browse/#hg/demos/webapp, and this is my traceback: Traceback (most recent call last): File "D:\Program Files\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 510, in __call__ handler.get(*groups) File "D:\gaema\demos\webapp\main.py", line 31, in get google_auth.get_authenticated_user(self._on_auth) File "D:\gaema\demos\webapp\gaema\auth.py", line 641, in get_authenticated_user OpenIdMixin.get_authenticated_user(self, callback) File "D:\gaema\demos\webapp\gaema\auth.py", line 83, in get_authenticated_user url = self._OPENID_ENDPOINT + "?" + urllib.urlencode(args) File "D:\Python25\lib\urllib.py", line 1250, in urlencode v = quote_plus(str(v)) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128) how to do this thanks ??????????????? ???????? ??????????? ?????????????? ???????

    Read the article

  • How to use unicode inside an xpath string? (UnicodeEncodeError)

    - by Gj
    I'm using xpath in Selenium RC via the Python api. I need to click an a element who's text is "Submit »" Here's the error that I'm getting: In [18]: sel.click(u"xpath=//a[text()='Submit \xbb')]") ERROR: An unexpected error occurred while tokenizing input The following traceback may be corrupted or invalid The error message is: ('EOF in multi-line statement', (1121, 0)) --------------------------------------------------------------------------- Exception Traceback (most recent call last) /Users/me/<ipython console> in <module>() /Users/me/selenium.pyc in click(self, locator) 282 'locator' is an element locator 283 """ --> 284 self.do_command("click", [locator,]) 285 286 /Users/me/selenium.pyc in do_command(self, verb, args) 213 #print "Selenium Result: " + repr(data) + "\n\n" 214 if (not data.startswith('OK')): --> 215 raise Exception, data 216 return data 217 <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u"ERROR: Invalid xpath [2]: //a[text()='Submit \xbb')]", 45, 46, 'ordinal not in range(128)'))

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • Using a Unicode format for Python's `time.strftime()`

    - by Hosam Aly
    I am trying to call Python's time.strftime() function using a Unicode format string: u'%d\u200f/%m\u200f/%Y %H:%M:%S' (\u200f is the "Right-To-Left Mark" (RLM).) However, I am getting an exception that the RLM character cannot be encoded into ascii: UnicodeEncodeError: 'ascii' codec can't encode character u'\u200f' in position 2: ordinal not in range(128) I have tried searching for an alternative but could not find a reasonable one. Is there an alternative to this function, or a way to make it work with Unicode characters?

    Read the article

  • Beautiful Soup Unicode encode error

    - by iamrohitbanga
    I am trying the following code with a particular HTML file from BeautifulSoup import BeautifulSoup import re import codecs import sys f = open('test1.html') html = f.read() soup = BeautifulSoup(html) body = soup.body.contents para = soup.findAll('p') print str(para).encode('utf-8') I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 9: ordinal not in range(128) How do I debug this?

    Read the article

  • Trouble with encoding and urllib

    - by Ockonal
    Hello, I'm loading web-page using urllib. Ther eis russian symbols, but page encoding is 'utf-8' 1 pageData = unicode(requestHandler.read()).decode('utf-8') UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 262: ordinal not in range(128) 2 pageData = requestHandler.read() soupHandler = BeautifulSoup(pageData) print soupHandler.findAll(...) UnicodeEncodeError: 'ascii' codec can't encode characters in position 340-345: ordinal not in range(128)

    Read the article

  • Python, Unicode, and the Windows console

    - by James Sulak
    When I try to print a Unicode string in a windows console, I get a "UnicodeEncodeError: 'charmap' codec can't encode character ...." error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this? Is there any way I can make Python automatically print a "?" instead of failing in this situation? Edit: I'm using Python 2.5.

    Read the article

  • Unicode filename to python subprocess.call()

    - by otrov
    I'm trying to run subprocess.call() with unicode filename, and here is simplified problem: n = u'c:\\windows\\notepad.exe ' f = u'c:\\temp\\nèw.txt' subprocess.call(n + f) which raises famous error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' Encoding to utf-8 produces wrong filename, and mbcs passes filename as new.txt without accent I just can't read any more on this confusing subject and spin in circle. I found here lot of answers for many different problems in past so I thought to join and ask for help myself Thanks

    Read the article

  • Python's string.translate() doesn't fully work?

    - by Rhubarb
    Given this example, I get the error that follows: print u'\2033'.translate({2033:u'd'}) C:\Python26\lib\encodings\cp437.pyc in encode(self, input, errors) 10 11 def encode(self,input,errors='strict'): ---> 12 return codecs.charmap_encode(input,errors,encoding_map) 13 14 def decode(self,input,errors='strict'): UnicodeEncodeError: 'charmap' codec can't encode character u'\x83' in position 0

    Read the article

  • Write data to an xml file.

    - by Bobby
    My aim is to write an XML file with few tags whose values are the regional language. I'm using Python to do this and using IDLE(Pythong GUI) for programming. While I try to write the words in an xmls file it gives the following error. UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128) For now, I'm not using any xml writer library instead I'm opening a file "test.xml" and writing the data into it. This error is encountered by the line: f.write("\t\" + data + "\"). If I replace the above write statement with print statement then it prints the data properly on Python Shell. I'm reading the data from an excel file which is not in the UTF8,16, or 32 encoding formats. Its in some other format. cp1252 is reading the data properly. Any help in getting this data writtent oan xml would be highly appreciated. Thanks in advance. Bob

    Read the article

  • Reading UDP Packets

    - by Thomas Mathiesen
    I am having some trouble dissecting a UDP packet. I am receiving the packets and storing the data and sender-address in variables 'data' and 'addr' with: data,addr = UDPSock.recvfrom(buf) This parses the data as a string, that I am now unable to turn into bytes. I know the structure of the datagram packet which is a total of 28 bytes, and that the data I am trying to get out is in bytes 17:28. I have tried doing this: mybytes = data[16:19] print struct.unpack('>I', mybytes) --> struct.error: unpack str size does not match format And this: response = (0, 0, data[16], data[17], 6) bytes = array('B', response[:-1]) print struct.unpack('>I', bytes) --> TypeError: Type not compatible with array type And this: print "\nData byte 17:", str.encode(data[17]) --> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128) And I am not sure what to try next. I am completely new to sockets and byte-conversions in Python, so any advice would be helpful :) Thanks, Thomas

    Read the article

  • Lost in UTF-8 hell. (Django and Python)

    - by user140314
    I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is: content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") I am using markdown 2.0, django 1.1, and python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.

    Read the article

  • python read utf8 text file problem

    - by cpps
    I have a problem with python about reading and print utf8 text file. I have a test.txt in utf8 encoding without BOM. This file has two characters in it: ?? The first character "?" is Chinese and the second "?" is Japanese. Now, When I use Ulipad (a python editor) to run the following code to read the txt file, and print these two characters. import codecs infile = "C:\\test.txt" f = codecs.open(infile, "r", "utf-8") s = f.read() print(s) I got this error, "UnicodeEncodeError: 'cp950' codec can't encode character '\u58f0' in position 1: illegal multibyte sequence" I found it caused from the second character "?" . But when I use the same code to test in python default GUI IDLE, it works to print the two characters with no error. So, how can I fix the problem. My running environment is python 3.1 , windows xp traditional Chinese.

    Read the article

  • python unichr problem

    - by jacob
    I've got some problem with unichr() on my server. Please see below: On my server (Ubuntu 9.04): >>> print unichr(255) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128) On my desktop (Ubuntu 9.10): >>> print unichr(255) ÿ I'm fairly new to python so I don't know how to solve this. Anyone care to help? Thanks.

    Read the article

  • Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"

    - by user140314
    I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is: content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") I am using markdown 2.0, django 1.1, and python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.

    Read the article

  • Google app engine error when I login.

    - by zjm1126
    i am using http://code.google.com/p/gaema/source/browse/#hg/demos/webapp, and this is my traceback: Traceback (most recent call last): File "D:\Program Files\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 510, in __call__ handler.get(*groups) File "D:\gaema\demos\webapp\main.py", line 31, in get google_auth.get_authenticated_user(self._on_auth) File "D:\gaema\demos\webapp\gaema\auth.py", line 641, in get_authenticated_user OpenIdMixin.get_authenticated_user(self, callback) File "D:\gaema\demos\webapp\gaema\auth.py", line 83, in get_authenticated_user url = self._OPENID_ENDPOINT + "?" + urllib.urlencode(args) File "D:\Python25\lib\urllib.py", line 1250, in urlencode v = quote_plus(str(v)) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128) how to do this thanks updated i change the code from args = dict((k, v[-1]) for k, v in self.request.arguments.iteritems()) args["openid.mode"] = u"check_authentication" url = self._OPENID_ENDPOINT + "?" + urllib.urlencode(args) to args = dict((k, v[-1].encode('utf-8')) for k, v in self.request.arguments.iteritems()) args["openid.mode"] = u"check_authentication" url = self._OPENID_ENDPOINT + "?" + urllib.urlencode(args) but also error.

    Read the article

  • Python: why does str() on some text from a UTF-8 file give a UnicodeDecodeError?

    - by AP257
    I'm processing a UTF-8 file in Python, and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeDecodeError when I try to turn one of the dictionary values into a string: f = open('my_json.json', 'r') master_dictionary = json.load(f) #some json wrangling, then it fails on this line... mysql_string += " ('" + str(v_dict['code']) Traceback (most recent call last): File "my_file.py", line 25, in <module> str(v_dict['code']) + "'), " UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128) Why is Python even using ASCII? I thought it used UTF-8 by default, and this is a UTF-8 file. What is the problem?

    Read the article

  • In python writing from XML to CSV, encoding error

    - by user574435
    Hi, I am trying to convert an XML file to CSV, but the encoding of the XML ("ISO-8859-1") apparently contains characters that are not in the ascii codec which Python uses to write rows. I get the error: Traceback (most recent call last): File "convert_folder_to_csv_PLAYER.py", line 139, in <module> xml2csv_PLAYER(filename) File "convert_folder_to_csv_PLAYER.py", line 121, in xml2csv_PLAYER fout.writerow(row) UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128) I have tried opening the file as follows: dom1 = parse(input_filename.encode( "utf-8" ) ) and I have tried replacing the \xe1 character in each row before it is written. Any suggestions?

    Read the article

  • Python interface to PayPal - urllib.urlencode non-ASCII characters failing

    - by krys
    I am trying to implement PayPal IPN functionality. The basic protocol is as such: The client is redirected from my site to PayPal's site to complete payment. He logs into his account, authorizes payment. PayPal calls a page on my server passing in details as POST. Details include a person's name, address, and payment info etc. I need to call a URL on PayPal's site internally from my processing page passing back all the params that were passed in abovem and an additional one called 'cmd' with a value of '_notify-validate'. When I try to urllib.urlencode the params which PayPal has sent to me, I get a: While calling send_response_to_paypal. Traceback (most recent call last): File "<snip>/account/paypal/views.py", line 108, in process_paypal_ipn verify_result = send_response_to_paypal(params) File "<snip>/account/paypal/views.py", line 41, in send_response_to_paypal params = urllib.urlencode(params) File "/usr/local/lib/python2.6/urllib.py", line 1261, in urlencode v = quote_plus(str(v)) UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 9: ordinal not in range(128) I understand that urlencode does ASCII encoding, and in certain cases, a user's contact info can contain non-ASCII characters. This is understandable. My question is, how do I encode non-ASCII characters for POSTing to a URL using urllib2.urlopen(req) (or other method) Details: I read the params in PayPal's original request as follows (the GET is for testing): def read_ipn_params(request): if request.POST: params= request.POST.copy() if "ipn_auth" in request.GET: params["ipn_auth"]=request.GET["ipn_auth"] return params else: return request.GET.copy() The code I use for sending back the request to PayPal from the processing page is: def send_response_to_paypal(params): params['cmd']='_notify-validate' params = urllib.urlencode(params) req = urllib2.Request(PAYPAL_API_WEBSITE, params) req.add_header("Content-type", "application/x-www-form-urlencoded") response = urllib2.urlopen(req) status = response.read() if not status == "VERIFIED": logging.warn("PayPal cannot verify IPN responses: " + status) return False return True Obviously, the problem only arises if someone's name or address or other field used for the PayPal payment does not fall into the ASCII range.

    Read the article

1 2  | Next Page >