Search Results

Search found 1714 results on 69 pages for 'utf8 decode'.

Page 2/69 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • How do I decode this! It's not base64

    - by steve
    Here's what I know... this "GxvS117MfVw=" when decoded turns to "56699" now what does this "+sB6hF46GyU=" turn into "?????" Parentheses not included I tried base64 decoder and it doesn't seem to be right. It is supposed to be a number. I am not sure about the length, I don't think it should exceed 5 numbers. I would really appreciate it if you can decode it for me and show me how. Thank you!!

    Read the article

  • Decode json string returned from Flickr API using PHP, curl

    - by Globalz
    Im trying to decode a json string returned from flickr within my PHP code. Im using CURL but it keeps returning a string even when I wrap json_decode() around the json sring variable. Any ideas? $api_key = '####'; $photoset_id = '###'; $query = 'http://api.flickr.com/services/rest/?&method=flickr.photosets.getPhotos&api_key='.$api_key.'&photoset_id='.$photoset_id.'&extras=url_o,url_t&format=json&jsoncallback=1'; $ch = curl_init(); // open curl session // set curl options curl_setopt($ch, CURLOPT_URL, $query); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); $data = curl_exec($ch); // execute curl session curl_close($ch); // close curl session var_dump(json_decode($data));

    Read the article

  • How to decode a JSON String

    - by ilnur777
    Hi, everybody! Could I ask you to help me to decode this JSON code: $json = '{"inbox":[{"from":"55512351","date":"29\/03\/2010","time":"21:24:10","utcOffsetSeconds":3600,"recipients":[{"address":"55512351","name":"55512351","deliveryStatus":"notRequested"}],"body":"This is message text."},{"from":"55512351","date":"29\/03\/2010","time":"21:24:12","utcOffsetSeconds":3600,"recipients":[{"address":"55512351","name":"55512351","deliveryStatus":"notRequested"}],"body":"This is message text."},{"from":"55512351","date":"29\/03\/2010","time":"21:24:13","utcOffsetSeconds":3600,"recipients":[{"address":"55512351","name":"55512351","deliveryStatus":"notRequested"}],"body":"This is message text."},{"from":"55512351","date":"29\/03\/2010","time":"21:24:13","utcOffsetSeconds":3600,"recipients":[{"address":"55512351","name":"55512351","deliveryStatus":"notRequested"}],"body":"This is message text."}]}'; I would like to organize above structure to this: Note 1: Folder: inbox From (from): ... Date (date): ... Time (time): ... utcOffsetSeconds: ... Recepient (address): ... Recepient (name): ... Status (deliveryStatus): ... Text (body): ... Note 2: ... Thank you in advance!

    Read the article

  • PHP cURL JSON Decode (X-AUTH Header)

    - by TheCyX
    <?php // Show Profile $curl = curl_init(); curl_setopt ($curl, CURLOPT_URL, "https://example/api"); curl_setopt ($curl, CURLOPT_RETURNTRANSFER, true); curl_setopt ($curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC ) ; curl_setopt ($curl, CURLOPT_HTTPHEADER, array('X-AUTH: 123456789')); $projects = curl_exec($curl); // This is empty? echo $projects; //Decode $phpArray = json_decode($projects); print_r($phpArray); foreach ($phpArray as $key => $value) { // Line 17, sure its empty, but why? echo "<p>$key | $value</p>"; } ?> Warning: Invalid argument supplied for foreach() in /html/api.php on line 17 The API needs this authentification: $ curl -i -H "X-AUTH: 123456789" https://example/api JSON File: {"id":"123456","hostId":null,"Nickname":"thecyx","DisplayName":"thecyx","AppDisplayName":"thecyx","Score":"300","Account":"Full"} The $project variable is empty. If I'm posting the API Url in the Broswer its working. And, if possible, what's the correct way to get the JSON Data e.g. [Nickname],[Score]?

    Read the article

  • OpenSL ES decode 24bit FLAC

    - by yano
    I am trying to decode a FLAC file with 24bit sample format using OpenSL ES on Android. Originally, I had my SLDataFormat_PCM for the SLDataSink setup like this. _pcm.formatType = SL_DATAFORMAT_PCM; _pcm.numChannels = 2; _pcm.samplesPerSec = SL_SAMPLINGRATE_44_1; _pcm.bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_16; _pcm.containerSize = SL_PCMSAMPLEFORMAT_FIXED_16; _pcm.channelMask = SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT; _pcm.endianness = SL_BYTEORDER_LITTLEENDIAN; This is working well for basically any data format. Luckily the samplesPerSec is not respected (I don't want resampling). Now I want to support the full bit-depth of a FLAC file with 24bit samples. When using this format, it apparently performs a bit-depth conversion, because once I load the file, and then check the ANDROID_KEY_PCMFORMAT_BITSPERSAMPLE info, it is 16. When I put bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_24; or SL_PCMSAMPLEFORMAT_FIXED_32, then OpenSL ES rejects it E/libOpenSLES(22706): pAudioSnk: bitsPerSample=32 W/libOpenSLES(22706): Leaving Engine::CreateAudioPlayer (SL_RESULT_CONTENT_UNSUPPORTED) Any idea how this is meant to work? Is Android currently restricted to 16 bit int only? I would also accept 32bit float, but I don't suppose that will work either.

    Read the article

  • Decode base64 data as array in Python

    - by skerit
    I'm using this handy Javascript function to decode a base64 string and get an array in return. This is the string: base64_decode_array('6gAAAOsAAADsAAAACAEAAAkBAAAKAQAAJgEAACcBAAAoAQAA') This is what's returned: 234,0,0,0,235,0,0,0,236,0,0,0,8,1,0,0,9,1,0,0,10,1,0,0,38,1,0,0,39,1,0,0,40,1,0,0 The problem is I don't really understand the javascript function: var base64chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split(""); var base64inv = {}; for (var i = 0; i < base64chars.length; i++) { base64inv[base64chars[i]] = i; } function base64_decode_array (s) { // remove/ignore any characters not in the base64 characters list // or the pad character -- particularly newlines s = s.replace(new RegExp('[^'+base64chars.join("")+'=]', 'g'), ""); // replace any incoming padding with a zero pad (the 'A' character is zero) var p = (s.charAt(s.length-1) == '=' ? (s.charAt(s.length-2) == '=' ? 'AA' : 'A') : ""); var r = []; s = s.substr(0, s.length - p.length) + p; // increment over the length of this encrypted string, four characters at a time for (var c = 0; c < s.length; c += 4) { // each of these four characters represents a 6-bit index in the base64 characters list // which, when concatenated, will give the 24-bit number for the original 3 characters var n = (base64inv[s.charAt(c)] << 18) + (base64inv[s.charAt(c+1)] << 12) + (base64inv[s.charAt(c+2)] << 6) + base64inv[s.charAt(c+3)]; // split the 24-bit number into the original three 8-bit (ASCII) characters r.push((n >>> 16) & 255); r.push((n >>> 8) & 255); r.push(n & 255); } // remove any zero pad that was added to make this a multiple of 24 bits return r; } What's the function of those "<<<" and "" characters. Or is there a function like this for Python?

    Read the article

  • How to make gvfs-smb always use UTF8

    - by Didier
    The problem is that if I make an entry in /etc/fstab to mount a samba share I can give the option iocharset=utf8 and this then mounts the share correctly and in the right encoding with special characters displayed correctly. Gnomes automount system for some reason never gets this right and I can find nowhere to change its settings. Is there a way to make it always use UTF8 by default? This is with Ubuntu 10.10 and the system can display the characters involved.

    Read the article

  • ASP.NET MVC URL decode

    - by Bryan
    I have an action like this: <%=Html.ActionLink("My_link", "About", "Home", new RouteValueDictionary { { "id", "Österreich" } }, null)%> This produces the following link: http://localhost:1855/Home/About/%C3%96sterreich I want a link which looks like this - localhost:1855/Home/About/Österreich I have tried. Server.HtmlDecode("Österreich") HttpUtility.UrlDecode("Österreich") Neither seems to be helping. What else can I try to get my desired result?

    Read the article

  • Decode sparse json array to php array

    - by Isaac Sutherland
    I can create a sparse php array (or map) using the command: $myarray = array(10=>'hi','test20'=>'howdy'); I want to serialize/deserialize this as JSON. I can serialize it using the command: $json = json_encode($myarray); which results in the string {"10":"hi","test20":"howdy"}. However, when I deserialize this and cast it to an array using the command: $mynewarray = (array)json_decode($json); I seem to lose any mappings with keys which were not valid php identifiers. That is, mynewarray has mapping 'test20'=>'howdy', but not 10=>'hi' nor '10'=>'hi'. Is there a way to preserve the numerical keys in a php map when converting to and back from json using the standard json_encode / json_decode functions?

    Read the article

  • How to decode Google spreadsheet's Json respose as a Php Array

    - by Mohammad
    My google Docs Spreadsheet call returns this response in the json format (I only need everything after "rows") please look at the formatted response here : ) I use php's json_decode function to parse the data and use it (Yes, I am awful at php) This code returns NULL, and according to the documentation, NULL is returned "if the json cannot be decoded". $json = file_get_contents($jsonurl); $json_output = json_decode($json); var_dump ($json_output); // Returns NULL Basically, what i want to accomplish is to make a simple array from the first row values of the Json response. like this $array = {'john','John Handcock','[email protected]','2929292','blanc'} You guys are genius, I would appreciate your insight and help on this very much! Answer as "sberry2A" mentions bellow, the response is not valid Json, google offers the Zend Json library for this purpose, tho I decided to parse the tsv-excel version instead :)

    Read the article

  • Javascript and PHP how should I en/decode my data

    - by Ron
    Hello everyone. I whould like to know in what encryption should I encode my data and why first of all, I use GET method because it is search engine inside website. second, I use RTL language (hebrew) and thrid which basically is why I ask this question - firefox and safari (as I understood) encode and decode urls automaticly so if I encoded url, in firefox I will see it decoded which is good but if I copy-paste the url to the address bar and than enter the site firefox encode the uncoded url to utf (i think). anyway, what en/decode should I use, and how can I overcome the firefox auto en/decode?

    Read the article

  • Decoding mysql_real_escape_string() for outputting HTML

    - by Peter
    I'm trying to protect myself from sql injection and am using: mysql_real_escape_string($string); When posting HTML it looks something like this: <span class="\&quot;className\&quot;"> <p class="\&quot;pClass\&quot;" id="\&quot;pId\&quot;"></p> </span> I'm not sure how many other variations real_escape_string adds so don't want to just replace a few and miss others... How do I "decode" this back into correctly formatted HTML, with something like: html_entity_decode(stripslashes($string));

    Read the article

  • How to set utf8 in the auto-generated PHP code of flash builder 4 ?

    - by Mark
    HI, PHP problem here (I think): I've just created a Flex (Flash Builder) project with a datagrid linked to a database - the database is all utf8. When I run the project using the auto-generated code in flex4, the non-English part comes like ????? while the English part comes fine. The auto-generated PHP code uses mysqli. I've tried: $this->connection->set_charset('utf8'); or mysqli_query($this->connection,"SET NAMES utf8"); I also tried writing the code myself (I'm not a PHP guy): mysql_query("set names utf8"); was fine - but that's mysql and not mysqli (that's an "i" after the mysql) and I want to use the auto-generated code... any help is appreciated.

    Read the article

  • Applying languages / locale selectively: is it possible?

    - by Aron Rotteveel
    I am a Dutch user and prefer the my local date & time format, system wide. I have no trouble speaking or understanding English and find it very useful to have the rest of my system configured in English to make my life easier when I need to Google a term, for example. Is it possible to apply the a local date/time/currency/etc. format to the system, while maintaining English menu & dialog captions? EDIT: output from locale and posted screens of current settings: LANG=en_US.utf8 LANGUAGE=en LC_CTYPE="en_US.utf8" LC_NUMERIC="en_US.utf8" LC_TIME="en_US.utf8" LC_COLLATE="en_US.utf8" LC_MONETARY="en_US.utf8" LC_MESSAGES="en_US.utf8" LC_PAPER="en_US.utf8" LC_NAME="en_US.utf8" LC_ADDRESS="en_US.utf8" LC_TELEPHONE="en_US.utf8" LC_MEASUREMENT="en_US.utf8" LC_IDENTIFICATION="en_US.utf8" LC_ALL=

    Read the article

  • How to decode a JSON String with several objects in PHP?

    - by ilnur777
    Hi, guys! I know how to decode a JSON string with one object with your help from this example http://stackoverflow.com/questions/2543389/how-to-decode-a-json-string But now I would like to improve decoding JSON string with several objects and I can't understand how to do it. Here is an example: { "programmers": [ { "firstName": "Brett", "lastName":"McLaughlin" }, { "firstName": "Jason", "lastName":"Hunter" }, { "firstName": "Elliotte", "lastName":"Harold" } ], "authors": [ { "firstName": "Isaac", "lastName": "Asimov" }, { "firstName": "Tad", "lastName": "Williams" }, { "firstName": "Frank", "lastName": "Peretti" } ], "musicians": [ { "firstName": "Eric", "lastName": "Clapton" }, { "firstName": "Sergei", "lastName": "Rachmaninoff" } ] } How to decode this JSON, call data and display on the page from what object the informartion list is being read? Thank you!

    Read the article

  • Inconsistent get_class_methods vs method_exists when using UTF8 characters in PHP code

    - by coma
    I have this class in a UTF-8 encoded file called EnUTF8.Class.php: class EnUTF8 { public function ñññ() { return 'ñññ()'; } } and in another UTF-8 encoded file: require_once('EnUTF8.Class.php'); require_once('OneBuggy.Class.php'); $utf8 = new EnUTF8(); //$buggy = new OneBuggy(); echo (method_exists($utf8, 'ñññ')) ? 'ñññ() exists!' : 'ñññ() does not exist...'; echo "\n\n----------------------------------\n\n" print_r(get_class_methods($utf8)); echo "\n----------------------------------\n\n" echo $utf8->ñññ(); that produces the expected result: ñññ() exists! ---------------------------------- Array ( [0] => ñññ ) ---------------------------------- ñññ() but if... require_once('EnUTF8.Class.php'); require_once('OneBuggy.Class.php'); $utf8 = new EnUTF8(); $buggy = new OneBuggy(); echo (method_exists($utf8, 'ñññ')) ? 'ñññ() exists!' : 'ñññ() does not exist...'; echo "\n\n----------------------------------\n\n" print_r(get_class_methods($utf8)); echo "\n----------------------------------\n\n" echo $utf8->ñññ(); then the weirdness appears!!!: ñññ() does not exist! ---------------------------------- Array ( [0] => ñññ ) ---------------------------------- Fatal error: Call to undefined method EnUTF8::ñññ() in /var/www/test.php on line 16 Well, the thing is that OneBuggy.Class.php is UTF-8 encoded too and shares absolutly nothing with EnUTF8.Class.php so... where is the bug? UPDATED: Well, after a long debugging time I found this in OneBuggy.Class.php constructor: setlocale (LC_ALL, "es_ES@euro", "es_ES", "esp"); so I did... //setlocale (LC_ALL, "es_ES@euro", "es_ES", "esp"); and now it works but why?.

    Read the article

  • How can I convert data encoded in WE8MSWIN1252 to utf8 for use in Python scripts?

    - by James Dean
    This data comes from an Oracle database and is extracted to flatfiles in encoding 'WE8MSWIN1252'. I want to parse the data and do some analysis. I want to see the text fields but do not need to publish the results to any other system so if some characters do not get converted perfectly I do not have a problem with that. I just do not want my parsing to fail with a decode error which is what I get if I use: inputFile = codecs.open( dataFileName, "r", "utf-8'")

    Read the article

  • How can I decode UTF-16 data in Perl?

    - by Geo
    If I open a file ( and specify an encoding directly ) : open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n"; while(<$file>) { print "$_\n"; } close($file); I can read the file contents nicely. However, if I do: use Encode; open(my $file,"some.file") || die "error $!\n"; while(<$file>) { print decode("UTF-16",$_); } close($file); I get the following error: UTF-16:Unrecognised BOM d at F:/Perl/lib/Encode.pm line 174 How can I make it work with decode?

    Read the article

  • How can I decode UTF-16 data in Perl when I don't know the byte order?

    - by Geo
    If I open a file ( and specify an encoding directly ) : open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n"; while(<$file>) { print "$_\n"; } close($file); I can read the file contents nicely. However, if I do: use Encode; open(my $file,"some.file") || die "error $!\n"; while(<$file>) { print decode("UTF-16",$_); } close($file); I get the following error: UTF-16:Unrecognised BOM d at F:/Perl/lib/Encode.pm line 174 How can I make it work with decode?

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • How to pass UTF8 string into your PHP HTML API?

    - by Ole Jak
    so I have my php API (html Get api for Flash builder and C# apps). So if you want to submit data to it you use string like http://localhost/cms/api.php?method=someMethod&string=Your_String If there are english letters in it its ok. But what if I need to pass UTF-8 string like this &#1056;&#1091;&#1089;&#1089;&#1082;&#1086;&#1077; &#1048;&#1084;&#1103; to my api what shall I do?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >