Search Results

Search found 5333 results on 214 pages for 'chunked encoding'.

Page 19/214 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >

  • Encoding multiple video streams with a single avconv invocation

    - by automatthias
    I played with avconv on Ubuntu and I'm now able to e.g. record the desktop with sound from a soundcard. One thing I wanted to do was recording two video inputs at the same time, for instance the desktop and from the webcam. I thought about doing something like this: avconv \ -f alsa \ -i default \ -acodec flac \ -f video4linux2 \ -r 6 \ -i /dev/video0 \ -f x11grab \ -i :0.0 \ out.mkv My thinking was that if you define multiple video inputs, and the .mkv format can handle multiple video streams, avconv will encode 2 video streams and 1 audio stream into one file. But this isn't what happens: avconv version 0.8.4-6:0.8.4-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers built on Nov 6 2012 16:51:11 with gcc 4.7.2 [alsa @ 0x1091bc0] capture with some ALSA plugins, especially dsnoop, may hang. [alsa @ 0x1091bc0] Estimating duration from bitrate, this may be inaccurate Input #0, alsa, from 'default': Duration: N/A, start: 1354364317.020350, bitrate: N/A Stream #0.0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s [video4linux2 @ 0x10923e0] Estimating duration from bitrate, this may be inaccurate Input #1, video4linux2, from '/dev/video0': Duration: N/A, start: 100607.724745, bitrate: 29491 kb/s Stream #1.0: Video: rawvideo, yuyv422, 640x480, 29491 kb/s, 6 tbr, 1000k tbn, 6 tbc [x11grab @ 0x107b2a0] device: :0.0+83,87 -> display: :0.0 x: 83 y: 87 width: 854 height: 480 [x11grab @ 0x107b2a0] shared memory extension found [x11grab @ 0x107b2a0] Estimating duration from bitrate, this may be inaccurate Input #2, x11grab, from ':0.0+83,87': Duration: N/A, start: 1354364318.488382, bitrate: 196761 kb/s Stream #2.0: Video: rawvideo, bgra, 854x480, 196761 kb/s, 15 tbr, 1000k tbn, 15 tbc Incompatible pixel format 'bgra' for codec 'mpeg4', auto-selecting format 'yuv420p' [buffer @ 0x107fcc0] w:854 h:480 pixfmt:bgra [avsink @ 0x10bdf00] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out' [scale @ 0x10dc680] w:854 h:480 fmt:bgra -> w:854 h:480 fmt:yuv420p flags:0x4 Output #0, matroska, to '.../out.mkv': Metadata: encoder : Lavf53.21.0 Stream #0.0: Video: mpeg4, yuv420p, 854x480, q=2-31, 4000 kb/s, 1k tbn, 15 tbc Stream #0.1: Audio: libvorbis, 48000 Hz, 2 channels, s16 Stream mapping: Stream #2:0 -> #0:0 (rawvideo -> mpeg4) Stream #0:0 -> #0:1 (pcm_s16le -> libvorbis) Press ctrl-c to stop encoding [mpeg4 @ 0x10bd800] rc buffer underflow ^Cframe= 160 fps= 15 q=2.0 Lsize= 3414kB time=10.66 bitrate=2623.0kbits/s video:3273kB audio:131kB global headers:4kB muxing overhead 0.165600% Received signal 2: terminating. I'm not sure if it's the question of mapping (some -map options to add?) or that avconv just can't encode more than 1 video stream at one time. So is it an actual avconv limitation, or a limitation of the available containers, or me simply not finding the right combination of command line options?

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • Why does text from Assembly.GetManifestResourceStream() start with three junk characters?

    - by flipdoubt
    I have a SQL file added to my VS.NET 2008 project as an embedded resource. Whenever I use the following code to read the file's content, the string returned always starts with three junk characters and then the text I expect. I assume this has something to do with the Encoding.Default I am using, but that is just a guess. Why does this text keep showing up? Should I just trim off the first three characters or is there a more informed approach? public string GetUpdateRestoreSchemaScript() { var type = GetType(); var a = Assembly.GetAssembly(type); var script = "UpdateRestoreSchema.sql"; var resourceName = String.Concat(type.Namespace, ".", script); using(Stream stream = a.GetManifestResourceStream(resourceName)) { byte[] buffer = new byte[stream.Length]; stream.Read(buffer, 0, buffer.Length); // UPDATE: Should be Encoding.UTF8 return Encoding.Default.GetString(buffer); } } Update: I now know that my code works as expected if I simply change the last line to return a UTF-8 encoded string. It will always be true for this embedded file, but will it always be true? Is there a way to test any buffer to determine its encoding?

    Read the article

  • Perl strings internals

    - by n0rd
    How does perl strings represented internally? What encoding is used? How do I handle different encodings properly? I've been using perl for quite a long time, but it didn't include a lot of string handling in different encodings, and when I encountered a minor problem that had something to do with encodings I usually resorted to some shamanic actions. Until this moment I thought about perl strings as sequences of bytes, which did fit pretty well for my tasks. Now I need to do some processing of UTF-8 encoded file and here starts trouble. First, I read file into string like this: open(my $in, '<', $ARGV[0]) or die "cannot open file $ARGV[0] for reading"; binmode($in, ':utf8'); my $contents; { local $/; $contents = <$in>; } close($in); then simply print it: print $contents; And I get two things: a warning Wide character in print at <scriptname> line <n> and a garbage in console. So I can conclude that perl strings have a concept of "character" that can be "wide" or not, but when printed these "wide" characters are represented in console as multiple bytes, not as single "character". (I wonder now why did all my previous experience with binary files worked quite how I expected it to work without any "character" issues). Why then I see garbage in console? If perl stores strings as character in some known encoding, I don't think there is a big problem to find out console encoding and print text properly. (I use Windows, BTW). If perl stores strings as multibyte sequences (e.g. using same UTF-8 encoding), why is it done this way? From my C experience handling multibyte strings is PAIN.

    Read the article

  • How is a relative JMP (x86) implemented in an Assembler?

    - by Pindatjuh
    While building my assembler for the x86 platform I encountered some problems with encoding the JMP instruction: enc inst size in bytes EB cb JMP rel8 2 E9 cw JMP rel16 4 (because of 0x66 16-bit prefix) E9 cd JMP rel32 5 ... (from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147) All are relative jumps, where the size of each encoding (operation + operand) is in the third column. Now my original (and thus fault because of this) design reserved the maximum (5 bytes) space for each instruction. The operand is not yet known, because it's a jump to a yet unknown location. So I've implemented a "rewrite" mechanism, that rewrites the operands in the correct location in memory, if the location of the jump is known, and fills the rest with NOPs. This is a somewhat serious concern in tight-loops. Now my problem is with the following situation: b: XXX c: JMP a e: XXX ... XXX d: JMP b a: XXX (where XXX is any instruction, depending on the to-be assembled program) The problem is that I want the smallest possible encoding for a JMP instruction (and no NOP filling). I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies for the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a. How do existing assemblers implement this, or how would you implement this? This is what I am thinking which solves the problem: First encode all the instructions to opcodes between the JMP and it's target, and if this region contains a variable-sized opcode, use the maximum size, i.e. 5 for JMP. Then in some conditions, the JMP is oversized (because it may fit in a smaller encoding): so another pass will search for oversized JMPs, shrink them, and move all instructions ahead), and set absolute branching instructions (i.e. external CALLs) after this pass is completed. I wonder, perhaps this is an over-engineered solution, that's why I ask this question.

    Read the article

  • How to workaround Python "WindowsError messages are not properly encoded" problem?

    - by Victor Lin
    It's a trouble when Python raised a WindowsError, the encoding of message of the exception is always os-native-encoded. For example: import os os.remove('does_not_exist.file') Well, here we get an exception: Traceback (most recent call last): File "<stdin>", line 1, in <module> WindowsError: [Error 2] ???????????: 'does_not_exist.file' As the language of my Windows7 is Traditional Chinese, the default error message I get is in big5 encoding (as know as CP950). >>> try: ... os.remove('abc.file') ... except WindowsError, value: ... print value.args ... (2, '\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C') >>> As you see here, error message is not Unicode, then I will get another encoding exception when I try to print it out. Here is the issue, it can be found in Python issue list: http://bugs.python.org/issue1754 The question is, how to workaround this? How to get the native encoding of WindowsError? The version of Python I use is 2.6. Thanks.

    Read the article

  • Foreign/accented characters in sql query

    - by FromCanada
    I'm using Java and Spring's JdbcTemplate class to build an SQL query in Java that queries a Postgres database. However, I'm having trouble executing queries that contain foreign/accented characters. For example the (trimmed) code: JdbcTemplate select = new JdbcTemplate( postgresDatabase ); String query = "SELECT id FROM province WHERE name = 'Ontario';"; Integer id = select.queryForObject( query, Integer.class ); will retrieve the province id, but if instead I did name = 'Québec' then the query fails to return any results (this value is in the database so the problem isn't that it's missing). I believe the source of the problem is that the database I am required to use has the default client encoding set to SQL_ASCII, which according to this prevents automatic character set conversions. (The Java environments encoding is set to 'UTF-8' while I'm told the database uses 'LATIN1' / 'ISO-8859-1') I was able to manually indicate the encoding when the resultSets contained values with foreign characters as a solution to a previous problem with a similar nature. Ex: String provinceName = new String ( resultSet.getBytes( "name" ), "ISO-8859-1" ); But now that the foreign characters are part of the query itself this approach hasn't been successful. (I suppose since the query has to be saved in a String before being executed anyway, breaking it down into bytes and then changing the encoding only muddles the characters further.) Is there a way around this without having to change the properties of the database or reconstruct it? PostScript: I found this function on StackOverflow when making up a title, it didn't seem to work (I might not have used it correctly, but even if it did work it doesn't seem like it could be the best solution.):

    Read the article

  • requesting ajax via HttpWebRequest

    - by Sami Abdelgadir Mohammed
    Hi guys: I'm writing a simple application that will download some piece of data from a website then I can use it later for any purpose The following is the request and response copied from Firebug as the browser did that... when u type http://x5.travian.com.sa/ajax.php?f=k7&x=18&y=-186&xx=12&yy=-192 you will get a php file has some data.. But when I make a request with HttpWebRequest I get wrong data (some unknown letters) Can anyone help me in that.. and if I have to make some encodings or what?? I will be so appreciated.. Response Server nginx Date Tue, 04 Jan 2011 23:03:49 GMT Content-Type application/json; charset=UTF-8 Transfer-Encoding chunked Connection keep-alive X-Powered-By PHP/5.2.8 Expires Mon, 26 Jul 1997 05:00:00 GMT Last-Modified Tue, 04 Jan 2011 23:03:49 GMT Cache-Control no-store, no-cache, must-revalidate, post-check=0, pre-check=0 Pragma no-cache Content-Encoding gzip Vary Accept-Encoding Request Host x5.travian.com.sa User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 Accept text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8 Accept-Language en-us,en;q=0.5 Accept-Encoding gzip,deflate Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive 115 Connection keep-alive Cookie CAD=57878984%231292375897%230%230%23%230; T3E=%3DImYykTN2EzMmhjO5QTM2QDN2oDM1ITOyoDOxIjM4EDN5ITM6gjO4MDOxIWZyQWMipTZu9metl2ctl2c6MDNxADN6MDNxADNjMDNxADNjMDNxADN; orderby_b1=0; orderby_b=0; orderby2=0; orderby=0

    Read the article

  • NSString to NSData Failing in Encoding

    - by Travis
    I'm trying to use NSXmlParser to parse ISO-8859-1 data. Using Apple's own example for parsing ISO-8859-1, I have the following. NSString *xmlFilePath = [[NSBundle mainBundle] pathForResource:sampleFileName ofType:@"xml"]; NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath encoding:NSISOLatin1StringEncoding error:nil]; NSLog(@"contents: %@", xmlFileContents); I see that in the console, the contents of the string is accurate. However when I try to convert it to an NSData object (for use with the parser), I do the following. NSData *xmlData = [xmlFileContents dataUsingEncoding:NSISOLatin1StringEncoding]; But then when my didStartElement delegate gets called, I see  showing up which I think is from an encoding discrepancy. Can NSXmlParser handle ISO-8859-1 and if so, what am I doing wrong?

    Read the article

  • Unable to encode to iso-8859-1 encoding for some chars using Perl Encode module

    - by ppant
    I have a HTML string in ISO-8859-1 encoding. I need to pass this string to HTML:Entities::decode_entities() for converting some of the HTML ASCII codes to respective chars. To so i am using a module HTML::Parser::Entities 3.65 but after decode_entities() operation my whole string changes to utf-8 string. This behavior seems fine as the documentation of the HTML::Parse. As i need this string back in ISO-8859-1 format for further processing so i have used Encode::encode("iso-8859-1",$str) to change the string back to ISO-8859-1 encoding. My results are fine excepts for some chars, a question mark is coming instead. One example is single quote ' ASCII code (’) Can anybody help me if there any limitation of Encode module? Any other pointer will also be helpful to solve the problem. Thanks

    Read the article

  • invalid token error while parsing an XML file with UTF-8 encoding

    - by Niranjan
    invalid token error while parsing an XML file with UTF-8 encoding. This error is coming when it encountered extended ASCII character 'â' { "â", "â" }. When I have changed the encoding from UTF-8 to ISO-8859-1 the parsing is successful. But my application should support UTF-8, ASCII and extended ASCII characters. What should I do for this? Any ideas are welcome. Thanks in Advance for your time and solution.

    Read the article

  • Can not set Character Encoding using sun-web.xml

    - by stck777
    I am trying to send special characters like spanish chars from my page to a JSP page as a form parameter. When I try get the parameter which I sent, it shows that as "?" (Question mark). After searching on java.net thread I came to know that I should have following entry in my sun-web.xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Sun ONE Application Server 8.0 Servlet 2.4//EN" "http://www.sun.com/software/sunone/appserver/dtds/sun-web-app_2_4-0.dtd"> <sun-web-app> <locale-charset-info default-locale="es"> <locale-charset-map locale="es" charset="UTF-8"/> <parameter-encoding default-charset="UTF-8"/> </locale-charset-info> </sun-web-app> But it did not work with this approach still the character goes as "?".

    Read the article

  • Tomcat Compression Does Not Add a Content-Encoding: gzip in the Header

    - by Julien Chastang
    I am using Tomcat to compress my HTML content like this: <Connector port="8080" maxHttpHeaderSize="8192" maxProcessors="150" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="150" connectionTimeout="20000" disableUploadTimeout="true" compression="on" compressionMinSize="128" noCompressionUserAgents="gozilla, traviata" compressableMimeType="text/html" URIEncoding="UTF-8" /> In the HTTP header (as observed via YSlow), however, I am not seeing Content-Encoding: gzip resulting in a poor YSlow score. All I see is HeadersPost Response Headers Server: Apache-Coyote/1.1 Content-Type: text/html;charset=ISO-8859-1 Content-Language: en-US Content-Length: 5251 Date: Sat, 14 Feb 2009 23:33:51 GMT I am running an apache mod_jk Tomcat configuration. How do I compress HTML content with Tomcat, and also have it add "Content-Encoding: gzip" in the header?

    Read the article

  • Intra-Unicode "lean" Encoding Converters

    - by Mystagogue
    Windows provides encoding conversion functions ("MultiByteToWideChar" and "WideCharToMultiByte") which are capable of UTF-8 to/from UTF-16 conversions, among other things. But I've seen people offer home-grown 30 to 40 line functions that claim also to perform UTF-8 / UTF-16 encoding conversions. My question is, how reliable are such tiny converters? Can such a tiny amount of code handle problems such as converting a UTF-16 surrogate pair (such as ) into a UTF-8 single four byte sequence (rather than making the mistake of converting into a pair of three byte sequences)? Can they correctly spot "unpaired" surrogate input, and provide an error? In short, are such tiny converters mere toys, or can they be taken seriously? For that matter, why does unicode.org seemingly offer no advice on an algorithm for accomplishing such conversions?

    Read the article

  • Syncronize an SVN repo (svnsync) with encoding errors

    - by Hamish
    Is it possible to fix/bypass non-UTF8 encoded svn:log records when syncronizing repositories with svnsync? Background I'm in the process of taking over the maintenance of an open source module that is stored within a large (well over 10,000 revisions) subversion (1.5.5) repository. I do not have admin access to the remote repository to dump/filter/load the module. The old repository is being discontinued and I am trying to sync the original sub module to my local (1.6+) repository with svnsync. For example: svnsync file://home/svn/temp-repo/ http://path.to.repo/modulename/ The problem is that the old repository didn't enforce UTF8 encoding and I'm hitting errors like: svnsync: Cannot accept 'svn:log' property because it is not encoded in UTF-8 I can't modify the log property in the source repository so I need to somehow modify or ignore the property value when the encoding is unknown/invalid. Any ideas? For example, is it possible to write a pre-revprop-change script to modify the log property in transit?

    Read the article

  • Php/ODBC encoding problem

    - by JohnM2
    I use ODBC to connect to SQL Server from PHP. In PHP I read some string (nvarchar column) data from SQL Server and then want to insert it to mysql database. When I try to insert such value to mysql database table I get this mysql error: Incorrect string value: '\xB3\xB9ow...' for column 'name' at row 1 For string with all ASCII characters everything is fine, the problem occurs when non-ASCII characters (from some European languages) exist. So, in more general terms: there is a Unicode string in MS SQL Server database, which is retrieved by PHP trough ODBC. Then it is put in sql insert query (as value for utf-8 varchar column) which is executed for mysql database. Can someone explain to me what is happening in this situation in terms of encoding? At which step what character encoding convertions may take place? I use: PHP 5.2.5, MySQL5.0.45-community-nt, MS Sql Server 2005.

    Read the article

  • Video encoding by servlet with MEncoder

    - by Andrey
    Hello. I was developing an application for video encoding on the server and got a problem with encoding video with MEncoder. This decoder doesn't work correctly when runned by a command line with Runtime.getRuntime().exec(“D:\mencoder\mnc\mencoder.exe video1.avi -o outvideo1.flv -of lavf -oac mp3lame -lameopts abr:br=64 -srate 22050 -ovc lavc -lavcopts vcodec=flv:vbitrate=300:mbd=2:mv0:trell:v4mv:cbp:last_pred=3 -vf scale=320:240,harddup -quiet”) ; The decoder launches and works in windows console with my parameters, but when it's run from a servlet it just hangs in process list and doesn't do anything before the web-server is stopped. When trying to use decoder from a simple java applcation, it runs correctly. Thanks for help.

    Read the article

  • Loading xml with encoding UTF 16 using XDocument

    - by Sangram
    Hi, I am trying to read the xml document using XDocument method . but i am getting an error when xml has <?xml version="1.0" encoding="utf-16"?> When i removed encoding manually.It works perfectly. I am getting error " There is no Unicode byte order mark. Cannot switch to Unicode. " i tried searching and i landed up here-- Why does C# XmlDocument.LoadXml(string) fail when an XML header is included? But could not solve my problem. My code : XDocument xdoc = XDocument.Load(path); Any suggestions ?? thank you.

    Read the article

  • FFmpeg + iPhone - Interesting (incorrect?) video encoding results

    - by jtrim
    I'm encoding some video on the iPhone by running the png image data through swscale to get YUV420P data then encoding that frame using the MSMPEG4V1 codec. In the api docs, avcodec_encode_video should return the number of bytes used from the output buffer by that encode operation. There are 234,000 bytes going into the encoder, but the result returned by avcodec_encode_video is simply "4". The result is exactly the same over 24 frames. Something seems fishy here...any insight? Here's a pastebin link to the code: http://pastebin.com/ht94FWva (sorry for the link away from SO, I just didn't want to have the code duplicated in several places) EDIT: Also, I've set up a custom log callback for ffmpeg to use and I have the log level set to "Verbose" (libavutil/log.h), so libavcodec should be logging any goofs to the console, but avcodec is quiet throught he whole operation. (note: I did test to make sure my log callback was working)

    Read the article

  • If a command line program is unsure of stdout's encoding, what encoding should it output?

    - by mackstann
    I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be another program, or a file you're redirecting it into, or whatever, and it doesn't know what encoding is desired. But neither do I! This program will be used by many different people (humor me) in different ways. Should I play it safe and output only ascii (replacing non-ascii chars with question marks)? Or should I output UTF-8, since it's so widespread these days?

    Read the article

  • Preserving multi-byte characters in Flex XML object

    - by Dan Petker
    I'm having an issue with the Flex XML object type mangling multi-byte characters (such as Japanese or Chinese characters). The basic setup is this. I'm getting an XML-formatted string from the server, and in that string there can be multi-byte characters. A lot of the time, these characters are in attributes, for example: <example id="foo" name="[some multi-byte characters]"/> Now, when I examine the raw string, the multi-byte characters display just fine. However, as soon as I convert the string to an XML object using the top-level XML() function, all the multi-byte characters become mangled. I've tried setting the XML's encoding by including an <?xml version="1.0" encoding="utf-8"?> element in the XML-formatted string, but this doesn't seem to have any effect on the resulting XML object. Is there a way to get the XML object to respect the encoding of the XML-formatted string and prevent the multi-byte characters from being mangled?

    Read the article

  • UTF-8 xml file shows Gibberish

    - by Adam
    I have a UTF-8 encoded xml file, which was exported from a Wordpress MySQL database. While the file is saved as UTF-8, and the encoding is UTF-8, I get gibberish instead of the Hebrew text that is supposed to be in there, which looks like this: ™×•×˜×•×ª How can I find the original encoding or charset and convert the text into proper Hebrew? PHP's mb_detect_encoding($str); returns UTF-8 Tried all sorts of php encoding functions, with different settings and input/output charsets, but they all just print different looking gibberish blocks, like: ÃâÃËÃâ¢Ãâ¢ÃËà and ?? ××©×ž× ...Any Ideas how to go about this?

    Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >