Search Results

Search found 9564 results on 383 pages for 'character encoding'.

Page 27/383 | < Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34  | Next Page >

  • Import PDF in Inkscape without per character spacing

    - by Morinar
    I have a PDF form I imported into Inkscape. It did a pretty great job of created an SVG from it, however it appears to have given per character x coordinates to each tspan in every text block. I'm not sure, but I think this is making the FOP render that our software does ridiculously slow (running it through other renderers like IBEX it seems fine). I'd like to import it but have it not do per character positions. I can't seem to find any sort of PDF importing options at all. Does such a thing exist? Or is there perhaps some other, better freeware application I could use to generate the SVG from the PDF then use Inkscape to do adjustments from there? Thanks in advance.

    Read the article

  • How Can I Find Nth Character in Notepad++?

    - by Teno
    From the following text, I'd like to find a n th character. For example, the 10th character is "u"`. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque ac arcu sit amet lorem mollis dignissim ac ut metus. Aliquam sed nulla ut risus sollicitudin luctus vitae eget quam. Nam velit diam, ullamcorper id tempus ac, iaculis sed arcu. According to this page, \w{10,} would work but when I type it in the Find what field of the Findwindow, it produces the message, 'Can't find the text: "\w{10,}"' Thanks in advance.

    Read the article

  • Can't paste symbol from Character viewer in Mac to Photoshop

    - by Cristian
    This is extremely annoying: I want to paste in a special symbol in Photoshop using the Character Viewer in Mac. I understand that there are some fonts that do not support these symbols, but when I paste this symbol in Photoshop, it appears as an unknown symbol (similar to this: ? ), and I can't even change the font (to anything), it skips back to Myriad Pro. I found a stupid solution to this: using the arrows to change fonts - this seems to work, but still Photoshop doesn't show all fonts that are available in Character Viewer under Font Variation Have you experienced this? I can't remember if it was the same with Windows, but I know there were some bugs too.

    Read the article

  • How to prevent stretching, blurring and pixelating of embedded logos in VirtualDub?

    - by NoCanDo
    Howdy, take a look at this please http://www.youtube.com/watch?v=te-HVN8y_QE&hd=1 . Notice the embedded "logo" in the upper left corner? How blurry and pixelated it is? This is the original image: http://i.imagehost.org/0148/movie_watermark.png ! The stretching, blurring, pixelating etc. most likely comes from resizing the original video from 1920x1200 to 1280x720 and encoding it with h.264. Can anyone tell me how I can prevent the blurring, unsharpening and pixelating and retain their original quality? How do I exclude the logo from the whole encoding process and just slap it there in its original format and form?

    Read the article

  • Excel: Change all cells with one character to something else

    - by Allan
    Is there a formula I can use that will change all cells with one character to something else? For example, I have cells with single letters and no matter what the letter is I want that cell to contain the word Member. More Info: I get spreadsheets that contain, up to 40,000 rows. Column B will have names in the cells. Every once in a while a column will just have an initial instead of a full name. I'm looking for a way to change every single cell containing only one single character to the word "Member." The cells that need to change could be any letter but no matter what that letter is, if it's just a single letter in a cell, it needs to change to the word "Member."

    Read the article

  • Best practice for organizing/storing character/monster data in an RPG?

    - by eclecto
    Synopsis: Attempting to build a cross-platform RPG app in Adobe Flash Builder and am trying to figure out the best class hierarchy and the best way to store the static data used to build each of the individual "hero" and "monster" types. My programming experience, particularly in AS3, is embarrassingly small. My ultra-alpha method is to include a "_class" object in the constructor for each instance. The _class, in turn, is a static Object pulled from a class created specifically for that purpose, so things look something like this: // Character.as package { public class Character extends Sprite { public var _strength:int; // etc. public function Character(_class:Object) { _strength = _class._strength; // etc. } } } // MonsterClasses.as package { public final class MonsterClasses extends Object { public static const Monster1:Object={ _strength:50, // etc. } // etc. } } // Some other class in which characters/monsters are created. // Create a new instance of Character var myMonster = new Character(MonsterClasses.Monster1); Another option I've toyed with is the idea of making each character class/monster type its own subclass of Character, but I'm not sure if it would be efficient or even make sense considering that these classes would only be used to store variables and would add no new methods. On the other hand, it would make creating instances as simple as var myMonster = new Monster1; and potentially cut down on the overhead of having to read a class containing the data for, at a conservative preliminary estimate, over 150 monsters just to fish out the one monster I want (assuming, and I really have no idea, that such a thing might cause any kind of slowdown in execution). But long story short, I want a system that's both efficient at compile time and easy to work with during coding. Should I stick with what I've got or try a different method? As a subquestion, I'm also assuming here that the best way to store data that will be bundled with the final game and not read externally is simply to declare everything in AS3. Seems to me that if I used, say, XML or JSON I'd have to use the associated AS3 classes and methods to pull in the data, parse it, and convert it to AS3 object(s) anyway, so it would be inefficient. Right?

    Read the article

  • Postfix character encoding?

    - by Anonymous12345
    I use Postfix as a mailserver. I have Ubuntu OS. Then I use PHP to send emails. Problem is that none of my emails are encoded properly by a mailsoftware which my VPS provider uses. According to them, the problem lies with me. It is only the name field which isn't encoded properly. For example "Björn" becomes "Björn" in my emails. However, when I echo the $name, it outputs "Björn" which is correct. Also, gmail and hotmail does show it correctly. The strange part is that the "text" (the message itself) is encoded properly. I use the following for sending mail: $headers="MIME-Version: 1.0"."\n"; $headers.="Content-type: text/plain; charset=UTF-8"."\n"; $headers.="From: $name <$email>"."\n"; $name= iconv(mb_detect_encoding($name), "UTF-8//IGNORE//TRANSLIT", $name); //// I HAVE TRIED WITH AND WITHOUT THE LINE ABOVE, NO DIFFERENCE mail($to, '=?UTF-8?B?'.base64_encode($subject).'?=', $text, $headers, '[email protected]'); I have tried with and without the iconv line also, no luck. The last thing I can think of is POSTFIX, could there be a setting for character encoding there? Anybody knows?

    Read the article

  • Postfix character encoding?

    - by Camran
    I use Postfix as a mailserver. I have Ubuntu OS. Then I use PHP to send emails. Problem is that none of my emails are encoded properly by a mailsoftware which my VPS provider uses. According to them, the problem lies with me. It is only the name field which isn't encoded properly. For example "Björn" becomes "Björn" in my emails. However, when I echo the $name, it outputs "Björn" which is correct. Also, gmail and hotmail does show it correctly. The strange part is that the "text" (the message itself) is encoded properly. I use the following for sending mail: $headers="MIME-Version: 1.0"."\n"; $headers.="Content-type: text/plain; charset=UTF-8"."\n"; $headers.="From: $name <$email>"."\n"; $name= iconv(mb_detect_encoding($name), "UTF-8//IGNORE//TRANSLIT", $name); //// I HAVE TRIED WITH AND WITHOUT THE LINE ABOVE, NO DIFFERENCE mail($to, '=?UTF-8?B?'.base64_encode($subject).'?=', $text, $headers, '[email protected]'); I have tried with and without the iconv line also, no luck. The last thing I can think of is POSTFIX, could there be a setting for character encoding there? Anybody knows?

    Read the article

  • x86 instruction encoding tables

    - by Cheery
    I'm in middle of rewriting my assembler. While at it I'm curious about implementing disassembly as well. I want to make it simple and compact, and there's concepts I can exploit while doing so. It is possible to determine rest of the x86 instruction encoding from opcode (maybe prefix bytes are required too, a bit). I know many people have written tables for doing it. I'm not interested about mnemonics but instruction encoding, because it is an actual hard problem there. For each opcode number I need to know: does this instruction contain modrm? how many immediate fields does this instruction have? what encoding does an immediate use? is the immediate in field an instruction pointer -relative address? what kind of registers does the modrm use for operand and register fields? sandpile.org has somewhat quite much what I'd need, but it's in format that isn't easy to parse. Before I start writing and validating those tables myself, I decided to write this question. Do you know about this kind of tables existing somewhere? In a form that doesn't require too much effort to parse.

    Read the article

  • Using both chunked transfer encoding and gzip

    - by RadiantHeart
    I recently started using gzip on my site and it worked like charm on all browsers except Opera which gives an error saying it could not decompress the content due to damaged data. From what I can gather from testing and googling it might be a problem with using both gzip and chunked transfer encoding. The fact that there is no error when requesting small files like css-files also points in that direction. Is this a known issue or is there something else that I havent thought about? Someone also mentioned that it could have something to do with sending a Content-Length header. Here is a simplified version of the most relevant part of my code: $contents = ob_get_contents(); ob_end_clean(); header('Content-Encoding: '.$encoding); print("\x1f\x8b\x08\x00\x00\x00\x00\x00"); $size = strlen($contents); $contents = gzcompress($contents, 9); $contents = substr($contents, 0, $size); print($contents); exit();

    Read the article

  • Problems with utf-8 encoding in php

    - by Addsy
    Hi, Another utf-8 related problem I believe... I am using php to update data in a mysql db then display that data elsewhere in the site. Previously I have run into utf-8 problems before where special characters are displayed as question marks when viewed in a browser but this one seems slightly different. I have a number of records to enter that contain the è character. If I enter this directly in the db then it appears correctly on the page so I take this to mean that utf-8 content is being output correctly. However when I try and update the values in the db through php, then the è character is replaced. What appears instead is & Atilde ; & uml ; (without the spaces) which appears in the browser as è I have the tables in the database set to use UTF-8. I believe this is correct cos, as mentioned, if I update the db through phpMyAdmin, its all ok. Similarly I have set the character encoding for the page which seems to be correct. I am also running the sql statement "SET NAMES 'utf8';" before trying to update the db. Anyone have any other ideas as to where the problem may lie? Many thanks

    Read the article

  • Need help with PHP URL encoding/decoding

    - by Kenan
    On one page I'm "masking"/encoding URL which is passed to another page, there I decode URL and start file delivering to user. I found some function for encoding/decoding URL's, but sometime encoded URL contains "+" or "/" and decoded link is broken. I must use "folder structure" for link, can not use QueryString! Here is encoding function: $urll = 'SomeUrl.zip'; $key = '123'; $result = ''; for($i=0; $i<strlen($urll); $i++) { $char = substr($urll, $i, 1); $keychar = substr($key, ($i % strlen($key))-1, 1); $char = chr(ord($char)+ord($keychar)); $result.=$char; } $result = urlencode(base64_encode($result)); echo '<a href="/user/download/'.$result.'/">PC</a>'; Here is decoding: $urll = 'segment_3'; //Don't worry for this one its CMS retrieving 3rd "folder" $key = '123'; $resultt = ''; $string = ''; $string = base64_decode(urldecode($urll)); for($i=0; $i<strlen($string); $i++) { $char = substr($string, $i, 1); $keychar = substr($key, ($i % strlen($key))-1, 1); $char = chr(ord($char)-ord($keychar)); $resultt.=$char; } echo '<br />DEC: '. $resultt; So how to encode and decode url. Thanks

    Read the article

  • Encoding multiple video streams with a single avconv invocation

    - by automatthias
    I played with avconv on Ubuntu and I'm now able to e.g. record the desktop with sound from a soundcard. One thing I wanted to do was recording two video inputs at the same time, for instance the desktop and from the webcam. I thought about doing something like this: avconv \ -f alsa \ -i default \ -acodec flac \ -f video4linux2 \ -r 6 \ -i /dev/video0 \ -f x11grab \ -i :0.0 \ out.mkv My thinking was that if you define multiple video inputs, and the .mkv format can handle multiple video streams, avconv will encode 2 video streams and 1 audio stream into one file. But this isn't what happens: avconv version 0.8.4-6:0.8.4-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers built on Nov 6 2012 16:51:11 with gcc 4.7.2 [alsa @ 0x1091bc0] capture with some ALSA plugins, especially dsnoop, may hang. [alsa @ 0x1091bc0] Estimating duration from bitrate, this may be inaccurate Input #0, alsa, from 'default': Duration: N/A, start: 1354364317.020350, bitrate: N/A Stream #0.0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s [video4linux2 @ 0x10923e0] Estimating duration from bitrate, this may be inaccurate Input #1, video4linux2, from '/dev/video0': Duration: N/A, start: 100607.724745, bitrate: 29491 kb/s Stream #1.0: Video: rawvideo, yuyv422, 640x480, 29491 kb/s, 6 tbr, 1000k tbn, 6 tbc [x11grab @ 0x107b2a0] device: :0.0+83,87 -> display: :0.0 x: 83 y: 87 width: 854 height: 480 [x11grab @ 0x107b2a0] shared memory extension found [x11grab @ 0x107b2a0] Estimating duration from bitrate, this may be inaccurate Input #2, x11grab, from ':0.0+83,87': Duration: N/A, start: 1354364318.488382, bitrate: 196761 kb/s Stream #2.0: Video: rawvideo, bgra, 854x480, 196761 kb/s, 15 tbr, 1000k tbn, 15 tbc Incompatible pixel format 'bgra' for codec 'mpeg4', auto-selecting format 'yuv420p' [buffer @ 0x107fcc0] w:854 h:480 pixfmt:bgra [avsink @ 0x10bdf00] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out' [scale @ 0x10dc680] w:854 h:480 fmt:bgra -> w:854 h:480 fmt:yuv420p flags:0x4 Output #0, matroska, to '.../out.mkv': Metadata: encoder : Lavf53.21.0 Stream #0.0: Video: mpeg4, yuv420p, 854x480, q=2-31, 4000 kb/s, 1k tbn, 15 tbc Stream #0.1: Audio: libvorbis, 48000 Hz, 2 channels, s16 Stream mapping: Stream #2:0 -> #0:0 (rawvideo -> mpeg4) Stream #0:0 -> #0:1 (pcm_s16le -> libvorbis) Press ctrl-c to stop encoding [mpeg4 @ 0x10bd800] rc buffer underflow ^Cframe= 160 fps= 15 q=2.0 Lsize= 3414kB time=10.66 bitrate=2623.0kbits/s video:3273kB audio:131kB global headers:4kB muxing overhead 0.165600% Received signal 2: terminating. I'm not sure if it's the question of mapping (some -map options to add?) or that avconv just can't encode more than 1 video stream at one time. So is it an actual avconv limitation, or a limitation of the available containers, or me simply not finding the right combination of command line options?

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • Validating parameters according to a fixed reference

    - by James P.
    The following method is for setting the transfer type of an FTP connection. Basically, I'd like to validate the character input (see comments). Is this going overboard? Is there a more elegant approach? How do you approach parameter validation in general? Any comments are welcome. public void setTransferType(Character typeCharacter, Character optionalSecondCharacter) throws NumberFormatException, IOException { // http://www.nsftools.com/tips/RawFTP.htm#TYPE // Syntax: TYPE type-character [second-type-character] // // Sets the type of file to be transferred. type-character can be any // of: // // * A - ASCII text // * E - EBCDIC text // * I - image (binary data) // * L - local format // // For A and E, the second-type-character specifies how the text should // be interpreted. It can be: // // * N - Non-print (not destined for printing). This is the default if // second-type-character is omitted. // * T - Telnet format control (<CR>, <FF>, etc.) // * C - ASA Carriage Control // // For L, the second-type-character specifies the number of bits per // byte on the local system, and may not be omitted. final Set<Character> acceptedTypeCharacters = new HashSet<Character>(Arrays.asList( new Character[] {'A','E','I','L'} )); final Set<Character> acceptedOptionalSecondCharacters = new HashSet<Character>(Arrays.asList( new Character[] {'N','T','C'} )); if( acceptedTypeCharacters.contains(typeCharacter) ) { if( new Character('A').equals( typeCharacter ) || new Character('E').equals( typeCharacter ) ){ if( acceptedOptionalSecondCharacters.contains(optionalSecondCharacter) ) { executeCommand("TYPE " + typeCharacter + " " + optionalSecondCharacter ); } } else { executeCommand("TYPE " + typeCharacter ); } } }

    Read the article

  • Unconvert Text File from Binary Format

    - by Hammer Bro.
    I've got a rather large CSV file (~700MB) which I know to consist of lines of 27-character alpha-numeric hashes; no commas or anything fancy. Somehow, during its migration from Windows to Linux (via winSCP and then a few regular SCPs), it has converted into some kind of binary format I am unfamiliar with. If I open the file in vi, everything appears fine, and it says [converted] at the bottom, although I know it's not a line endings issue (and dos2unix doesn't help). If I 'head' the file, it looks proper except for a "ÿþ" at the beginning of the first line. If I open up the file in nano, however, I see the "ÿþ" at the start and then "^@" before every character (even newlines and EoF). If I try to re-save or copy the file (say via: head file.csv short.txt), this special encoding is preserved. I copied the first ten lines out of vi (which displays it properly) into my Windows clipboard via my SSH client, then pasted it into a new text file, test.txt. This file is visually identical when opened in vi (and similar through 'head', minus the "ÿþ"), although it's roughly half of the filesize. Additionally, file test.txt test.txt: ASCII text file short.txt short.txt: I have no idea what format this once-text file got converted to (it's notoriously hard to search the internet for symbols), but surely there must be some way to convert it back. Any ideas?

    Read the article

  • Vim move cursor one character in insert mode without arrow keys

    - by bolov
    This might seem a little too overboard, but I switched to vim and I so happy about the workflow now. I try to discipline myself not to use the arrow keys, as keeping the hands on the alfa-keys all the time is such a big thing when writing. So when I need to navigate I get out of insert mode, move in normal mode and get back in insert mode. There is an exception where this is actually more disrupting: I use clang complete with snippets and super tab which is great. Except every time I get a function auto completed after I fill in the parameters I am left with the cursor before ) so to continue I have to move the cursor one character to the right. As you can imagine this happens very often. The only options I have (as far as I know) are : Escla or ?, and I am not happy about neither of them. The first one makes me hit 3 keys for just a simple 1 character cursor move, the second one makes me move my hand to the arrow keys. A third option would be to map CTRL-L or smth to ?. So what is the best way of doing this? //snippets (clang complete + supertab): foo($`param1`, $`param2`) //after completion: foo(var1, var2|) ^ ^ | | I am here | Need to be here | denotes cursor position

    Read the article

  • Why does text from Assembly.GetManifestResourceStream() start with three junk characters?

    - by flipdoubt
    I have a SQL file added to my VS.NET 2008 project as an embedded resource. Whenever I use the following code to read the file's content, the string returned always starts with three junk characters and then the text I expect. I assume this has something to do with the Encoding.Default I am using, but that is just a guess. Why does this text keep showing up? Should I just trim off the first three characters or is there a more informed approach? public string GetUpdateRestoreSchemaScript() { var type = GetType(); var a = Assembly.GetAssembly(type); var script = "UpdateRestoreSchema.sql"; var resourceName = String.Concat(type.Namespace, ".", script); using(Stream stream = a.GetManifestResourceStream(resourceName)) { byte[] buffer = new byte[stream.Length]; stream.Read(buffer, 0, buffer.Length); // UPDATE: Should be Encoding.UTF8 return Encoding.Default.GetString(buffer); } } Update: I now know that my code works as expected if I simply change the last line to return a UTF-8 encoded string. It will always be true for this embedded file, but will it always be true? Is there a way to test any buffer to determine its encoding?

    Read the article

  • mysql match against russain

    - by Devenv
    Hey, Trying to solve this for a very long time now... SELECT MATCH(name) AGAINST('????????') (russian) doesn't work, but SELECT MATCH(name) AGAINST('abraxas') (english) work perfectly. I know it's something with character-set, but I tried all kind of settings and it didn't work. For now it's latin-1. LIKE works This is the show variables charset related: character_set_client - latin1 character_set_connection - latin1 character_set_database - latin1 character_set_filesystem - binary character_set_results - latin1 character_set_server - latin1 character_set_system - utf8 character_sets_dir - /usr/share/mysql/charsets/ collation_connection - latin1_swedish_ci collation_database - latin1_swedish_ci collation_server - latin1_swedish_ci chunk of /etc/my.cnf default-character-set=latin1 skip-character-set-client-handshake chunk of the dump: /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8 */; DROP TABLE IF EXISTS `scenes_raw`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `scenes_raw` ( `scene_name` varchar(40) DEFAULT NULL, ...blabla... ) ENGINE=MyISAM AUTO_INCREMENT=901 DEFAULT CHARSET=utf8; (I did tests without skip-character-set-client-handshake too) SHOW TABLE STATUS WHERE Name = 'scenes_raw'\G Name: scenes_raw Engine: MyISAM Version: 10 Row_format: Dynamic Index_length: 23552 Collation: utf8_general_ci Checksum: NULL Create_options:

    Read the article

  • Pinyin Character entry on a touchscreen keyboard

    - by mmr
    The app I'm developing requires that it be deployed in China, which means that it needs to have Pinyin and Chinese character handling. I'm told that the way that our customers handle character entry is like so: Enter in the pinyin character, like 'zhang' As they enter the characters, a list of possible Chinese (Mandarin?) characters are presented to the user, like: The user will then select '1' to enter the family name that is roughly translated to 'zhang' How can I hook such programs (I believe one is called 'mspy.exe', from Microsoft, which I'm lead to believe comes with Microsoft versions of XP) into a WPF text box? Right now, the user can enter text either by using their keyboard or by using an on-screen keyboard, so I will probably need to capture the event of a keypress from either source and feed it to some OS event or to MSPY.exe or some similar program. Or is there some other way to enter pinyin and have it converted to Mandarin? Is there a program other than MSPY I should look at? EDIT: For those of you who think that this should 'just work', it does not. Chinese character entry will work just fine if entering text into notepad or the start-run menu or whatever, but it will not work in WPF. That's the key to this question: how do I enable WPF entry? There's the Google Pinyin and Sogou pinyin, but the websites are in Mandarin or Chinese or something similar and I don't read the language.

    Read the article

  • How is a relative JMP (x86) implemented in an Assembler?

    - by Pindatjuh
    While building my assembler for the x86 platform I encountered some problems with encoding the JMP instruction: enc inst size in bytes EB cb JMP rel8 2 E9 cw JMP rel16 4 (because of 0x66 16-bit prefix) E9 cd JMP rel32 5 ... (from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147) All are relative jumps, where the size of each encoding (operation + operand) is in the third column. Now my original (and thus fault because of this) design reserved the maximum (5 bytes) space for each instruction. The operand is not yet known, because it's a jump to a yet unknown location. So I've implemented a "rewrite" mechanism, that rewrites the operands in the correct location in memory, if the location of the jump is known, and fills the rest with NOPs. This is a somewhat serious concern in tight-loops. Now my problem is with the following situation: b: XXX c: JMP a e: XXX ... XXX d: JMP b a: XXX (where XXX is any instruction, depending on the to-be assembled program) The problem is that I want the smallest possible encoding for a JMP instruction (and no NOP filling). I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies for the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a. How do existing assemblers implement this, or how would you implement this? This is what I am thinking which solves the problem: First encode all the instructions to opcodes between the JMP and it's target, and if this region contains a variable-sized opcode, use the maximum size, i.e. 5 for JMP. Then in some conditions, the JMP is oversized (because it may fit in a smaller encoding): so another pass will search for oversized JMPs, shrink them, and move all instructions ahead), and set absolute branching instructions (i.e. external CALLs) after this pass is completed. I wonder, perhaps this is an over-engineered solution, that's why I ask this question.

    Read the article

  • How to workaround Python "WindowsError messages are not properly encoded" problem?

    - by Victor Lin
    It's a trouble when Python raised a WindowsError, the encoding of message of the exception is always os-native-encoded. For example: import os os.remove('does_not_exist.file') Well, here we get an exception: Traceback (most recent call last): File "<stdin>", line 1, in <module> WindowsError: [Error 2] ???????????: 'does_not_exist.file' As the language of my Windows7 is Traditional Chinese, the default error message I get is in big5 encoding (as know as CP950). >>> try: ... os.remove('abc.file') ... except WindowsError, value: ... print value.args ... (2, '\xa8t\xb2\xce\xa7\xe4\xa4\xa3\xa8\xec\xab\xfc\xa9w\xaa\xba\xc0\xc9\xae\xd7\xa1C') >>> As you see here, error message is not Unicode, then I will get another encoding exception when I try to print it out. Here is the issue, it can be found in Python issue list: http://bugs.python.org/issue1754 The question is, how to workaround this? How to get the native encoding of WindowsError? The version of Python I use is 2.6. Thanks.

    Read the article

< Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34  | Next Page >