Search Results

Search found 5756 results on 231 pages for 'illegal characters'.

Page 109/231 | < Previous Page | 105 106 107 108 109 110 111 112 113 114 115 116  | Next Page >

  • cygwin åäö in emacs

    - by starcorn
    Hey I been bugging with this problem for some hours now. And I couldn't find the answer on google so I try it here. The problem is that when I run emacs in cygwin in -nw mode characters like åäö doesn't come out normally. However it is perfectly normal when I type those character in mintty terminal. The answer that I found on google is that I should type M-x standard-display-european however emacs doesn't have that option. It only found standard-display-cyrillic-translit

    Read the article

  • [Notepad++] delete rest of line after specific string

    - by nixdagibts
    I'm looking for a way to search for a specific string e.g. '=UUID:' and delete it and all following characters per line. I would prefer a way/macro/addon for notepad++. But all other tools or scripts are welcome :) Before ://81.88.22.6/=UUID:63969B2469B7A94EBBDBD7CB5B9C00BA ://-ad.cgi*=UUID:3C8EFF48B674CC42BF5B6E2B7BA820E7 ://-ads/*=UUID:0D6CF7D5BE3F034C8A136CC99A074406 Note that the numbers are always different per line so you couldn't do a search 'n replace with them. Should look like this after ://81.88.22.6/ ://-ad.cgi* ://-ads/*

    Read the article

  • In utf-8 collation, why 11- is less then 1- ?

    - by ???
    I found that the sort result in ASCII: 1- 11- and in UTF-8: 11- 1- I feel it's so counter-intuitive, and it's not dictionary order. Isn't the character '-' (002d) is always less then [0-9] (0030-0039)? What's the general rule in UTF-8 collation? And how to bypass it, just make - be less then [0-9] while keep other characters unchanged for UTF-8, in Linux? (So it can affects the result of ls --sort, sort, etc. )

    Read the article

  • Is it possible to control the whole Gnome desktop with vi-like commands?

    - by roy
    I know with readline you can use emacs or vi commands to edit the input of several interactive text programs. I wonder if there exists such a similar facility to control the whole desktop in Gnome or in any other desktop environment. Maybe it could be a program that intercepts keystrokes and process them in the way vi does, translating sequences of characters to desktop commands and delivering to the active window only the text that is writen in insert mode. Do you know some tool like that?

    Read the article

  • Copy + Paste Chinese From Website

    - by icu222much
    I am re-branding a website that has both English & Chinese Traditional text. When I am copying + pasting the Chinese text into notepad++, the characters gets displayed as question marks. I tried changing the language settings within notepad++ to Chinese, but it now displays as squashed rectangles. I also changed my keyboard language setting in Windows 7 to Chinese but it did not work. This is what I see when I right-click in Chrome to copy the Chinese character:

    Read the article

  • How to change background color of text in google document

    - by Mirage
    I exactly don't know how i sometime did it but it happens sometime and sometime not Suppose i have a paragrah and i want add the background color to that paragraph. Google docs adds the background color 2 way 1)One way is that it selects the text only and adds the background color to text characters only 2)The other way is it makes the whole paragraph a block and adds the background color to that block. i.e the background color fill the whole box like div box with background color. I don't know to try the second way

    Read the article

  • Can't export emacs display on ssh

    - by Humble Debugger
    local_machine:$> ssh myself@external_machine_ip_address -p specific_port -X external_machine:$> echo $DISPLAY localhost:10.0 external_machine:$> emacs Warning: Cannot convert string "-*-courier-medium-r-*-*-*-120-*-*-*-*-iso8859-*" to type FontStruct Warning: Cannot convert string "-*-helvetica-medium-r-*--*-120-*-*-*-*-iso8859-1" to type FontStruct I do see the emacs window, but I can't see any of the characters. What could be the error ?

    Read the article

  • Strange Lose Of Format On Pen Drive

    - by Kiewic
    Hi, here is an screenshot of my pen drive. The files are impossible to open, and the names has been replaced by strange characters. In Ubuntu is worst, the windows system crash. What can I do to recover my information?

    Read the article

  • Amazon S3 Iterating Through Multi-Page Results. (withMarker)

    - by Jitu
    Trying to iterate through AmazonS3 that has around 5000+ keys stored in the bucket, used sample code based on provided link on Amazon Developer Guide http://docs.amazonwebservices.com/AmazonS3/latest/dev/ListingObjectKeysUsingNetSDK.html Issue is iteration fails when NexMarker is passed which has length of more than 128 string characters, which seems unusal as withMarker accepts string as parameter and there is no documentation on limit to withMarker. request.Marker = response.NextMarker; Has anyone faced similar issue. Thanks in advance.

    Read the article

  • How do I change a character embedded in filenames to another character?

    - by PilotMom
    in a directory with multiple sub directories I need to change filenames that have the "_" character to a another character "." example Current filename: ABC12345_DEF Change to filename: ABC12345.DEF I need to do this recursively through a directory tree. the last three characters of the filename are not always the same. using rename wildcards on either side of the "_" or "." doesn't work (plus I need to do this through several directories)

    Read the article

  • function to org-sort by three (3) criteria: due date / priority / title

    - by lawlist
    Is anyone aware of an org-sort function / modification that can refile / organize a group of TODO so that it sorts them by three (3) criteria: first sort by due date, second sort by priority, and third sort by by title of the task? EDIT: If anyone can please help me to modify this so that undated TODO are sorted last, that would be greatly appreciated -- at the present time, undated TODO are not being sorted: ;; multiple sort (defun org-sort-multi (&rest sort-types) "Multiple sorts on a certain level of an outline tree, or plain list items. SORT-TYPES is a list where each entry is either a character or a cons pair (BOOL . CHAR), where BOOL is whether or not to sort case-sensitively, and CHAR is one of the characters defined in `org-sort-entries-or-items'. Entries are applied in back to front order. Example: To sort first by TODO status, then by priority, then by date, then alphabetically (case-sensitive) use the following call: (org-sort-multi '(?d ?p ?t (t . ?a)))" (interactive) (dolist (x (nreverse sort-types)) (when (char-valid-p x) (setq x (cons nil x))) (condition-case nil (org-sort-entries (car x) (cdr x)) (error nil)))) ;; sort current level (defun lawlist-sort (&rest sort-types) "Sort the current org level. SORT-TYPES is a list where each entry is either a character or a cons pair (BOOL . CHAR), where BOOL is whether or not to sort case-sensitively, and CHAR is one of the characters defined in `org-sort-entries-or-items'. Entries are applied in back to front order. Defaults to \"?o ?p\" which is sorted by TODO status, then by priority" (interactive) (when (equal mode-name "Org") (let ((sort-types (or sort-types (if (or (org-entry-get nil "TODO") (org-entry-get nil "PRIORITY")) '(?d ?t ?p) ;; date, time, priority '((nil . ?a)))))) (save-excursion (outline-up-heading 1) (let ((start (point)) end) (while (and (not (bobp)) (not (eobp)) (<= (point) start)) (condition-case nil (outline-forward-same-level 1) (error (outline-up-heading 1)))) (unless (> (point) start) (goto-char (point-max))) (setq end (point)) (goto-char start) (apply 'org-sort-multi sort-types) (goto-char end) (when (eobp) (forward-line -1)) (when (looking-at "^\\s-*$") ;; (delete-line) ) (goto-char start) ;; (dotimes (x ) (org-cycle)) )))))

    Read the article

  • Strange loss of format on pen drive

    - by Kiewic
    Hi, here is an screenshot of my pen drive. The files are impossible to open, and the names have been replaced by strange characters. In Ubuntu is worst, the Windows system crash. What can I do to recover my information?

    Read the article

  • Path Length in Windows

    - by Dexter
    Is there any reason paths are still limited to ~250 characters in Windows? I'm not asking about a solution here (since there isn't one, other than \\?\ perhaps), but about why this is still an issue in 2012. Microsoft itself has failed to provide an explanation, so I'm hoping that maybe someone here, who has more insight into this than me, can provide an answer. Also, if \\?\ is supposed to be the "cure" to this, why aren't paths implicitly converted to the \\?\ notation by Microsoft's own programs?

    Read the article

  • Copy/Paste from Word Document to Web

    - by Eric
    Trying to save time and I need an easy way to make sure copied text from word is UTF-8 compatible for the web. Generally I have to copy and paste 4 or 5 pages of text at a time. Going through it and correcting characters individually is a real time waster. Anyone have any ideas? Is there a setting in Microsoft Word I might be missing?

    Read the article

  • How to change the default double click text separators? [closed]

    - by tobeannounced
    Possible Duplicates: Mac OS X: How to change word separator characters? Wrapping word configuration in mac I would like : to cause a different selection when text is double clicked. Eg if in Chrome (beta v9, OSX Snow Leopard) I double click on the text define:superuser, and I click on the superuser part of that line, it should only select superuser, not the whole thing. I wonder if this can be done!

    Read the article

  • Parsing a UTF-16 encoded xml file in ruby with REXML

    - by Matthew Toohey
    Hello, I'm trying to parse the following UTF-16 encoded xml file in REXML: http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml?_523525 REXML encounters an error after the following: >> require 'rexml/document' => true >> include REXML => Object >> require 'net/http' => true >> triplejString = Net::HTTP.get('www.abc.net.au', '/triplej/feeds/playout/triplejsydneyplayout.xml?_523525') => "\377\376<\000?\000x\000m\000l\000 \000v\000e\000r\000s\000i\000o\000n\000=\000\"\0001\000.\0000\000\"\000 \000e\000n\000c\000o\000d\000i\000n\000g\000=\000\"\000u\000t\000f\000-\0001\0006\000\"\000?\000>\000<\000a\000b\000c\000m\000u\000s\000i\000c\000_\000p\000l\000a\000y\000o\000u\000t\000>\000<\000c\000h\000a\000n\000n\000e\000l\000>\000J\000J\000J\000<\000/\000c\000h\000a\000n\000n\000e\000l\000>\000<\000p\000u\000b\000l\000i\000s\000h\000t\000i\000m\000e\000>\000F\000r\000i\000,\000 \0003\0000\000 \000A\000p\000r\000 \0002\0000\0001\0000\000 \0001\0001\000:\0005\0007\000:\0001\0007\000 \000G\000M\000T\000<\000/\000p\000u\000b\000l\000i\000s\000h\000t\000i\000m\000e\000>\000<\000i\000t\000e\000m\000s\000>\000<\000i\000t\000e\000m\000>\000<\000p\000l\000a\000y\000i\000n\000g\000>\000n\000o\000w\000<\000/\000p\000l\000a\000y\000i\000n\000g\000>\000<\000t\000i\000t\000l\000e\000>\000D\000o\000c\000t\000o\000r\000,\000 \000D\000o\000c\000t\000o\000r\000<\000/\000t\000i\000t\000l\000e\000>\000<\000t\000r\000a\000c\000k\000i\000d\000>\000<\000/\000t\000r\000a\000c\000k\000i\000d\000>\000<\000p\000l\000a\000y\000e\000d\000t\000i\000m\000e\000>\000F\000r\000i\000,\000 \0003\0000\000 \000A\000p\000r\000 \0002\0000\0001\0000\000 \0001\0001\000:\0005\0007\000:\0001\0007\000 \000G\000M\000T\000<\000/\000p\000l\000a\000y\000e\000d\000t\000i\000m\000e\000>\000<\000p\000u\000b\000l\000i\000s\000h\000e\000r\000>\000<\000/\000p\000u\000b\000l\000i\000s\000h\000e\000r\000>\000<\000d\000a\000t\000e\000c\000o\000p\000y\000r\000i\000g\000h\000t\000e\000d\000>\0002\0000\0000\0003\000<\000/\000d\000a\000t\000e\000c\000o\000p\000y\000r\000i\000g\000h\000t\000e\000d\000>\000<\000d\000u\000r\000a\000t\000i\000o\000n\000>\0001\0006\0003\000<\000/\000d\000u\000r\000a\000t\000i\000o\000n\000>\000<\000a\000u\000s\000t\000>\000N\000o\000<\000/\000a\000u\000s\000t\000>\000<\000t\000r\000a\000c\000k\000n\000o\000t\000e\000>\000<\000/\000t\000r\000a\000c\000k\000n\000o\000t\000e\000>\000<\000t\000r\000a\000c\000k\000l\000i\000n\000k\000>\000<\000/\000t\000r\000a\000c\000k\000l\000i\000n\000k\000>\000<\000s\000h\000o\000w\000>\000<\000/\000s\000h\000o\000w\000>\000<\000t\000a\000l\000e\000n\000t\000>\000<\000/\000t\000a\000l\000e\000n\000t\000>\000<\000a\000l\000b\000u\000m\000>\000<\000a\000l\000b\000u\000m\000n\000a\000m\000e\000>\000D\000r\000i\000v\000i\000n\000g\000 \000F\000o\000r\000 \000T\000h\000e\000 \000S\000t\000o\000r\000m\000/\000D\000o\000c\000t\000o\000r\000 \000D\000o\000c\000t\000o\000r\000<\000/\000a\000l\000b\000u\000m\000n\000a\000m\000e\000>\000<\000a\000l\000b\000u\000m\000i\000d\000>\0008\0003\000-\0004\0002\0002\0006\0009\000<\000/\000a\000l\000b\000u\000m\000i\000d\000>\000<\000a\000l\000b\000u\000m\000i\000m\000a\000g\000e\000>\000h\000t\000t\000p\000:\000/\000/\000w\000w\000w\000.\000a\000b\000c\000.\000n\000e\000t\000.\000a\000u\000/\000t\000r\000i\000p\000l\000e\000j\000/\000c\000o\000v\000e\000r\000s\000/\000G\000y\000r\000o\000s\000c\000o\000p\000e\000 \000-\000 \000D\000r\000i\000v\000i\000n\000g\000 \000F\000o\000r\000 \000T\000h\000e\000 \000S\000t\000o\000r\000m\000/\000D\000o\000c\000t\000o\000r\000 \000D\000o\000c\000t\000o\000r\000 \000(\0002\0000\0000\0003\000)\000.\000j\000p\000g\000<\000/\000a\000l\000b\000u\000m\000i\000m\000a\000g\000e\000>\000<\000/\000a\000l\000b\000u\000m\000>\000<\000a\000r\000t\000i\000s\000t\000>\000<\000a\000r\000t\000i\000s\000t\000n\000a\000m\000e\000>\000G\000y\000r\000o\000s\000c\000o\000p\000e\000<\000/\000a\000r\000t\000i\000s\000t\000n\000a\000m\000e\000>\000<\000a\000r\000t\000i\000s\000t\000i\000d\000>\000<\000/\000a\000r\000t\000i\000s\000t\000i\000d\000>\000<\000a\000r\000t\000i\000s\000t\000n\000o\000t\000e\000>\000<\000/\000a\000r\000t\000i\000s\000t\000n\000o\000t\000e\000>\000<\000a\000r\000t\000i\000s\000t\000l\000i\000n\000k\000>\000<\000/\000a\000r\000t\000i\000s\000t\000l\000i\000n\000k\000>\000<\000/\000a\000r\000t\000i\000s\000t\000>\000<\000/\000i\000t\000e\000m\000>\000<\000/\000i\000t\000e\000m\000s\000>\000<\000/\000a\000b\000c\000m\000u\000s\000i\000c\000_\000p\000l\000a\000y\000o\000u\000t\000>\000" >> xmlDoc = REXML::Document.new(triplejString) REXML::ParseException: #<REXML::ParseException: malformed XML: missing tag start Line: Position: Last 80 unconsumed characters: <?xml version="1.0" encoding="utf-16"?><a> /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:356:in `pull' /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:22:in `parse' /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/document.rb:227:in `build' /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize' (irb):19:in `new' (irb):19:in `irb_binding' /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding' /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/irb/workspace.rb:52 ... malformed XML: missing tag start Line: Position: Last 80 unconsumed characters: <?xml version="1.0" encoding="utf-16"?><a Line: Position: Last 80 unconsumed characters: <?xml version="1.0" encoding="utf-16"?><a from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/parsers/treeparser.rb:92:in `parse' from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/document.rb:227:in `build' from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/document.rb:43:in `initialize' from (irb):19:in `new' from (irb):19 Any ideas?

    Read the article

  • JSF composite component - weird behavior when trying to save state

    - by jc12
    I'm using Glassfish 3.2.2 and JSF 2.1.11 I'm trying to create a composite component that will take as parameters a string and a max number of characters and then will show only the max amount of characters, but it will have a "more" link next to it, that when clicked will expand the text to the full length and will then have a "less" link next to it to take it back to the max number of characters. I'm seeing some weird behavior, so I'm wondering if I'm doing something wrong. Here is my composite component definition: <html xmlns="http://www.w3.org/1999/xhtml" xmlns:composite="http://java.sun.com/jsf/composite" xmlns:h="http://java.sun.com/jsf/html" xmlns:fn="http://java.sun.com/jsp/jstl/functions" xmlns:p="http://primefaces.org/ui"> <composite:interface componentType="expandableTextComponent"> <composite:attribute name="name" required="true"/> <composite:attribute name="maxCharacters" required="true"/> <composite:attribute name="value" required="true"/> </composite:interface> <composite:implementation> <h:panelGroup id="#{cc.attrs.name}"> <h:outputText value="#{fn:substring(cc.attrs.value, 0, cc.attrs.maxCharacters)}" rendered="#{fn:length(cc.attrs.value) le cc.attrs.maxCharacters}"/> <h:outputText value="#{fn:substring(cc.attrs.value, 0, cc.attrs.maxCharacters)}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and !cc.expanded}" style="margin-right: 5px;"/> <h:outputText value="#{cc.attrs.value}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and cc.expanded}" style="margin-right: 5px;"/> <p:commandLink actionListener="#{cc.toggleExpanded()}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters}" update="#{cc.attrs.name}"> <h:outputText value="#{__commonButton.more}..." rendered="#{!cc.expanded}"/> <h:outputText value="#{__commonButton.less}" rendered="#{cc.expanded}"/> </p:commandLink> </h:panelGroup> </composite:implementation> </html> And here is the Java component: @FacesComponent("expandableTextComponent") public class ExpandableTextComponent extends UINamingContainer { boolean expanded; public boolean isExpanded() { return expanded; } public void toggleExpanded() { expanded = !expanded; } } Unfortunately expanded is always false every time the toggleExpanded function is called. However if I change the composite component to the following then it works. <html xmlns="http://www.w3.org/1999/xhtml" xmlns:composite="http://java.sun.com/jsf/composite" xmlns:h="http://java.sun.com/jsf/html" xmlns:fn="http://java.sun.com/jsp/jstl/functions" xmlns:p="http://primefaces.org/ui"> <composite:interface componentType="expandableTextComponent"> <composite:attribute name="name" required="true"/> <composite:attribute name="maxCharacters" required="true"/> <composite:attribute name="value" required="true"/> </composite:interface> <composite:implementation> <h:panelGroup id="#{cc.attrs.name}"> <h:outputText value="#{fn:substring(cc.attrs.value, 0, cc.attrs.maxCharacters)}" rendered="#{fn:length(cc.attrs.value) le cc.attrs.maxCharacters}"/> <h:outputText value="#{fn:substring(cc.attrs.value, 0, cc.attrs.maxCharacters)}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and !cc.expanded}" style="margin-right: 5px;"/> <p:commandLink actionListener="#{cc.toggleExpanded()}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and !cc.expanded}" update="#{cc.attrs.name}" process="@this"> <h:outputText value="#{__commonButton.more}..."/> </p:commandLink> <h:outputText value="#{cc.attrs.value}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and cc.expanded}" style="margin-right: 5px;"/> <p:commandLink actionListener="#{cc.toggleExpanded()}" rendered="#{fn:length(cc.attrs.value) gt cc.attrs.maxCharacters and cc.expanded}" update="#{cc.attrs.name}" process="@this"> <h:outputText value="#{__commonButton.less}"/> </p:commandLink> </h:panelGroup> </composite:implementation> </html> If I place a breakpoint in the toggleExpanded function, it only gets called on the "more" link and not the "less" link. So the question is why doesn't it get called when I click on the "less" link? Shouldn't this code be equivalent to the code above? Is there a better way to save state in a component?

    Read the article

  • Using C# to detect whether a filename character is considered international

    - by Morten Mertner
    I've written a small console application (source below) to locate and optionally rename files containing international characters, as they are a source of constant pain with most source control systems (some background on this below). The code I'm using has a simple dictionary with characters to look for and replace (and nukes every other character that uses more than one byte of storage), but it feels very hackish. What's the right way to (a) find out whether a character is international? and (b) what the best ASCII substitution character would be? Let me provide some background information on why this is needed. It so happens that the danish Å character has two different encodings in UTF-8, both representing the same symbol. These are known as NFC and NFD encodings. Windows and Linux will create NFC encoding by default but respect whatever encoding it is given. Mac will convert all names (when saving to a HFS+ partition) to NFD and therefore returns a different byte stream for the name of a file created on Windows. This effectively breaks Subversion, Git and lots of other utilities that don't care to properly handle this scenario. I'm currently evaluating Mercurial, which turns out to be even worse at handling international characters.. being fairly tired of these problems, either source control or the international character would have to go, and so here we are. My current implementation: public class Checker { private Dictionary<char, string> internationals = new Dictionary<char, string>(); private List<char> keep = new List<char>(); private List<char> seen = new List<char>(); public Checker() { internationals.Add( 'æ', "ae" ); internationals.Add( 'ø', "oe" ); internationals.Add( 'å', "aa" ); internationals.Add( 'Æ', "Ae" ); internationals.Add( 'Ø', "Oe" ); internationals.Add( 'Å', "Aa" ); internationals.Add( 'ö', "o" ); internationals.Add( 'ü', "u" ); internationals.Add( 'ä', "a" ); internationals.Add( 'é', "e" ); internationals.Add( 'è', "e" ); internationals.Add( 'ê', "e" ); internationals.Add( '¦', "" ); internationals.Add( 'Ã', "" ); internationals.Add( '©', "" ); internationals.Add( ' ', "" ); internationals.Add( '§', "" ); internationals.Add( '¡', "" ); internationals.Add( '³', "" ); internationals.Add( '­', "" ); internationals.Add( 'º', "" ); internationals.Add( '«', "-" ); internationals.Add( '»', "-" ); internationals.Add( '´', "'" ); internationals.Add( '`', "'" ); internationals.Add( '"', "'" ); internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 147 } )[ 0 ], "-" ); internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 148 } )[ 0 ], "-" ); internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 153 } )[ 0 ], "'" ); internationals.Add( Encoding.UTF8.GetString( new byte[] { 226, 128, 166 } )[ 0 ], "." ); keep.Add( '-' ); keep.Add( '=' ); keep.Add( '\'' ); keep.Add( '.' ); } public bool IsInternationalCharacter( char c ) { var s = c.ToString(); byte[] bytes = Encoding.UTF8.GetBytes( s ); if( bytes.Length > 1 && ! internationals.ContainsKey( c ) && ! seen.Contains( c ) ) { Console.WriteLine( "X '{0}' ({1})", c, string.Join( ",", bytes ) ); seen.Add( c ); if( ! keep.Contains( c ) ) { internationals[ c ] = ""; } } return internationals.ContainsKey( c ); } public bool HasInternationalCharactersInName( string name, out string safeName ) { StringBuilder sb = new StringBuilder(); Array.ForEach( name.ToCharArray(), c => sb.Append( IsInternationalCharacter( c ) ? internationals[ c ] : c.ToString() ) ); int length = sb.Length; sb.Replace( " ", " " ); while( sb.Length != length ) { sb.Replace( " ", " " ); } safeName = sb.ToString().Trim(); string namePart = Path.GetFileNameWithoutExtension( safeName ); if( namePart.EndsWith( "." ) ) safeName = namePart.Substring( 0, namePart.Length - 1 ) + Path.GetExtension( safeName ); return name != safeName; } } And this would be invoked like this: FileInfo file = new File( "Århus.txt" ); string safeName; if( checker.HasInternationalCharactersInName( file.Name, out safeName ) ) { // rename file }

    Read the article

  • Writing an optimised and efficient search engine with mySQL and ColdFusion

    - by Mel
    I have a search page with the following scenarios listed below. I was told there was a better way to do it, but not how, and that I am using too many if statements, and that it's prone to causing an error through url manipulation: Search.cfm will processes a search made from a search bar present on all pages, with one search input (titleName). If search.cfm is accessed manually (through URL not through using the simple search bar on all pages) it displays an advanced search form with three inputs (titleName, genreID, platformID) or it evaluates searchResponse variable and decides what to do. If simple search query is blank, has no results, or less than 3 characters it displays an error If advanced search query is blank, has no results, or less than 3 characters it displays an error If any successful search returns results, they come back normally. The top-of-page logic is as follows: <!---SET DEFAULT VARIABLE---> <cfparam name="variables.searchResponse" default=""> <!---CHECK TO SEE IF SIMPLE SEARCH A FORM WAS SUBMITTED AND EXECUTE SEARCH IF IT WAS---> <cfif IsDefined("Form.simpleSearch") AND Len(Trim(Form.titleName)) LTE 2> <cfset variables.searchResponse = "invalidString"> <cfelseif IsDefined("Form.simpleSearch") AND Len(Trim(Form.titleName)) GTE 3> <!---EXECUTE METHOD AND GET DATA---> <cfinvoke component="myComponent" method="simpleSearch" searchString="#Form.titleName#" returnvariable="simpleSearchResult"> <cfset variables.searchResponse = "simpleSearchResult"> </cfif> <!---CHECK IF ANY RECORDS WERE FOUND---> <cfif IsDefined("variables.simpleSearchResult") AND simpleSearchResult.RecordCount IS 0> <cfset variables.searchResponse = "noResult"> </cfif> <!---CHECK IF ADVANCED SEARCH FORM WAS SUBMITTED---> <cfif IsDefined("Form.AdvancedSearch") AND Len(Trim(Form.titleName)) LTE 2> <cfset variables.searchResponse = "invalidString"> <cfelseif IsDefined("Form.advancedSearch") AND Len(Trim(Form.titleName)) GTE 2> <!---EXECUTE METHOD AND GET DATA---> <cfinvoke component="myComponent" method="advancedSearch" returnvariable="advancedSearchResult" titleName="#Form.titleName#" genreID="#Form.genreID#" platformID="#Form.platformID#"> <cfset variables.searchResponse = "advancedSearchResult"> </cfif> <!---CHECK IF ANY RECORDS WERE FOUND---> <cfif IsDefined("variables.advancedSearchResult") AND advancedSearchResult.RecordCount IS 0> <cfset variables.searchResponse = "noResult"> </cfif> I'm using the searchResponse variable to decide what the the page displays, based on the following scenarios: <!---ALWAYS DISPLAY SIMPLE SEARCH BAR AS IT'S PART OF THE HEADER---> <form name="simpleSearch" action="search.cfm" method="post"> <input type="hidden" name="simpleSearch" /> <input type="text" name="titleName" /> <input type="button" value="Search" onclick="form.submit()" /> </form> <!---IF NO SEARCH WAS SUBMITTED DISPLAY DEFAULT FORM---> <cfif searchResponse IS ""> <h1>Advanced Search</h1> <!---DISPLAY FORM---> <form name="advancedSearch" action="search.cfm" method="post"> <input type="hidden" name="advancedSearch" /> <input type="text" name="titleName" /> <input type="text" name="genreID" /> <input type="text" name="platformID" /> <input type="button" value="Search" onclick="form.submit()" /> </form> </cfif> <!---IF SEARCH IS BLANK OR LESS THAN 3 CHARACTERS DISPLAY ERROR MESSAGE---> <cfif searchResponse IS "invalidString"> <cfoutput> <h1>INVALID SEARCH</h1> </cfoutput> </cfif> <!---IF SEARCH WAS MADE BUT NO RESULTS WERE FOUND---> <cfif searchResponse IS "noResult"> <cfoutput> <h1>NO RESULT FOUND</h1> </cfoutput> </cfif> <!---IF SIMPLE SEARCH WAS MADE A RESULT WAS FOUND---> <cfif searchResponse IS "simpleSearchResult"> <cfoutput> <h1>Search Results</h1> </cfoutput> <cfoutput query="simpleSearchResult"> <!---DISPLAY QUERY DATA---> </cfoutput> </cfif> <!---IF ADVANCED SEARCH WAS MADE A RESULT WAS FOUND---> <cfif searchResponse IS "advancedSearchResult"> <cfoutput> <h1>Search Results</h1> <p>Your search for "#Form.titleName#" returned #advancedSearchResult.RecordCount# result(s).</p> </cfoutput> <cfoutput query="advancedSearchResult"> <!---DISPLAY QUERY DATA---> </cfoutput> </cfif> Is my logic a) not efficient because my if statements/is there a better way to do this? And b) Can you see any scenarios where my code can break? I've tested it but I have not been able to find any issues with it. And I have no way of measuring performance. Any thoughts and ideas would be greatly appreciated. Many thanks

    Read the article

  • Upgraded to Ubuntu 13.10 - Apache not able to start

    - by 0R10N
    I updated to Ubuntu 13.10 (from Ubuntu 13.04) last weekend, and now Apache is not being able to start. It was working perfectly well until the upgraded, and I haven't changed anything myself. When I ran a restart this is what I get apache2: Syntax error on line 260 of /etc/apache2/apache2.conf: Could not open configuration file /etc/apache2/conf.d/: No such file or directory So, I created the directory, and then I get this: * Starting web server apache2 * * The apache2 configtest failed. Output of config test was: [Wed Oct 30 11:17:42.921934 2013] [proxy_html:notice] [pid 2496] AH01425: I18n support in mod_proxy_html requires mod_xml2enc. Without it, non-ASCII characters in proxied pages are likely to display incorrectly. AH00526: Syntax error on line 84 of /etc/apache2/apache2.conf: Invalid command 'LockFile', perhaps misspelled or defined by a module not included in the server configuration Action 'configtest' failed. The Apache error log may have more information. Thanks!

    Read the article

  • Space in search base OU causes error in Active Directory

    - by Jared Farrish
    Recently, while putting together some code to page Active Directory results beyond sizeLimit=1000, we ran into a strange behavior/bug of AD. Specifically, if we had an OU with a space in the search base, it caused an error: String base = "OU=Area X,OU=myserver,DC=my,DC=ad,DC=myserver,DC=com"; env.put(Context.PROVIDER_URL, "ldap://my.ad.myserver.com:389/" + base); This is the error we received: javax.naming.NamingException: [LDAP: error code 1 - 000020D6: SvcErr: DSID-031007DB, problem 5012 (DIR_ERROR), data 0 When we remove that OU, it works fine. What would cause this to occur? Do we need to encode the space somehow (+ and %20 only caused more issues)? Or is this generally illegal/unnecessary?

    Read the article

  • Im unable to open pdf files using kword ?! what to do?

    - by Curious Apprentice
    I have read in many places that kword can open pdf and save it in doc format ! but on ubuntu 11.10 I can not open pdf using kword as it does not showing pdf as supported files ?!!! I want a pdf to doc converter on Ubuntu 11.10. I can open pdf using libreofffice but it displaying garbled characters with over 1000+ pages for a pdf with only 15 pages. What can I do so that Kword became able to open pdf and save it to doc or odt format ??

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • What does this regex mean and why

    - by Kalec
    $ sed "s/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/g" desired_file_name I apreciate it even if you only explain part of it or at lest structure it with words as in s\alphanumerical_at_start\something\alphanumerical_at_end\something_else\global Could someone explain what that means, why and are all regEx so ... awful ? I know that it replaces the first lowcase alphanumerical word with the last one. But could you explain bit by bit what's going on here ? what's with all the /\ and \(.*\)\ and everything else ? I'm just lost. EDIT: Here is what I do get: (^[a-z0-9]*) starting with a trough z and 0 trough 9; and [a-z,0-9]*$ is the same but the last word (however [0-9,a-z] = just first 2 characters / first character, or the entire word ?). Also: what does the * or the \(.*\)\ even mean ?

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

< Previous Page | 105 106 107 108 109 110 111 112 113 114 115 116  | Next Page >