illegal characters - Page 18

Incremental search for un/accented characters

- by user38983

Does emacs have an incremental search mode, where searching for a character will search for itself and for any other versions of the character with accent marks, similar to how Google Chrome (at least v27) will do when searching in a page? Alternatively, is there an additional library or piece of elisp code that can put incremental search in such a mode? For example, incremental search for: 'manana', would find 'manana' or 'mañana' 'motley crue', would also find 'Mötley Crüe' (with case-sensitivity off). Even a solution that only covers a subset of these characters would be helpful.

Read the article

php returns junk characters at end of everything

- by blindJesse

php appears to be adding junk characters to the end of everything it returns on a friend's site. I'm not an admin on the server but I'd like to give an informed complaint to get this fixed. The site is http://daytoncodebreakers.org. You can see some junk at the end of every page on the site (what appear to be question marks with something else in the middle). I originally thought this was a wordpress issue, but check out http://daytoncodebreakers.org/whereisini.php (which is just a call to phpinfo), and http://daytoncodebreakers.org/hello.php (which is just 'Hello World'). I'm not sure if this is the most appropriate site, but I think this is a server config issue, so I'm posting it here (rather than stackoverflow or superuser). Feel free to move it if want.

Read the article

Excel, Lookup special characters and spaces.

- by Sisyphus

I have an excel, spreadsheet that has multiple sheets. The first sheet is an index of files, I am using the following forumla to look up a value in column A, references against the index sheet, if it matches then it copies the value from column B from the index sheet. The forumla is: =IF($A3="", "", (LOOKUP($A3, INDEX!$A$3:$A$26, INEDEX!B$3:B$26))) It works for data that has no spaces and special characters, anybody have any ideas why it doesn't work and how I can make it work? Thanks in advance.

Read the article

How to support 3-byte UTF-8 Characters in ANT

- by efelton

I am trying to support UTF-8 characters in my ANT script. As long as the character string are made up of 2-byte UTF-8 characters, such as: Lògìñ Ùsèr ÌÐ Then things work fine. When I use Unicode Han Character: ? Which, according to this site: http://www.fileformat.info/info/unicode/char/6211/index.htm Has a UTF-8 encoding of 0xE6 0x88 0x91 I can see in UltraEdit, my input properties file has the values "E6 88 91" all in a row, so I'm fairly confident that my input is correct. And When I open the same file in Notepad++ I can see all the characters correctly. Here is my Build Script: <?xml version="1.0" encoding="UTF-8" ?> <project name="utf8test" default="all" basedir="."> <target name="all"> <loadproperties encoding="UTF-8" srcfile="./apps.properties.all.txt" /> <echo>No encoding ${common.app.name}</echo> <echo encoding="UTF-8">UTF-8 ${common.app.name}</echo> <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.name}</echo> <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.name}</echo> <echo>${common.app.ServerName}</echo> <echo>${bb.vendor}</echo> <echo>No encoding ${common.app.UserIdText}</echo> <echo encoding="UTF-8">UTF-8 ${common.app.UserIdText}</echo> <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.UserIdText}</echo> <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.UserIdText}</echo> <echoproperties /> </target> </project> And here is my properties file: common.app=VrvPsLTst common.app.name=?? common.app.description=Pseudo Loc Test App for Build Script testing common.app.ServerName=http://Vèrìvò.com bb.vendor=Vèrìvò common.app.PasswordText=Pàsswòrð bb.override.list=MP_COPYRIGHTTEXT, "Çòpÿrìght 2012 Vèrívó Bùîlð TéàM" common.app.LoginButtonText=Lògìñ common.app.UserIdText=Ùsèr ÌÐ bb.SMSSuccess=Mèssàgéß Sùççêssfúllÿ Sëñt common.app.LoginScreenMessage=WèlçòMé Mêssàgë common.app.LoginProgressMessage=Àùthèñtìçàtíòñ îñ prógréss... ios.RegistrationText=Règìstràtíòñ Téxt ios.RegistrationURL=http://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/ Here is what the output looks like: Buildfile: C:\Temp\utf8\build.xml all: [echo] No encoding ?? [echo] UTF-8 ?? [echo] ÿþU n i c o d e L i t t l e ? ? [echo] U n i c o d e L i t t l e U n m a r k e d ? ? [echo] http://Vèrìvò.com [echo] Vèrìvò [echo] No encoding Ùsèr ÌÐ [echo] UTF-8 Ã™sÃ¨r ÃŒÃ? [echo] ÿþU n i c o d e L i t t l e Ù s è r Ì Ð [echo] U n i c o d e L i t t l e U n m a r k e d Ù s è r Ì Ð [echoproperties] #Ant properties [echoproperties] #Mon Jun 18 15:25:13 EDT 2012 [echoproperties] ant.core.lib=C\:\\ant\\lib\\ant.jar [echoproperties] ant.file=C\:\\Temp\\utf8\\build.xml [echoproperties] ant.file.type=file [echoproperties] ant.file.type.utf8test=file [echoproperties] ant.file.utf8test=C\:\\Temp\\utf8\\build.xml [echoproperties] ant.home=c\:\\ant\\bin\\.. [echoproperties] ant.java.version=1.6 [echoproperties] ant.library.dir=C\:\\ant\\lib [echoproperties] ant.project.default-target=all [echoproperties] ant.project.invoked-targets=all [echoproperties] ant.project.name=utf8test [echoproperties] ant.version=Apache Ant version 1.8.1 compiled on April 30 2010 [echoproperties] awt.toolkit=sun.awt.windows.WToolkit [echoproperties] basedir=C\:\\Temp\\utf8 [echoproperties] bb.SMSSuccess=M\u00E8ss\u00E0g\u00E9\u00DF S\u00F9\u00E7\u00E7\u00EAssf\u00FAll\u00FF S\u00EB\u00F1t [echoproperties] bb.override.list=MP_COPYRIGHTTEXT, "\u00C7\u00F2p\u00FFr\u00ECght 2012 V\u00E8r\u00EDv\u00F3 B\u00F9\u00EEl\u00F0 T\u00E9\u00E0?" [echoproperties] bb.vendor=V\u00E8r\u00ECv\u00F2 [echoproperties] common.app=VrvPsLTst [echoproperties] common.app.LoginButtonText=L\u00F2g\u00EC\u00F1 [echoproperties] common.app.LoginProgressMessage=\u00C0\u00F9th\u00E8\u00F1t\u00EC\u00E7\u00E0t\u00ED\u00F2\u00F1 \u00EE\u00F1 pr\u00F3gr\u00E9ss... [echoproperties] common.app.LoginScreenMessage=W\u00E8l\u00E7\u00F2?\u00E9 M\u00EAss\u00E0g\u00EB [echoproperties] common.app.PasswordText=P\u00E0ssw\u00F2r\u00F0 [echoproperties] common.app.ServerName=http\://V\u00E8r\u00ECv\u00F2.com [echoproperties] common.app.UserIdText=\u00D9s\u00E8r \u00CC\u00D0 [echoproperties] common.app.description=Pseudo Loc Test App for Build Script testing [echoproperties] common.app.name=?? [echoproperties] file.encoding=Cp1252 [echoproperties] file.encoding.pkg=sun.io [echoproperties] file.separator=\\ [echoproperties] ios.RegistrationText=R\u00E8g\u00ECstr\u00E0t\u00ED\u00F2\u00F1 T\u00E9xt [echoproperties] ios.RegistrationURL=http\://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/ [echoproperties] java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment [echoproperties] java.awt.printerjob=sun.awt.windows.WPrinterJob [echoproperties] java.class.path=c\:\\ant\\bin\\..\\lib\\ant-launcher.jar;C\:\\Temp\\utf8\\.\\;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip;C\:\\ant\\lib\\ant-antlr.jar;C\:\\ant\\lib\\ant-apache-bcel.jar;C\:\\ant\\lib\\ant-apache-bsf.jar;C\:\\ant\\lib\\ant-apache-log4j.jar;C\:\\ant\\lib\\ant-apache-oro.jar;C\:\\ant\\lib\\ant-apache-regexp.jar;C\:\\ant\\lib\\ant-apache-resolver.jar;C\:\\ant\\lib\\ant-apache-xalan2.jar;C\:\\ant\\lib\\ant-commons-logging.jar;C\:\\ant\\lib\\ant-commons-net.jar;C\:\\ant\\lib\\ant-contrib-1.0b3.jar;C\:\\ant\\lib\\ant-jai.jar;C\:\\ant\\lib\\ant-javamail.jar;C\:\\ant\\lib\\ant-jdepend.jar;C\:\\ant\\lib\\ant-jmf.jar;C\:\\ant\\lib\\ant-jsch.jar;C\:\\ant\\lib\\ant-junit.jar;C\:\\ant\\lib\\ant-launcher.jar;C\:\\ant\\lib\\ant-netrexx.jar;C\:\\ant\\lib\\ant-nodeps.jar;C\:\\ant\\lib\\ant-starteam.jar;C\:\\ant\\lib\\ant-stylebook.jar;C\:\\ant\\lib\\ant-swing.jar;C\:\\ant\\lib\\ant-testutil.jar;C\:\\ant\\lib\\ant-trax.jar;C\:\\ant\\lib\\ant-weblogic.jar;C\:\\ant\\lib\\ant.jar;C\:\\ant\\lib\\bb-ant-tools.jar;C\:\\ant\\lib\\xercesImpl.jar;C\:\\ant\\lib\\xml-apis.jar;C\:\\Program Files\\Java\\jre7\\lib\\tools.jar [echoproperties] java.class.version=51.0 [echoproperties] java.endorsed.dirs=C\:\\Program Files\\Java\\jre7\\lib\\endorsed [echoproperties] java.ext.dirs=C\:\\Program Files\\Java\\jre7\\lib\\ext;C\:\\Windows\\Sun\\Java\\lib\\ext [echoproperties] java.home=C\:\\Program Files\\Java\\jre7 [echoproperties] java.io.tmpdir=C\:\\Users\\efelton\\AppData\\Local\\Temp\\ [echoproperties] java.library.path=C\:\\Windows\\SYSTEM32;C\:\\Windows\\Sun\\Java\\bin;C\:\\Windows\\system32;C\:\\Windows;C\:\\Windows\\SYSTEM32;C\:\\Windows;C\:\\Windows\\SYSTEM32\\WBEM;C\:\\Windows\\SYSTEM32\\WINDOWSPOWERSHELL\\V1.0\\;C\:\\PROGRAM FILES\\INTEL\\WIFI\\BIN\\;C\:\\PROGRAM FILES\\COMMON FILES\\INTEL\\WIRELESSCOMMON\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\VSSHELL\\COMMON7\\IDE\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\syswow64;C\:\\Program Files (x86)\\QuickTime\\QTSystem\\;C\:\\Program Files (x86)\\AccuRev\\bin;C\:\\Program Files\\Java\\jdk1.7.0_04\\bin;C\:\\Program Files (x86)\\IDM Computer Solutions\\UltraEdit\\;. [echoproperties] java.runtime.name=Java(TM) SE Runtime Environment [echoproperties] java.runtime.version=1.7.0_04-b22 [echoproperties] java.specification.name=Java Platform API Specification [echoproperties] java.specification.vendor=Oracle Corporation [echoproperties] java.specification.version=1.7 [echoproperties] java.vendor=Oracle Corporation [echoproperties] java.vendor.url=http\://java.oracle.com/ [echoproperties] java.vendor.url.bug=http\://bugreport.sun.com/bugreport/ [echoproperties] java.version=1.7.0_04 [echoproperties] java.vm.info=mixed mode [echoproperties] java.vm.name=Java HotSpot(TM) 64-Bit Server VM [echoproperties] java.vm.specification.name=Java Virtual Machine Specification [echoproperties] java.vm.specification.vendor=Oracle Corporation [echoproperties] java.vm.specification.version=1.7 [echoproperties] java.vm.vendor=Oracle Corporation [echoproperties] java.vm.version=23.0-b21 [echoproperties] line.separator=\r\n [echoproperties] os.arch=amd64 [echoproperties] os.name=Windows 7 [echoproperties] os.version=6.1 [echoproperties] path.separator=; [echoproperties] sun.arch.data.model=64 [echoproperties] sun.boot.class.path=C\:\\Program Files\\Java\\jre7\\lib\\resources.jar;C\:\\Program Files\\Java\\jre7\\lib\\rt.jar;C\:\\Program Files\\Java\\jre7\\lib\\sunrsasign.jar;C\:\\Program Files\\Java\\jre7\\lib\\jsse.jar;C\:\\Program Files\\Java\\jre7\\lib\\jce.jar;C\:\\Program Files\\Java\\jre7\\lib\\charsets.jar;C\:\\Program Files\\Java\\jre7\\lib\\jfr.jar;C\:\\Program Files\\Java\\jre7\\classes [echoproperties] sun.boot.library.path=C\:\\Program Files\\Java\\jre7\\bin [echoproperties] sun.cpu.endian=little [echoproperties] sun.cpu.isalist=amd64 [echoproperties] sun.desktop=windows [echoproperties] sun.io.unicode.encoding=UnicodeLittle [echoproperties] sun.java.command=org.apache.tools.ant.launch.Launcher -cp .;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip [echoproperties] sun.java.launcher=SUN_STANDARD [echoproperties] sun.jnu.encoding=Cp1252 [echoproperties] sun.management.compiler=HotSpot 64-Bit Tiered Compilers [echoproperties] sun.os.patch.level=Service Pack 1 [echoproperties] user.country=US [echoproperties] user.dir=C\:\\Temp\\utf8 [echoproperties] user.home=C\:\\Users\\efelton [echoproperties] user.language=en [echoproperties] user.name=efelton [echoproperties] user.script= [echoproperties] user.timezone= [echoproperties] user.variant= BUILD SUCCESSFUL Total time: 1 second Thank you for your help EDIT\UPDATE 6/19/2012 I am developing in a Windows environment. I have installed a TTF from: http://freedesktop.org/wiki/Software/CJKUnifonts/Download I have updated UltraEdit to use the TTF and I can see the Chinese characters. <?xml version="1.0" encoding="UTF-8" ?> <project name="utf8test" default="all" basedir="."> <target name="all"> <echo>??</echo> <echo encoding="ISO-8859-1">ISO-8859-1 ??</echo> <echo encoding="UTF-8">UTF-8 ??</echo> <echo file="echo_output.txt" append="true" >?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="ISO-8859-1">ISO-8859-1 ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UTF-8">UTF-8 ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UnicodeLittle">UnicodeLittle ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ?? ${line.separator}</echo> </target> </project> The output captured by running inside UltraEdit is: Buildfile: E:\temp\utf8\build.xml all: [echo] ?? [echo] ISO-8859-1 ?? [echo] UTF-8 ?? BUILD SUCCESSFUL Total time: 1 second And the echo_output.txt file shows up like this: ?? ISO-8859-1 ?? UTF-8 ?? ÿþU n i c o d e L i t t l e ? ? U n i c o d e L i t t l e U n m a r k e d ? ? So there appears to be somehting fundamentally wrong with how my ANT environment is set up since I cannot simply echo the character to the screen or to a file.

Read the article

Java how can I add an accented "e" to a string?

- by behrk2

Hello, With the help of tucuxi from the existing post Java remove HTML from String without regular expressions I have built a method that will parse out any basic HTML tags from a string. Sometimes, however, the original string contains html hexadecimal characters like é (which is an accented e). I have started to add functionality which will translate these escaped characters into real characters. You're probably asking: Why not use regular expressions? Or a third party library? Unfortunately I cannot, as I am developing on a BlackBerry platform which does not support regular expressions and I have never been able to successfully add a third party library to my project. So, I have gotten to the point where any é is replaced with "e". My question now is, how do I add an actual 'accented e' to a string? Here is my code: public static String removeHTML(String synopsis) { char[] cs = synopsis.toCharArray(); String sb = new String(); boolean tag = false; for (int i = 0; i < cs.length; i++) { switch (cs[i]) { case '<': if (!tag) { tag = true; break; } case '>': if (tag) { tag = false; break; } case '&': char[] copyTo = new char[7]; System.arraycopy(cs, i, copyTo, 0, 7); String result = new String(copyTo); if (result.equals("&#x00E9")) { sb += "e"; } i += 7; break; default: if (!tag) sb += cs[i]; } } return sb.toString(); } Thanks!

Read the article

Problem with XML encoding of database contents with Latin characters

- by user89691

I have an ASP Access database that contains strings in various European languages. The database was populated prior by agents in the respective countries. It contains entries with accented etc characters as you would expect. If I open the database with MS Access these characters show up fine. For example the the German equivalent of "Open" shows as "Öffnen" (hopefully you can see an "O" with 2 dots above it!). I have ASP code that reads the database and returns records in XML. The text is passed to XMLEncode to construct the XML, but that only seems to deal with the 5 specials like "<", "&", etc. If I dump the XML the accented characters are unchanged. <English>Open</English> <German>Öffnen</German> If I look at the raw packets with Wireshark I see that the "Ö" byte is hex D6, which appears to be it's decimal Unicode and ISO 8859-1 value. The problem starts when I try to parse the XML in client-side JS. I get: "An invalid character was found in text content" from IE. FF and Chrome happily accept the XML without hiccup but the browser shows the "Ö" character as a diamond with a question mark inside. http://www.validome.org/xml/validate/ reports "encoding error." http://www.w3schools.com/dom/dom_validate.asp thinks it is fine. The XML is UTF-8 encoded. What do I need to do to have IE accept my XML without complaint? What do I need to do to have browsers display the stuff correctly?

Read the article

Creating files with french characters and encoding.

- by Kevin

HI, I am creating a file like so. FileStream temp = File.Create( this.FileName ); Then putting data in the file like so. this.Writer = new StreamWriter( this.Stream ); this.Writer.WriteLine( strMessage ); That code is encapsulated in a class hierarchy but that is the meat and potatoes of it. My problem is this. MSDN says that the default encoding for creating a file this way is UTF8. And when I write a french character such as é Textpad interprets the file as a UTF 8 file, but notepad++ says it's "ANSI as UTF8" or maybe it's an ansi file but is reading it as UTF8. When I create a file the same way without the french character both textpad and notepad++ read the file as an ansi file even though according to msdn it should be a utf 8 file still. Which program should be trusted. Notepad++ or textpad - Notepad++ seems to be more consistant, but is still the oppossite to what MSDN says it should be. My problem is that we create files that get sent off to another company and depending on whether there are french characters the encoding seems to keep changing. Or is there a better way to determine the encoding of a file. I've read about byte order marks and preambles but as far as I understand neither are guaranteed to be there. We initially thought that all the files we were building were ansi. Also please note that both ansi and utf8 should handle the french characters appropriately as the characters are part of both character sets.

Read the article

Replacing XML reserved characters in SQL Server 2005

- by Barn

I'm working on a system that takes relational data from a sql server DB and uses SSIS to produce an XML extract using sql server 2005's 'FOR XML PATH' command and a schema. The problem lies with replacing the XML reserved characters. 'FOR XML PATH' is only replacing <, , and &, not ' and ", so I need a way of replacing these myself. I've tried pre-processing the fields in the database to replace XML reserved characters with their entitised equivalents (e.g. & becomes &), but once these fields are used to construct XML using FOR XML the leading & is replaced with &, so I end up with &amp; where I should have &. What I've tried so far is altering the element's contents after the XML has been constructed using XQuery inside SQL server like so: DECLARE @data VARCHAR(MAX) SET @data = CONVERT(VARCHAR(MAX), [my xml column].query(' data(/root/node_i_want)') SELECT @data = [function to replace quotes etc](@data) SET [my xml column].modify('replace value of (/root/node_i_want)[1] with sql:variable("@data")') but I get the same problem. Essentially, is there something wrong I'm doing with the above, or a way to tell FOR XML to entitise other characters, or something like that? Basically anything short of having to write a program to change the XML after it has been assembled in large batches and saved to files!

Read the article

oracle pl/sql bug: can't put_line more than 2000 characters

- by FrustratedWithFormsDesigner

Has anyone else noticed this phenomenon where dbms_output.put_line is unable to print more than 2000 characters at a time? Script is: set serveroutput on size 100000; declare big_str varchar2(2009); begin for i in 1..2009 loop big_str := big_str||'x'; end loop; dbms_output.put_line(length(big_str)); dbms_output.put_line(big_str); end; / I copied and pasted the output into an editor (Notepad++) which told me there were only 2000 characters, not 2009 which is what I think should have been pasted. This also happens with a few of my test scripts - only 2000 characters get printed. I have a workaround to print like this: dbms_output.put_line(length(big_str)); dbms_output.put_line(substr(big_str,1,1999)); dbms_output.put_line(substr(big_str,2000)); This adds new lines to the output, makes it hard to read when the text you're working with is preformatted. Has anyone else noticed this? Is it really a bug or some sort of obscure feature? Is there a better workaround? Is there any other information on this out there? Oracle version is: 10.2.0.3.0, using PL/SQL Developer (from Allround Automation).

Read the article

Lucene and Special Characters

- by Brandon

I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters. I index my field as such: Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false); Analyzer analyzer = new StandardAnalyzer(); IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED)); indexWriter.AddDocument(doc); indexWriter.Optimize(); indexWriter.Close(); And I search doing the following: value = value.Trim().ToLower(); value = QueryParser.Escape(value); Query searchQuery = new TermQuery(new Term(field, value)); Searcher searcher = new IndexSearcher(DALDirectory); TopDocCollector collector = new TopDocCollector(searcher.MaxDoc()); searcher.Search(searchQuery, collector); ScoreDoc[] hits = collector.TopDocs().scoreDocs; If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document. Even more strange, if I remove the QueryParser.Escape line do a search for a GUID (which, of course, contains hyphens) it finds documents where the GUID value matches, but performing the same search with the value as 'Test (Test)' still yields no results. I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters and am storing the field and searching by the Lucene.Net's examples. Any thoughts?

Read the article

Replacing accented/umlauted characters with their unadorned counterparts in C# [closed]

- by Andrew Rollings

Duplicate of 249087 I have a bunch of user generated addresses that may contain characters with diacritic marks. What is the most effective (i.e. generic) way (apart from a straightforward replace) to automatically convert any such characters to their closest English equivalent? E.g. any of àâãäå would become a æ would become the two separate letters ae ç would become c any of èéêë would become e etc. for all possible letter variations (preferably without having to find and encode lookups for each diacritic form of the letter). (Note: I have to pass these addresses on to third party software that is incapable of printing anything other than English characters. I'd rather the software was capable of handling them, but I have no control over that.) EDIT: Never mind... Found the answer [here][2]. It showed up in the "Related" section to the right of the question after I posted, but not in my prior search or as a pre-post suggestion. Hmm. I added the 'diacritics' tag to the other question in any case. EDIT 2: Jeez! Who voted this -1 after I closed it?

Read the article

How to parse XML with special characters?

- by Snooze

Whenever I try to parse XML with special characters such as o or ???? I get an error. The xml documents claims to use UTF-8 encoding but that does not seem to be the case. Here is what the troublesome text looks like when I view the XML in Firefox: Bleach: The Diamond Dust Rebellion - MÅ? Hitotsu no HyÅ?rinmaru; Bleach - The DiamondDust Rebellion - Mou Hitotsu no Hyourinmaru On the actual website, Å? is actually the character o. <br /> One day, Doraemon and his friends meet Professor Mangetsu (æº?æ??å??ç??, Professor Mangetsu?), who studies magic and magical beings such as goblins, and his daughter Miyoko (ç¾?å¤?å?, Miyoko?), and are warned of the dangerous approximation of the "star of the Underworld" to the Earth's orbit.<br /> <br /> And once again, on the actual website, those characters appear as ???? and ???. The actual XML file is formatted properly other than those special characters, which certainly do not appear to be using the UTF-8 encoding. Is there a way to get NSXML to parse these XML files?

Read the article

Word wrap in multiline textbox after 35 characters

- by Kanavi

<asp:TextBox CssClass="txt" ID="TextBox1" runat="server" onkeyup="CountChars(this);" Rows="20" Columns="35" TextMode="MultiLine" Wrap="true"> </asp:TextBox> I need to implement word-wrapping in a multi-line textbox. I cannot allow users to write more then 35 chars a line. I am using the following code, which breaks at precisely the specified character on every line, cutting words in half. Can we fix this so that if there's not enough space left for a word on the current line, we move the whole word to the next line? function CountChars(ID) { var IntermediateText = ''; var FinalText = ''; var SubText = ''; var text = document.getElementById(ID.id).value; var lines = text.split("\n"); for (var i = 0; i < lines.length; i++) { IntermediateText = lines[i]; if (IntermediateText.length <= 50) { if (lines.length - 1 == i) FinalText += IntermediateText; else FinalText += IntermediateText + "\n"; } else { while (IntermediateText.length > 50) { SubText = IntermediateText.substring(0, 50); FinalText += SubText + "\n"; IntermediateText = IntermediateText.replace(SubText, ''); } if (IntermediateText != '') { if (lines.length - 1 == i) FinalText += IntermediateText; else FinalText += IntermediateText + "\n"; } } } document.getElementById(ID.id).value = FinalText; $('#' + ID.id).scrollTop($('#' + ID.id)[0].scrollHeight); } Edit - 1 I have to show total max 35 characters in line without specific word break and need to keep margin of two characters from the right. Again, the restriction should be for 35 characters but need space for total 37 (Just for the Visibility issue.)

Read the article

Corrupt UTF-8 Characters with PHP 5.2.10 and MySQL 5.0.81

- by jkndrkn

We have an application hosted on both a local development server and a live site. We are experiencing UTF-8 corruption issues and are looking to figure out how to resolve them. The system is run using symfony 1.0 with Propel. On our development server, we are running PHP 5.2.0 and MySQL 5.0.32. We do not experience corrupted UTF-8 characters there. On our live site, PHP 5.2.10 and MySQL 5.0.81 is running. On that server, certain characters such as ô´ and S are corrupted once they are stored in the database. The corrupted characters are showing up as either question marks or approximations of the original character with adjacent question marks. Examples of corruption: Uncorrupted: ô´ Corrupted: ô? Uncorrupted: S Corrupted: ? We are currently using the following techniques on both development and live servers: Executing the following queries prior to execution of any other queries: SET NAMES 'utf8' COLLATE 'utf8_unicode_ci' SET CHARSET 'utf8' Setting the <meta> Content-Type value to: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Adding the following to our .htaccess file: AddDefaultCharset utf-8 Using mb_* (multibyte) PHP functions where necessary. Being sure to set database columns to use utf8_unicode_ci collation. These techniques are sufficient for our development site, but do not work on the live site. On the live site I've also tried adding mysql_set_encoding('ut8', $mysql_connection) but this does not help either. I have found some evidence that newer versions of PHP and MySQL are mishandling UTF-8 character encodings.

Read the article

How to convert string with double high/wide characters to normal string [VC++6]

- by Shaitan00

My application typically recieves a string in the following format: " Item $5.69 " Some contants I always expect: - the LENGHT always 20 characters - the start index of the text always [5] - and most importantly the index of the DECIMAL for the price always [14] In order to identify this string correctly I validate all the expected contants listed above .... Some of my clients have now started sending the string with Doube-High / Double-Wide values (pair of characters which represent a single readable character) similar to the following: " Item $x80x90.x81x91x82x92 " For testing I simply scan the string character-by-character, compare char[i] and char[i+1] and replace these pairs with their corresponding single character when a match is found (works fine) as follows: [Code] for (int i=0; i < sData.length(); i++) { char ch = sData[i] & 0xFF; char ch2 = sData[i+1] & 0xFF; if (ch == '\x80' && ch2 == '\x90') zData.replace("\x80\x90", "0"); else if (ch == '\x81' && ch2 == '\x91') zData.replace("\x81\x91", "1"); else if (ch == '\x82' && ch2 == '\x92') zData.replace("\x82\x92", "2"); ... ... ... } [/Code] But the result is something like this: " Item $5.69 " Notice how this no longer matches my expectation: the lenght is now 17 (instead of 20) due to the 3 conversions and the decimal is now at index 13 (instead of 14) due to the conversion of the "5" before the decimal point. Ideally I would like to convert the string to a normal readable format keeping the constants (length, index of text, index of decimal) at the same place (so the rest of my application is re-usable) ... or any other suggestion (I'm pretty much stuck with this)... Is there a STANDARD way of dealing with these type of characters? Any help would be greatly appreciated, I've been stuck on this for a while now ... Thanks,

Read the article

Number of characters recommended for a statement

- by liaK

Hi, I have been using Qt 4.5 and so do C++. I have been told that it's a standard practice to maintain the length of each statement in the application to 80 characters. Even in Qt creator we can make a right border visible so that we can know whether we are crossing the 80 characters limit. But my question is, Is it really a standard being followed? Because in my application, I use indenting and all, so it's quite common that I cross the boundary. Other cases include, there might be a error statement which will be a bit explanatory one and which is in an inner block of code, so it too will cross the boundary. Usually my variable names look bit lengthier so as to make the names meaningful. When I call the functions of the variable names, again I will cross. Function names will not be in fewer characters either. I agree a horizontal scroll bar shows up and it's quite annoying to move back and forth. So, for function calls including multiple arguments, when the boundary is reached I will make the forth coming arguments in the new line. But besides that, for a single statement (for e.g a very long error message which is in double quotes " " or like longfun1()->longfun2()->...) if I use an \ and split into multiple lines, the readability becomes very poor. So is it a good practice to have those statement length restrictions? If this restriction in statement has to be followed? I don't think it depends on a specific language anyway. I added C++ and Qt tags since if it might. Any pointers regarding this are welcome.

Read the article

nginx inserting extra characters in Multi-status reply body

- by user125011

Here's the setup. I've got one server running apache/php hosting ownCloud. Among other things, I'm using to do CardDAV contact syncing. In order to make things work with my domain I have an nginx server running on the frontend as a reverse-proxy to the ownCloud server. My nginx config is as follows: server { listen 80; server_name cloud.mydomain.com; location / { proxy_set_header X-Forwarded-Host cloud.mydomain.com; proxy_set_header X-Forwarded-Proto http; proxy_set_header X-Forwarded-For $remote_addr; client_max_body_size 0; proxy_redirect off; proxy_pass http://server; } } The problem is that when my phone does a PROPFIND on the server, nginx adds extra characters to the content body that throw the phone off. Specifically, it prepends d611\r\n at the front of the body and appends 0\r\n\r\n to the end of the content. (I got this from wireshark.) It also re-chunks the result. How do I get nginx to send the original content as-is?

Read the article

ash scripting: space-containing variable refuses to be grepped

- by Luci Sandor

I am trying to run the script listed at http://talk.maemo.org/showthread.php?t=70866&page=2 on its intended hardware, a Nokia Linux phone running BusyBox ash. The script receives the name of WiFi network as a parameter, and tries to connect the phone to it. I suspect the script works, but my SSID, BU (802.1x), has space and parentheses in it. So when I type at the command prompt autoconnect.sh BU\ $802.1x$ I get various errors. First, LIST=`iwconfig wlan0 | awk -F":" '/ESSID/{print $2}'` if [ $LIST = "\"$1\"" ]; then ...fails, even I am connected to the network. The error is not avoided by using single or double quotes instead of escaping characters at the command prompt. Second, if [ -z `iwlist wlan0 scan | grep -m 1 -o \"$1\"` ]; then echo SSID \"$1\" not found; shows that grep does not find the string, although the same grep, typed directly into the command prompt, does find 'BU (802.1x)'. How do I quote $1 in the two circumstances above so that it will work with my network SSID, containing spaces and parentheses? Thank you.

Read the article

pdftotext not outputting hebrew characters

- by Ofri Raviv

I'm using Xpdf's pdftotext to get the text out of some hebrew pdf files on Ubuntu. On my local machine this worked fine. I then tried to do it on another machine and the hebrew characters don't show up in the text file. I verified that I have the language package (see below why I think so). Where else can I look for the problem? >> tail -2 /etc/xpdf/xpdfrc include /etc/xpdf/includes >> cat /etc/xpdf/includes # This file was automatically generated by /usr/sbin/update-xpdfrc. # Instead, add or remove files in /etc/xpdf/ then run # /usr/sbin/update-xpdfrc to regenerate this file. include /etc/xpdf/xpdfrc-latin2 include /etc/xpdf/xpdfrc-thai include /etc/xpdf/xpdfrc-greek include /etc/xpdf/xpdfrc-turkish include /etc/xpdf/xpdfrc-arabic include /etc/xpdf/xpdfrc-hebrew include /etc/xpdf/xpdfrc-cyrillic >> cat /etc/xpdf/xpdfrc-hebrew #----- begin Hebrew support package (2003-feb-16) unicodeMap ISO-8859-8 /usr/share/xpdf/hebrew/ISO-8859-8.unicodeMap unicodeMap Windows-1255 /usr/share/xpdf/hebrew/Windows-1255.unicodeMap #----- end Hebrew support package >> ls /usr/share/xpdf/hebrew/ ISO-8859-8.unicodeMap Windows-1255.unicodeMap

Read the article

Trouble with backslash characters and rsyslog writing to postgres

- by Flimzy

I have rsyslog 4.6.4 configured to write mail logs to a PostgreSQL database. It all works fine, until the log message contains a backslash, as in this example: Jun 12 11:37:46 dc5 postfix/smtp[26475]: Vk0nYDKdH3sI: to=<[email protected], relay=----.---[---.---.---.---]:25, delay=1.5, delays=0.77/0.07/0.3/0.35, dsn=4.3.0, status=deferred (host ----.---[199.85.216.241] said: 451 4.3.0 Error writing to file d:\pmta\spool\B\00000414, status = ERROR_DISK_FULL in "DATA" (in reply to end of DATA command)) The above is the log entry, as written to /var/log/mail.log. It is correct. The trouble is that the backslash characters in the file name are interpreted as escapes when sent to the following SQL recipe: $template dcdb, "SELECT rsyslog_insert(('%timereported:::date-rfc3339%'::TIMESTAMPTZ)::TIMESTAMP,'%msg:::escape-cc%'::TEXT,'%syslogtag%'::VARCHAR)",STDSQL :syslogtag, startswith, "postfix" :ompgsql:/var/run/postgresql,dc,root,;dcdb As a result, the rsyslog_insert() stored procedure gets the following value for as msg: Vk0nYDKdH3sI: to=<[email protected], relay=----.---[---.---.---.---]:25, delay=1.5, delays=0.77/0.07/0.3/0.35, dsn=4.3.0, status=deferred (host ----.---[199.85.216.241] said: 451 4.3.0 Error writing to file d:pmtaspoolB The \p, \s, \B and \0 in the file name are interpreted by PostgreSQL as literal p, s, and B followed by a NULL character, thus early-terminating the string. This behavior can be easiily confirmed with: dc=# SELECT 'd:\pmta\spool\B\00000414'; ?column? -------------- d:pmtaspoolB (1 row) dc=# Is there a way to correct this problem? Is there a way I'm not finding in the rsyslog docs to turn \ into \\?

Read the article

How to correctly set GNU Screen to display currently running program in hardstatus

- by johnny_bgoode

In bash, to display the name of the current program in the GNU Screen hardstatus line takes only two configuration lines. First, tell screen what the end of your prompt normally looks like, and supply a default title for a window when you are sitting at in the shell: shelltitle "$ |bash" Next, place this escape sequence in the PS1 variable, before the characters that normally terminate the prompt '$ ' in this case: \033k\033\\ This technique works, to a point. The hardstatus window title is updated to the name of the currently running program, and then switches back to the default title shortly after execution is finished. One major problem, however, is that this escape string is not escaped itself, causing line-wrapping problems with commands longer than the initial line. This was annoying, so I set out looking for a solution. Turns out, simply escaping the previous escape sequence corrects line wrapping: \[\033k\]\[\033\\\] Great! My hardstatus window title still updates to the name of the currently running program, and now my longer commands wrap to the second line correctly. However, with this new escape sequence in my PS1, screen updates the window title to the actual command I am typing, not simply the name of the current program once it is executed. I am wondering, has anyone gotten this working correctly - i.e. line wrapping and proper updating of the hardstatus window title?

Read the article

Batch Script to Trim lines in text to first 30 or 50 characters only

- by SuperUserMan

I am now new to scripts but i find it really difficult understanding "for" command (especially with that tokens and delimiters etc) . Saying so, i think that for command can be used to do what i am doing. If its not and there is an easier way, ignore my ignorance :( Say i have multiple lines in a text file abc.txt with each line starting and ending with " (quotes) E.g. a file of 3 lines "hey what is going on @mike220. I am working on your car. Its engine is in very bad condition" "Because if you knew, you'd get shredded and do it with certainty" "@honey220 Do you know someone who has busted their ass on a diet only for results to come to a screeching halt after a few weeks" How can i trim each line, within the quotes, to a Fixed length say 30 or 50 or 100 characters (including spaces) I want to enter the number of character in batch and it can trim accordingly and produce a file def.txt with trimmed lines within quotes. Say i enter 50, results of above example should be "hey what is going on @mike220. I am working on you" "Because if you knew, you'd get shredded and do it" "@honey220 Do you know someone who has busted their" Thanks P.S. if you use For command, kindly please explain the command. EDIT: Though the answer provided worked, there is an issue with non english text. I am getting garbled text in Output file for non english text in input file . Any help @barlop here is the nonenglish text ( 1 line) "???? ?? ???? ?? ???? ???? ??? ?????? ???"

Read the article

PuTTY inserts random characters during a session

- by Zachary Polikarpus

I recently started renting space on a remote server so that I could work on a project. I found that a relatively painless way to access it on a windows machine is through PuTTY. However, there is one thing that has always irked me when using it: for seemingly no reason random characters are sometimes inserted at the cursor. Most of the time it is just a single tilde, but rarely it spits out what looks like some escape sequence ([[^8 or the like). It will only occur when I am focused on the window, whether I am typing or 20 feet away from the keyboard. If left for long enough, it will spit tildes at random intervals (average is about 1 minute). Finally, this behavior seems to be inconsistant when running programs such as nano or the mysql interface: in nano, instead of inserting tildes, it will set marks (ctrl-^); in mysql, lines will become un-editable. My question is this: Has anyone else experienced this sort of behavior in PuTTY? And if so, what can be done to prevent/correct this behavior?

Read the article

Permanent fix for unicode characters not displaying correctly (as boxes)

- by Chase

Please read this entire message before replying. First I know how to fix the issue on a temporary basis. I am looking for a permanent fix. I work with foreign language files a lot. Unfortunately sometimes all the unicode characters in windows explorer, notepad, and other places (as rendered by windows, probably GDI) do not display correctly. That is they display as square blocks, where as they had just been displaying correctly. There are countless methods to temporarily correct the issue. But again, I want a way to permanently resolve the issue. What I have tried: The silly "Hide fonts based on language settings". This setting only applies to what fonts you see in the fonts folder and font dropdowns. It doesn't disable foreign fonts (doesn't work, or if it does, it is temporary). Deleting the font cache file and rebooting (works.. usually, temporary solution). Changing my locale and then back (sometimes works, temporary solution). Rebooting my PC and getting lucky (50-50 chance, temporary solution). Changing my keyboard input/adding foreign keyboard (temporary solution that only seems to work once). Reinstalling windows (temporary solution, sometimes lasts a few months though, I have done this 7 times across 3 computers) What I have not tried: Buying Windows Ultimate and installing the interface packs. This is not a solution. I can't read Japanese/Chinese and I do not want my interface in those languages. What I will not do: Switch to a different brand operating system (unix, linux, mac os x) Switch to an older version of windows (Windows Vista, XP, 2000, etc). So can anyone recommend a permanent fix for the problem?

Read the article

Python - pyparsing unicode characters

- by mgj

Hi..:) I tried using w = Word(printables), but it isn't working. How should I give the spec for this. 'w' is meant to process Hindi characters (UTF-8) The code specifies the grammar and parses accordingly. 671.assess :: ????? ::2 x=number + "." + src + "::" + w + "::" + number + "." + number If there is only english characters it is working so the code is correct for the ascii format but the code is not working for the unicode format. I mean that the code works when we have something of the form 671.assess :: ahsaas ::2 i.e. it parses words in the english format, but I am not sure how to parse and then print characters in the unicode format. I need this for English Hindi word alignment for purpose. The python code looks like this: # -*- coding: utf-8 -*- from pyparsing import Literal, Word, Optional, nums, alphas, ZeroOrMore, printables , Group , alphas8bit , # grammar src = Word(printables) trans = Word(printables) number = Word(nums) x=number + "." + src + "::" + trans + "::" + number + "." + number #parsing for eng-dict efiledata = open('b1aop_or_not_word.txt').read() eresults = x.parseString(efiledata) edict1 = {} edict2 = {} counter=0 xx=list() for result in eresults: trans=""#translation string ew=""#english word xx=result[0] ew=xx[2] trans=xx[4] edict1 = { ew:trans } edict2.update(edict1) print len(edict2) #no of entries in the english dictionary print "edict2 has been created" print "english dictionary" , edict2 #parsing for hin-dict hfiledata = open('b1aop_or_not_word.txt').read() hresults = x.scanString(hfiledata) hdict1 = {} hdict2 = {} counter=0 for result in hresults: trans=""#translation string hw=""#hin word xx=result[0] hw=xx[2] trans=xx[4] #print trans hdict1 = { trans:hw } hdict2.update(hdict1) print len(hdict2) #no of entries in the hindi dictionary print"hdict2 has been created" print "hindi dictionary" , hdict2 ''' ####################################################################################################################### def translate(d, ow, hinlist): if ow in d.keys():#ow=old word d=dict print ow , "exists in the dictionary keys" transes = d[ow] transes = transes.split() print "possible transes for" , ow , " = ", transes for word in transes: if word in hinlist: print "trans for" , ow , " = ", word return word return None else: print ow , "absent" return None f = open('bidir','w') #lines = ["'\ #5# 10 # and better performance in business in turn benefits consumers . # 0 0 0 0 0 0 0 0 0 0 \ #5# 11 # vHyaapaar mEmn bEhtr kaam upbhOkHtaaomn kE lIe laabhpHrdd hOtaa hAI . # 0 0 0 0 0 0 0 0 0 0 0 \ #'"] data=open('bi_full_2','rb').read() lines = data.split('!@#$%') loc=0 for line in lines: eng, hin = [subline.split(' # ') for subline in line.strip('\n').split('\n')] for transdict, source, dest in [(edict2, eng, hin), (hdict2, hin, eng)]: sourcethings = source[2].split() for word in source[1].split(): tl = dest[1].split() otherword = translate(transdict, word, tl) loc = source[1].split().index(word) if otherword is not None: otherword = otherword.strip() print word, ' <-> ', otherword, 'meaning=good' if otherword in dest[1].split(): print word, ' <-> ', otherword, 'trans=good' sourcethings[loc] = str( dest[1].split().index(otherword) + 1) source[2] = ' '.join(sourcethings) eng = ' # '.join(eng) hin = ' # '.join(hin) f.write(eng+'\n'+hin+'\n\n\n') f.close() ''' if an example input sentence for the source file is: 1# 5 # modern markets : confident consumers # 0 0 0 0 0 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 0 0 0 0 0 0 !@#$% the ouptut would look like this :- 1# 5 # modern markets : confident consumers # 1 2 3 4 5 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 1 2 3 4 5 0 !@#$% Output Explanation:- This achieves bidirectional alignment. It means the first word of english 'modern' maps to the first word of hindi 'AddhUnIk' and vice versa. Here even characters are take as words as they also are an integral part of bidirectional mapping. Thus if you observe the hindi WORD '.' has a null alignment and it maps to nothing with respect to the English sentence as it doesn't have a full stop. The 3rd line int the output basically represents a delimiter when we are working for a number of sentences for which your trying to achieve bidirectional mapping. What modification should i make for it to work if the I have the hindi sentences in Unicode(UTF-8) format.

Search Results

Search found 5756 results on 231 pages for 'illegal characters'.

Page 18/231 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

- by user38983

- by blindJesse

- by Sisyphus

- by efelton

- by behrk2

- by user89691

- by Kevin

- by Barn

- by FrustratedWithFormsDesigner

- by Brandon

- by Andrew Rollings

- by Snooze

- by Kanavi

- by jkndrkn

- by Shaitan00

- by liaK

- by user125011

- by Luci Sandor

- by Ofri Raviv

- by Flimzy

- by johnny_bgoode

- by SuperUserMan

- by Zachary Polikarpus

- by Chase

- by mgj

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >