Search Results

Search found 1474 results on 59 pages for 'unicode'.

  • "É" not getting converted to two bytes correctly.

    - by ChrisF
    Further to this question I've got a supplementary problem. I've found a track with an "É" in the title. My code: var playList = new StreamWriter(playlist, false, Encoding.UTF8); private static void WriteUTF8(StreamWriter playList, string output) { byte[] byteArray = Encoding.UTF8.GetBytes(output); foreach (byte b in byteArray) { playList.Write(Convert.ToChar(b)); } } converts this to the following bytes: 195 137, which is being output as Ã followed by a square (a character that can't be printed in the current font). I've exported the same file to a playlist in Media Monkey and it writes the "É" as "Ã‰" - which I'm assuming is correct (as KennyTM pointed out). My question is, how do I get the "‰" symbol output? Do I need to select a different font and if so which one? UPDATE: People seem to be missing the point. I can get the "É" written to the file using playList.WriteLine("É"); that's not the problem. The problem is that Media Monkey requires the file to be in the following format: #EXTINFUTF8:140,Yann Tiersen - Comptine D'Un Autre Été: L'Après Midi #EXTINF:140,Yann Tiersen - Comptine D'Un Autre Été: L'Après Midi #UTF8:04-Comptine D'Un Autre Été- L'Après Midi.mp3 04-Comptine D'Un Autre Été- L'Après Midi.mp3 Where all the "high-ascii" characters (for want of a better term) are written out as a pair of characters.
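
    Byte-level illustration (a Python sketch, not the poster's C#; the cp1252 decode is only there to show what those two bytes look like as single-byte characters):

        # "É" is two bytes in UTF-8; read as Windows-1252 those bytes display as "Ã‰"
        data = "É".encode("utf-8")
        print(list(data))             # [195, 137]
        print(data.decode("cp1252"))  # Ã‰ -- the two-character pair Media Monkey writes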

    Read the article

  • java: can I convert strings to byte arrays, without a BOM?

    - by Cheeso
    Suppose I have this code: String encoding = "UTF-16"; String text = "[Hello StackOverflow]"; byte[] message= text.getBytes(encoding); If I display the byte array in message, the result is: 0000 FE FF 00 5B 00 48 00 65 00 6C 00 6C 00 6F 00 20 ...[.H.e.l.l.o. 0010 00 53 00 74 00 61 00 63 00 6B 00 4F 00 76 00 65 .S.t.a.c.k.O.v.e 0020 00 72 00 66 00 6C 00 6F 00 77 00 5D .r.f.l.o.w.] As you can see, there's a BOM in the beginning. How can I: generate a UTF-16 byte array that lacks a BOM, from a string? convert from a byte array that contains UTF-16 chars but lacks a BOM, back to a string?
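
    The same BOM behaviour can be sketched quickly in Python (concept only, not the Java API): the generic encoding name writes a BOM, the endianness-specific names do not. Java behaves analogously: getBytes("UTF-16") prepends a big-endian BOM, while "UTF-16LE" and "UTF-16BE" write none, and new String(bytes, "UTF-16LE") decodes BOM-less data back to a string.

        # Generic name adds a BOM; endianness-specific names do not
        text = "[Hello StackOverflow]"
        print(text.encode("utf-16")[:4])     # b'\xff\xfe[\x00' -- BOM first (platform byte order, LE shown)
        print(text.encode("utf-16-be")[:4])  # b'\x00[\x00H'    -- no BOM, big-endian like the dump above
        print(text.encode("utf-16-be").decode("utf-16-be"))  # round-trips without a BOM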

    Read the article

  • Code to strip diacritical marks using ICU

    - by Paul J. Lucas
    Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.: UnicodeString strip_diacritics( UnicodeString const &s ) { UnicodeString result; // ... return result; } Assume that s has already been normalized. Thanks.
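
    Not the requested ICU/C++ code, but the algorithm itself (canonically decompose, drop the combining marks) is easy to sketch in Python with unicodedata; ICU exposes the same steps through its normalization and transliteration APIs:

        # Sketch of the algorithm: NFD-decompose, then drop nonspacing marks (category Mn)
        import unicodedata
        def strip_diacritics(s):
            decomposed = unicodedata.normalize("NFD", s)
            return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
        print(strip_diacritics("José, déjà vu"))  # -> Jose, deja vu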

    Read the article

  • python unichr problem

    - by jacob
    I've got a problem with unichr() on my server. Please see below: On my server (Ubuntu 9.04): >>> print unichr(255) Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128) On my desktop (Ubuntu 9.10): >>> print unichr(255) ÿ I'm fairly new to Python so I don't know how to solve this. Anyone care to help? Thanks.
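
    The error comes from print trying to encode the Unicode string with the terminal's encoding (ASCII on the server, most likely because of the LANG/locale setting), not from unichr itself. A Python 2 sketch of the usual workaround, encoding explicitly:

        # Encode explicitly instead of relying on the terminal's default encoding
        import sys
        ch = unichr(255)  # u'\xff'
        print ch.encode(sys.stdout.encoding or "utf-8", "replace")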

    Read the article

  • to escape or not to escape: well formed XHTML with diacritics

    - by andresmh
    Say that you have an XHTML document in English but it has accented characters (e.g. meta name="author" content="José"). Let's say you have no control over the HTTP headers. Should the characters be replaced with their corresponding named entities (e.g. &aacute;, etc)? Should the doc type and the xml:lang attribute be set to English? I know I can check the W3C recommendation but I am asking more from a practical point of view.

    Read the article

  • Ruby character encoding problems in netbeans and command window

    - by salgo60
    I use NetBeans as my development IDE and run the application from cmd, but I have problems displaying ISO 8859-1 characters like åäö correctly, both in the cmd window and when I run the application from NetBeans. Question: what is the best practice for setting this up? Right now I do @output.puts indent + "V" + 132.chr + "lkommen till Ruby Camping!" to get "ä". My environment: chcp 65001 Active code page: 65001 ruby main.rb Source encoding: #<Encoding:US-ASCII> Default external: #<Encoding:UTF-8> Default internal: nil Locale charmap: "CP65001", where I have in the code def self.printEncoding puts "Source encoding: #{__ENCODING__.inspect}" if defined? __ENCODING__ if defined? Environment::Encoding puts "Default external: #{Encoding.default_external.inspect}" puts "Default internal: #{Encoding.default_internal.inspect}" puts "Locale charmap: #{ Encoding.locale_charmap.inspect}" end puts "LANG environment variable: #{ENV['LANG'].inspect}" unless ENV['LANG'].nil? end ruby -v ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]
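
    A side note on the 132.chr workaround: 132 (0x84) is "ä" in the old OEM console code pages (CP437/CP850), which is why it displays correctly in cmd but not under chcp 65001, where UTF-8 bytes are expected. A quick Python check of the byte values (illustrative only; the question itself is about Ruby):

        # 0x84 is "ä" only in the OEM console code pages, not in UTF-8
        print(bytes([132]).decode("cp850"))  # ä
        print("ä".encode("utf-8"))           # b'\xc3\xa4' -- what an active code page of 65001 expects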

    Read the article

  • ajax(search suggest) funny character problem

    - by Jason
    In my AJAX search suggest, if I input a funny character (like Ô) to search for, "?" is displayed in Firefox and an empty box is displayed in IE. I am using xmlhttp.open("post", "*****.asp", true); xmlhttp.setRequestHeader('Content-type','application/x-www-form-urlencoded; charset=UTF-8'); and there is <%@CODEPAGE=65001%> in the *****.asp file. How can I fix it?
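
    One way to narrow this down (sketched in Python since the server side is classic ASP): check what a correctly UTF-8 form-urlencoded value should look like on the wire, then compare with what the browser actually sends (e.g. via a proxy) to see whether the client or the ASP page is mangling it.

        # What "Ô" should look like when form-urlencoded as UTF-8 vs. as cp1252
        from urllib.parse import quote
        print(quote("Ô"))                     # %C3%94 -- UTF-8, matching charset=UTF-8
        print(quote("Ô", encoding="cp1252"))  # %D4    -- what a Latin-1/cp1252 page would send instead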

    Read the article

  • Why isn't wchar_t widely used in code for Linux / related platforms?

    - by Ninefingers
    This intrigues me, so I'm going to ask: why is wchar_t not used as widely on Linux and Linux-like systems as it is on Windows? Specifically, the Windows API uses wchar_t internally, whereas I believe Linux does not, and this is reflected in a number of open source packages using char types. My understanding is that, given a character c which requires multiple bytes to represent it, in a char[] c is split across several elements of the array, whereas it forms a single unit in a wchar_t[]. Is it not easier, then, to always use wchar_t? Have I missed a technical reason that negates this difference? Or is it just an adoption problem?

    Read the article

  • How to generate pdf files _with_ utf-8 multibyte characters using Zend Framework

    - by Sejanus
    Hello, I've got a "little" problem with the Zend Framework Zend_Pdf class. Multibyte characters are stripped from generated PDF files. E.g. when I write aabccdee it becomes abcd, with the Lithuanian letters stripped. I'm not sure if it's a Zend_Pdf problem in particular or PHP in general. The source text is encoded in UTF-8, as is the PHP source file that does the job. Thank you in advance for your help ;) P.S. I run Zend Framework v. 1.6 and I use the FONT_TIMES_BOLD font. FONT_TIMES_ROMAN does work

    Read the article

  • Space-saving character encoding for japanese?

    - by Constantin
    In my opinion this is a common problem: character encoding in combination with a bitmap font. Most multi-language encodings have huge gaps between different character ranges and a lot of unused code points. So if I use them directly I waste a lot of memory (not only for storing multi-byte text - I mean especially the gaps in my bitmap font) - and VRAM is really valuable... So the only reasonable approach seems to be using a custom mapping on my texture for, e.g., UTF-8 characters (so that no space is wasted). BUT: that effort seems to be the same as using my own proprietary character encoding (and therefore my own order of characters in the texture). In my specific case I have texture space for 4096 different characters and need to display Latin languages as well as Japanese (it's a mess with UTF-8, which only supports the general CJK code pages). Has anybody had a similar problem (I'd be surprised if not)? Is there already an approach for this? Edit: the same problem is described at http://www.tonypottier.info/Unicode_And_Japanese_Kanji/ but it doesn't provide a real solution for storing these bitmap-font mappings space-efficiently. So any further help is welcome!
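
    A minimal sketch of the "custom mapping" idea in Python (hypothetical character set, not the poster's engine): the text can stay UTF-8 on disk and in memory, and only the atlas lookup table is proprietary, so no custom text encoding is needed.

        # Build a dense code point -> glyph-index table so a 4096-slot atlas has no gaps
        required = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 åäöÅÄÖ"
        required += "".join(chr(c) for c in range(0x3041, 0x3097))  # Hiragana range, purely illustrative
        glyph_index = {ch: i for i, ch in enumerate(sorted(set(required)))}
        assert len(glyph_index) <= 4096
        def atlas_slot(ch):
            return glyph_index[ch]  # strings stay UTF-8; only rendering consults this table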

    Read the article

  • Convert UTF-16 to UTF-8 under Windows and Linux, in C

    - by DooriBar
    I was wondering if there is a recommended 'cross' Windows and Linux method for converting strings from UTF-16LE to UTF-8? Or should one use different methods for each environment? I've managed to google a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8. Can anybody recommend a method that would be 'cross', and if you know of references or a guide with samples, I would very much appreciate it. Thanks, Doori Bar

    Read the article

  • tchar safe functions -- count parameter for UTF-8 constants

    - by Dustin Getz
    I'm porting a library from char to TCHAR. The count parameter of this fragment, according to MSDN, is the number of multibyte characters, not the number of bytes. So, did I get this right? _tcsncmp(access, TEXT("ftp"), 3); // or do I want _tcsnccmp? "Supported on Windows platforms only, _mbsncmp and _mbsnbcmp are multibyte versions of strncmp. _mbsncmp will compare at most count multibyte characters and _mbsnbcmp will compare at most count bytes. They both use the current multibyte code page. _tcsnccmp and _tcsncmp are the corresponding Generic functions for _mbsncmp and _mbsnbcmp, respectively. _tccmp is equivalent to _tcsnccmp."

    Read the article

  • Most Lite-Weight XML Parser with XPath and Wide-char Support

    - by Mystagogue
    I want a lite-weight C++ XML parser/DOM that: Can take UTF-8 as input, and parse into UTF-16. Maybe it does this directly (ideal!), or perhaps it provides a hook for the conversion (such as taking a custom stream object that does the conversion before parsing). Offers some XPath support. I've been looking at RapidXML, the Kranf xmlParser, and pugiXML. The first two of those might permit requirement #1 by way of a hook. The third, pugiXML, supports the #2 requirement. But none of those three fulfill both requirements. What is the smallest (free) library that can handle both requirements?

    Read the article

  • Weird SQL Server 2005 Collation difference between varchar() and nvarchar()

    - by richardtallent
    Can someone please explain this: SELECT CASE WHEN CAST('iX' AS nvarchar(20)) > CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END, CASE WHEN CAST('iX' AS varchar(20)) > CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END Results: 0 1 SELECT CASE WHEN CAST('i' AS nvarchar(20)) > CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END, CASE WHEN CAST('i' AS varchar(20)) > CAST('-' AS varchar(20)) THEN 1 ELSE 0 END Results: 1 1 On the first query, the nvarchar() result is not what I'm expecting, and yet removing the X makes the nvarchar() sort happen as expected. (My original queries used the '' and N'' literal syntax to distinguish varchar() and nvarchar() rather than CAST() and got the same result.) Collation setting for the database is SQL_Latin1_General_CP1_CI_AS.

    Read the article

  • Culture Sensitive GetHashCode

    - by user114928
    Hi, I'm writing a C# application that will process some text and provide basic query functions. In order to ensure the best possible support for other languages, I am allowing the users of the application to specify the System.Globalization.CultureInfo (via the "en-GB" style code) and also the full range of collation options using the System.Globalization.CompareOptions flags enum. For regular string comparison I'm then using a combination of: a) the String.Compare overload that accepts the culture and options; b) for some bulk processes, caching the byte data (KeyData) from CompareInfo.GetSortKey (the overload that accepts the options) and using a byte-by-byte comparison of the KeyData. This seemed fine (although please comment if you think these two methods shouldn't be mixed), but then I had reason to use the HashSet<T> class, which only has an overload for IEqualityComparer<T>. MS documentation seems to suggest that I should use StringComparer (which implements both IEqualityComparer<string> and IComparer<string>), but this only seems to support the "IgnoreCase" option from CompareOptions and not "IgnoreKanaType", "IgnoreSymbols", "IgnoreWidth" etc. I'm assuming that a StringComparer that ignores these other options could produce different hash codes for two strings that might be considered the same using my other comparison options. I'd therefore get incorrect results from my application. My only thought at the moment is to create my own IEqualityComparer<string> that generates a hash code from the SortKey.KeyData and compares equality by using the String.Compare overload. Any suggestions?

    Read the article

  • Emailing HTML from within an iPhone app is stopping at special characters

    - by user141146
    Hi, I have an iPhone app that will let users email some pre-determined text as HTML. I'm having a problem in that if the text contains special characters (e.g., ampersand &, <), the NSString variable that I use for sending the body of the email gets truncated at the special character. I'm not sure how to fix this (I tried using the method stringByAddingPercentEscapesUsingEncoding… but this hasn't fixed the problem). Thoughts on what I'm doing wrong / how to fix it? Here is sample code showing what I'm trying to do. Thanks!!! - (void)send_an_email:(id)sender { NSString *subject_string = [NSString stringWithFormat:@"Summary of %@", commercial_name]; NSString *body_string = [NSString stringWithFormat:@"%@<br /><br />", [self.dl email_message]]; // email_message returns the body of text that should be shipped as html. If email_message contains special characters, the text truncates at the special character NSString *full_string = [NSString stringWithFormat:@"mailto:?to=&subject=%@&body=%@", [subject_string stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding], [body_string stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]]; [[UIApplication sharedApplication] openURL:[[NSURL alloc] initWithString:full_string]]; }
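
    The usual explanation is that stringByAddingPercentEscapesUsingEncoding leaves URL-legal characters such as & untouched, so everything after the first & is read as a new mailto parameter rather than body text; if memory serves, the period-appropriate fix was CFURLCreateStringByAddingPercentEscapes with "&" passed in legalURLCharactersToBeEscaped. A concept sketch in Python (not the iOS API) of what a fully escaped body looks like:

        # '&', '<' and '/' must themselves be percent-encoded inside a mailto: body
        from urllib.parse import quote
        body = "Fish & Chips <br />"
        print(quote(body, safe=""))  # Fish%20%26%20Chips%20%3Cbr%20%2F%3E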

    Read the article

  • Using Markov models to convert all caps to mixed case and related problems

    - by hippietrail
    I've been thinking about using Markov techniques to restore missing information to natural language text: (1) restore mixed case to text in all caps; (2) restore accents / diacritics to languages which should have them but have been converted to plain ASCII; (3) convert rough phonetic transcriptions back into native alphabets. That seems to be in order of least difficult to most difficult. Basically the problem is resolving ambiguities based on context. I can use Wiktionary as a dictionary and Wikipedia as a corpus using n-grams and Markov chains to resolve the ambiguities. Am I on the right track? Are there already some services, libraries, or tools for this sort of thing? Examples: GEORGE LOST HIS SIM CARD IN THE BUSH - George lost his SIM card in the bush; tantot il rit a gorge deployee - tantôt il rit à gorge déployée
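
    As a rough illustration of the simplest (unigram) version of this idea, a toy Python sketch with made-up frequency counts; a Markov/n-gram version would condition the choice on neighbouring words instead of scoring each token in isolation:

        # Toy case restoration: pick the most frequent known form of each all-caps token
        counts = {"George": 120, "george": 2, "SIM": 75, "sim": 30, "bush": 90, "Bush": 40}
        def restore_case(token):
            candidates = [w for w in counts if w.lower() == token.lower()]
            return max(candidates, key=lambda w: counts[w]) if candidates else token.lower()
        print(" ".join(restore_case(t) for t in "GEORGE LOST HIS SIM CARD IN THE BUSH".split()))
        # -> George lost his SIM card in the bush (with these counts)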

    Read the article

  • reading non-english html pages with c#

    - by Gal Miller
    I am trying to find a string in Hebrew on a website. The reading code is attached. Afterwards I try to read the file using StreamReader but I can't match strings in other languages. What am I supposed to do? // used on each read operation byte[] buf = new byte[8192]; // prepare the web page we will be asking for HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://www.webPage.co.il"); // execute the request HttpWebResponse response = (HttpWebResponse) request.GetResponse(); // we will read data via the response stream Stream resStream = response.GetResponseStream(); string tempString = null; int count = 0; FileStream fileDump = new FileStream(@"c:\dump.txt", FileMode.Create); do { count = resStream.Read(buf, 0, buf.Length); fileDump.Write(buf, 0, buf.Length); } while (count > 0); // any more data to read? fileDump.Close();
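
    The matching usually fails because the raw bytes are never decoded with the page's charset; in C# that would mean constructing the StreamReader with the right Encoding, and the same idea is sketched here in Python (keeping the placeholder URL from the question):

        # Decode with the charset the server declares, then Hebrew substring matching works
        import urllib.request
        with urllib.request.urlopen("http://www.webPage.co.il") as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            text = resp.read().decode(charset, "replace")
        print("שלום" in text)  # search for a Hebrew word in properly decoded text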

    Read the article

  • Matching 'weird' characters in PHP regex

    - by Bill X
    I have some strings that need a-strippin': ÃœT: 9.996636,76.294363 Tons of long strings of location codes. A literal regex in PHP won't match them, i.e. $pattern = /ÃœT:/; echo preg_replace($pattern, "", $row['location']); won't match/strip anything. (To know it's working, /T:/ does strip the last bit of that string.) What's the encoding error going on here? Alternatively, I would accept a concise way to take out just the numbers.
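
    For the "just take out the numbers" fallback, a concept sketch in Python (the question is PHP, where the same character class works with preg_replace): strip everything that is not part of the coordinates, so the encoding of the leading junk no longer matters.

        # Keep only digits and coordinate punctuation, regardless of the mojibake prefix
        import re
        s = "ÃœT: 9.996636,76.294363"
        print(re.sub(r"[^0-9.,\-]", "", s))  # -> 9.996636,76.294363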

    Read the article
