Search Results

Search found 28425 results on 1137 pages for 'source encoding'.


Display ñ on a C# .NET application

- by mmr

I have a localization issue. One of my industrious coworkers has replaced all the strings throughout our application with constants that are contained in a dictionary. That dictionary gets various strings placed in it once the user selects a language (English by default, but target languages are German, Spanish, French, Portuguese, Mandarin, and Thai). For our test of this functionality, we wanted to change a button to include text which has a ñ character, which appears both in Spanish and in the Arial Unicode MS font (which we're using throughout the application).

Problem is, the ñ is appearing as a square block, as if the program did not know how to display it. When I debug into that particular string being read from disk, the debugger reports that character as a square block as well.

So where is the failure? I think it could be in a few places:

1) Notepad may not be Unicode-aware, so the ñ displayed there is not the same as what VS2008 expects, and so the program interprets the character as a square. (EDIT: Notepad shows the same characters as VS; i.e., they both show the ñ, in the same place.)
2) VS2008 can't handle ñ. I find that very, very hard to believe.
3) The text is read in properly, but the default font for VS2008 can't display it, which is why the debugger shows a square.
4) The text is not read in properly, and I should use something other than a regular StreamReader to get strings.
5) The text is read in properly, but the default String class in C# doesn't handle ñ well. I find that very, very hard to believe.
6) The version of Arial Unicode MS I have doesn't have ñ, despite it being listed as one of the 50k characters by http://www.fileinfo.info.

Anything else I could have left out? Thanks for any help!

MySQL German accent-insensitive search in full-text searches

- by lukaszsadowski

Let's take an example hotels table:

    CREATE TABLE `hotels` (
      `HotelNo` varchar(4) character set latin1 NOT NULL default '0000',
      `Hotel` varchar(80) character set latin1 NOT NULL default '',
      `City` varchar(100) character set latin1 default NULL,
      `CityFR` varchar(100) character set latin1 default NULL,
      `Region` varchar(50) character set latin1 default NULL,
      `RegionFR` varchar(100) character set latin1 default NULL,
      `Country` varchar(50) character set latin1 default NULL,
      `CountryFR` varchar(50) character set latin1 default NULL,
      `HotelText` text character set latin1,
      `HotelTextFR` text character set latin1,
      `tagsforsearch` text character set latin1,
      `tagsforsearchFR` text character set latin1,
      PRIMARY KEY (`HotelNo`),
      FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;

In this table we have, for example, only one hotel with Region name = "Graubünden" (please note the umlaut ü character). I want the same search match for both phrases, 'graubunden' and 'graubünden'.

This is simple in regular searches using MySQL's built-in collations:

    SELECT * FROM `hotels`
    WHERE `Region` LIKE CONVERT(_utf8 '%graubunden%' USING latin1) COLLATE latin1_german1_ci

This works fine for 'graubunden' and 'graubünden' and returns the proper result. The problem appears when we use MySQL full-text search. What's wrong with this SQL statement?

    SELECT * FROM hotels
    WHERE MATCH (`HotelNo`,`Hotel`,`Address`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
          AGAINST( CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
    ORDER BY Country ASC, Region ASC, City ASC

This doesn't return any result. Any ideas where the dog is buried?

Filtering Wikipedia's XML dump: error on some accents

- by streetpc

I'm trying to index Wikipedia dumps. My SAX parser makes Article objects for the XML with only the fields I care about, then sends them to my ArticleSink, which produces Lucene Documents. I want to filter special/meta pages like those prefixed with Category: or Wikipedia:, so I made an array of those prefixes and test the title of each page against this array in my ArticleSink, using article.getTitle().startsWith(prefix).

In English, everything works fine: I get a Lucene index with all the pages except for the matching prefixes. In French, the prefixes with no accent also work (i.e. filter the corresponding pages), some of the accented prefixes don't work at all (like Catégorie:), and some work most of the time but fail on some pages (like Wikipédia:), but I cannot see any difference between the corresponding lines (in less).

I can't really inspect all the differences in the file because of its size (5 GB), but it looks like correct UTF-8 XML. If I take a portion of the file using grep or head, the accents are correct (even on the incriminated pages, the <title>Catégorie:something</title> is correctly displayed by grep). On the other hand, when I recreate a wiki XML by tail/head-cutting the original file, the same page (here Catégorie:Rock par ville) gets filtered in the small file, not in the original… Any idea?

Alternatives I tried:

Getting the file (commented lines were tried without success):

    FileInputStream fis = new FileInputStream(new File(xmlFileName));
    //ReaderInputStream ris = ReaderInputStream.forceEncodingInputStream(fis, "UTF-8" );
    //(custom function opening the stream, reading it as UTF-8 into a Reader and returning another byte stream)
    //InputSource is = new InputSource( fis ); is.setEncoding("UTF-8");
    parser.parse(fis, handler);

Filtered prefixes:

    ignoredPrefix = new String[] {"Catégorie:", "Modèle:", "Wikipédia:",
        "Cat\uFFFDgorie:", "Mod\uFFFDle:", "Wikip\uFFFDdia:", //invalid char
        "CatÃ©gorie:", "ModÃ¨le:", "WikipÃ©dia:", // UTF-8 as ISO-8859-1
        "Image:", "Portail:", "Fichier:", "Aide:", "Projet:"}; // those last always work

How to unescape special characters from BeautifulSoup output?

- by Suhail

Hi, I am facing issues with special characters like ° and ®, which represent the degree sign (as in degrees Fahrenheit) and the registered sign. When I print a string that contains these special characters, it gives output like this:

    Preheat oven to 350° F
    Welcome to Lorem Ipsum Inc®

Is there a way I can output the exact characters and not their codes? Please let me know.

Convert French to ASCII (French speakers are wanted)

- by Andrey

I need to convert French text to its closest ASCII equivalent. Let me explain: in German you should convert ä to ae; this is not simply removing diacritics, it is finding the most correct analogue. Please help me with French. I found no programmatic way to do it, so I am creating a Dictionary<char, string>. Characters to convert (plus their capitals): é, à, è, ù, â, ê, î, ô, û, ë, ï, ü, ÿ, ç, and any others you suggest! Please write the suggested substitution in ASCII. Thanks, Andrey.

Query MySQL with unicode char code.

- by Ben

Hi, I have been having trouble searching through a MySQL table, trying to find entries with the character (UTF-16 code 200E) in a particular column. This particular code doesn't have a glyph, so it doesn't seem to work when I try to paste it into my search term. Is there a way to specify characters as their respective code point instead for a query? Thanks, -Ben
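One way to avoid pasting an invisible character is to build the search term from its code point. The following is a minimal Python sketch, not a confirmed answer: it assumes the column is stored as UTF-8, and the table and column names (`my_table`, `my_column`) are hypothetical. It computes the UTF-8 bytes of U+200E and places them in a MySQL hex literal (`X'...'`).

```python
# Build the UTF-8 byte sequence for a code point that has no visible glyph,
# so it can be written as a hex literal instead of being pasted.
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of one code point as uppercase hex."""
    return chr(code_point).encode("utf-8").hex().upper()


LRM = 0x200E  # LEFT-TO-RIGHT MARK, the character from the question

# Hypothetical table and column names; assumes the column is stored as utf8.
query = (
    "SELECT * FROM my_table "
    f"WHERE my_column LIKE CONCAT('%', X'{utf8_hex(LRM)}', '%')"
)
print(query)
```

A hex literal is treated as a binary string, so the comparison may become byte-wise; whether that matters depends on the column's collation.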

Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

- by Marcus

We're parsing an XML document using JAXB and get this error:

    [org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)

What exactly does this mean and how can we resolve this? We are executing the code as:

    jaxbContext = JAXBContext.newInstance(Results.class);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
    unmarshaller.setSchema(getSchema());
    results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));

Update: the issue appears to be due to this "funny" character in the XML file: ¿

Why would this cause such a problem?
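The error message is consistent with a file saved in a single-byte encoding such as ISO-8859-1 but parsed as UTF-8. That is an assumption about the file, not something the question confirms. A small Python sketch of why ¿ in particular would trip a UTF-8 decoder:

```python
# '¿' is the single byte 0xBF in ISO-8859-1, but 0xBF can only be a
# continuation byte in UTF-8, so a UTF-8 decoder rejects it as a first byte.
latin1_bytes = "¿".encode("iso-8859-1")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes)                          # the raw byte stored in the file
print(decodes_as_utf8(latin1_bytes))         # Latin-1 bytes read as UTF-8
print(decodes_as_utf8("¿".encode("utf-8")))  # properly encoded UTF-8
```

If that is the cause, either re-saving the file as UTF-8 or declaring the real encoding in the XML prolog would be the things to try.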

What is the role/responsibility of a 'shell'?

- by Rune

Hi, I have been looking at the source code of the IronPython project and the Orchard CMS project. IronPython operates with a namespace called Microsoft.Scripting.Hosting.Shell (part of the DLR). The Orchard Project also operates with the concept of a 'shell' indirectly in various interfaces (IShellContainerFactory, IShellSettings). None of the projects mentioned above have elaborate documentation, so picking up the meaning of a type (class etc.) from its name is pretty valuable if you are trying to figure out the overall application structure/architecture by reading the source code. Now I am wondering: what do the authors of this source code have in mind when they refer to a 'shell'? When I hear the word 'shell', I think of something like a command line interpreter. This makes sense for IronPython, since it has an interactive interpreter. But to me, it doesn't make much sense with respect to a Web CMS. What should I think of, when I encounter something called a 'shell'? What is, in general terms, the role and responsibility of a 'shell'? Can that question even be answered? Is the meaning of 'shell' subjective (making the term useless)? Thanks.

What is a good resource for HTML character codes -> glyph and...

- by Ben

Hi, I've already found a good site to convert HTML character codes to their respective glyphs: http://www.public.asu.edu/~rjansen/glyph_encoding.html However, I need a bit more information. Does anyone know of a site like the one above that also provides information on what type of character code it is? Meaning, is it a special character? Is the glyph visible? Etc... So far I have found some tables with this information, but they aren't as complete as the resource above. I would really like to get my hands on a complete table. Thanks, -Ben

What is the proper way to URL encode Unicode characters?

- by Josh Gibson

I know of the non-standard %uxxxx scheme, but that doesn't seem like a wise choice since the scheme has been rejected by the W3C.

Some interesting examples:

The heart character. If I type this into my browser:

    http://www.google.com/search?q=♥

then copy and paste it, I see this URL:

    http://www.google.com/search?q=%E2%99%A5

which makes it seem like Firefox (or Safari) is doing this:

    urllib.quote_plus(x.encode("latin-1"))
    '%E2%99%A5'

which makes sense, except for things that can't be encoded in Latin-1, like the triple dot character (…). If I type the URL

    http://www.google.com/search?q=…

into my browser then copy and paste, I get

    http://www.google.com/search?q=%E2%80%A6

back, which seems to be the result of doing

    urllib.quote_plus(x.encode("utf-8"))

which makes sense since … can't be encoded with Latin-1. But then it's not clear to me how the browser knows whether to decode with UTF-8 or Latin-1, since this seems to be ambiguous:

    In [67]: u"…".encode('utf-8').decode('latin-1')
    Out[67]: u'\xc3\xa2\xc2\x80\xc2\xa6'

works, so I don't know how the browser figures out whether to decode that with UTF-8 or Latin-1. What's the right thing to be doing with the special characters I need to deal with?
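Both URLs in the question match "encode as UTF-8, then percent-encode each byte". Here is a minimal Python 3 sketch of that reading (the question itself uses Python 2's `urllib.quote_plus`; the helper name `percent_encode` is made up for illustration):

```python
from urllib.parse import quote, unquote


def percent_encode(text: str) -> str:
    """Percent-encode text as UTF-8 bytes (the default for urllib.parse.quote)."""
    return quote(text, safe="")


heart = "\u2665"     # the heart character
ellipsis = "\u2026"  # the triple dot character

print(percent_encode(heart))
print(percent_encode(ellipsis))
# Decoding with UTF-8 (unquote's default) recovers the original text.
print(unquote(percent_encode(ellipsis)) == ellipsis)
```

Under this reading the heart URL was never Latin-1 at all: `%E2%99%A5` is simply the three UTF-8 bytes of U+2665, so no guessing between encodings is needed for these two cases.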

[Ruby] Why do I have to URI.encode even safe characters for Net::HTTP requests?

- by Matthias

I was trying to send a GET request to Twitter (user ID replaced for privacy reasons) using Net::HTTP:

    url = URI.parse("http://api.twitter.com/1/friends/ids.json?user_id=12345")
    resp = Net::HTTP.get_response(url)

This throws an exception in Net::HTTP:

    NoMethodError: undefined method `empty?' for #<URI::HTTP:0x59f5c04>
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1470:in `initialize'

Just by coincidence, I stumbled upon a similar code snippet, which used URI.encode prior to URI.parse, so I copied that and tried again:

    url = URI.parse(URI.encode("http://api.twitter.com/1/friends/ids.json?user_id=12345"))
    resp = Net::HTTP.get_response(url)

Now it works fine, but why? There are no reserved characters that need escaping in the URL I mentioned, so why do I have to call URI.encode for get_response to succeed?

Turning HTML character entities to 'regular' letters... why is it only partially working?

- by Jack W-H

I'm using all of the below to take a field called 'code' from my database, get rid of all the HTML entities, and print it 'as usual' to the site:

    <?php
    $code = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $code);
    $code = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $code);
    $code = html_entity_decode($code);
    ?>

However the exported code still looks like this:

    progid:DXImageTransform.Microsoft.AlphaImageLoader(src=â€™img/the_image.pngâ€™);

See what's going on there? How many other things can I run on the string to turn them into darn regular characters?! Thanks! Jack
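The `â€™` sequence is what a right single quotation mark (U+2019) looks like when its UTF-8 bytes are displayed as Windows-1252. That points at a charset mismatch and not at leftover entities, though where the mismatch occurs (database connection, page header, or editor) is not something the question settles. A Python sketch of the effect, with hypothetical helper names:

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Show how UTF-8 bytes look when a consumer assumes Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_mojibake(text: str) -> str:
    """Undo the misreading, assuming no bytes were lost along the way."""
    return text.encode("cp1252").decode("utf-8")


curly_quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK
garbled = misread_utf8_as_cp1252(curly_quote)
print(garbled)                          # three characters instead of one
print(repair_cp1252_mojibake(garbled) == curly_quote)
```

If this is what is happening, the usual remedy is to make the page, the database connection, and the stored data agree on UTF-8. Patching the string after the fact would only hide the mismatch.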

Java UTF-8 to ASCII conversion with supplements

- by bozo

Hi, we are accepting all sorts of national characters in a UTF-8 string on the input, and we need to convert them to an ASCII string on the output for some legacy use. (We don't accept Chinese and Japanese chars, only European languages.) We have a small utility to get rid of all the diacritics:

    public static final String toBaseCharacters(final String sText) {
        if (sText == null || sText.length() == 0)
            return sText;
        final char[] chars = sText.toCharArray();
        final int iSize = chars.length;
        final StringBuilder sb = new StringBuilder(iSize);
        for (int i = 0; i < iSize; i++) {
            String sLetter = new String(new char[] { chars[i] });
            sLetter = Normalizer.normalize(sLetter, Normalizer.Form.NFC);
            try {
                byte[] bLetter = sLetter.getBytes("UTF-8");
                sb.append((char) bLetter[0]);
            } catch (UnsupportedEncodingException e) {
            }
        }
        return sb.toString();
    }

The question is how to replace the German sharp s and similar characters (ß, Đ, đ) that get through the above normalization method with their supplements (in the case of ß, the supplement would probably be "ss", and in the case of Đ it would be either "D" or "Dj"). Is there some simple way to do it, without a million .replaceAll() calls?

So for example: Đonardan = Djonardan, Blaß = Blass and so on. We can replace all "problematic" chars with an empty space, but would like to avoid this to make the output as similar to the input as possible.

Thank you for your answers, Bozo
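One common shape for this is a small lookup table for letters that have no Unicode decomposition, followed by decomposition and removal of combining marks. The sketch below shows that idea in Python (not the Java from the question). The table is illustrative and incomplete, and the chosen supplements ("Dj" for Đ, and so on) are assumptions taken from the examples in the question.

```python
import unicodedata

# Illustrative, incomplete table for letters that do not decompose into
# "base letter + combining mark"; the replacements are assumptions.
SUPPLEMENTS = {
    "\u00df": "ss",                  # ß
    "\u0110": "Dj", "\u0111": "dj",  # D with stroke (upper, lower)
    "\u00d0": "D", "\u00f0": "d",    # eth (upper, lower)
    "\u00d8": "O", "\u00f8": "o",    # O with stroke
    "\u00c6": "AE", "\u00e6": "ae",
    "\u0152": "OE", "\u0153": "oe",
    "\u0141": "L", "\u0142": "l",
}
TABLE = str.maketrans(SUPPLEMENTS)


def to_ascii(text: str, fallback: str = "?") -> str:
    """Transliterate European text to ASCII, keeping it close to the input."""
    replaced = text.translate(TABLE)                     # one pass, no replaceAll chain
    decomposed = unicodedata.normalize("NFKD", replaced)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes the fallback marker.
    return "".join(ch if ord(ch) < 128 else fallback for ch in stripped)


print(to_ascii("Bla\u00df"))
print(to_ascii("\u0110onardan"))
```

The table is applied in a single pass over the string, so adding more entries does not add more passes.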

JS encodeURIComponent result different from the one created by FORM

- by Marco Demaio

I thought values entered in forms are properly encoded by browsers. But this simple test shows it's not true:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html><head>
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <title></title>
    </head><body>
    <form id="test" action="test_get_vs_encodeuri.html" method="GET" onsubmit="alert(encodeURIComponent(this.one.value));">
    <input name="one" type="text" value="Euro-€">
    <input type="submit" value="SUBMIT">
    </form>
    </body></html>

When hitting the submit button, encodeURIComponent encodes the input value into "Euro-%E2%82%AC", while the browser writes only a simple "Euro-%80" into the GET query. Could someone explain? Or is encodeURIComponent doing unnecessary conversions?
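The two results differ only in which character encoding is applied before percent-encoding. The page declares windows-1252, where € is the single byte 0x80, while encodeURIComponent works from UTF-8. A Python sketch reproducing both outputs (the helper name `encode_like_form` is hypothetical, and real form submission also turns spaces into `+`, which is ignored here):

```python
from urllib.parse import quote


def encode_like_form(value: str, page_charset: str) -> str:
    """Percent-encode value after converting it to the given charset."""
    return quote(value, safe="", encoding=page_charset)


value = "Euro-\u20ac"  # "Euro-" followed by the euro sign

print(encode_like_form(value, "cp1252"))  # what the windows-1252 form sends
print(encode_like_form(value, "utf-8"))   # what encodeURIComponent produces
```

So neither side is doing an unnecessary conversion; they start from different byte representations of the same character.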

base 64 URL decode with Ruby/Rails?

- by seth.vargo

I am working with the Facebook API and Ruby on Rails and I'm trying to parse the JSON that comes back. The problem I'm running into is that Facebook base64URL encodes their data. There is no built-in base64URL decode for Ruby. For the difference between a base64 encoded and base64URL encoded, see wikipedia. How do I decode this using Ruby/Rails? Edit: Because some people have difficulty reading - base64 URL is DIFFERENT than base64
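For reference, base64url differs from plain base64 in two ways: `-` and `_` replace `+` and `/`, and the trailing `=` padding is often dropped. The sketch below is in Python, not Ruby, and is only meant to show those two steps; the sample input is made up, not real Facebook data.

```python
import base64


def base64url_decode(data: str) -> bytes:
    """Decode base64url text, restoring any '=' padding that was stripped."""
    padding = "=" * (-len(data) % 4)  # pad up to a multiple of four characters
    return base64.urlsafe_b64decode(data + padding)


# Made-up sample: the unpadded base64url form of a tiny JSON document.
print(base64url_decode("eyJhIjoxfQ"))
# '-' and '_' are the URL-safe stand-ins for '+' and '/'.
print(base64url_decode("-_8"))
```

The same two steps (translate the two characters, re-pad to a multiple of four) can be applied before an ordinary base64 decoder in any language.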

Binding UpdateSourceTrigger=Explicit, updates source at program startup

- by GTD

I have the following code:

    <Window x:Class="WpfApplication1.Window1"
            xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
            xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
            Title="Window1" Height="300" Width="300">
        <Grid>
            <TextBox Text="{Binding Path=Name, Mode=OneWayToSource, UpdateSourceTrigger=Explicit, FallbackValue=default text}"
                     KeyUp="TextBox_KeyUp" x:Name="textBox1"/>
        </Grid>
    </Window>

    public partial class Window1 : Window
    {
        public Window1()
        {
            InitializeComponent();
        }

        private void TextBox_KeyUp(object sender, KeyEventArgs e)
        {
            if (e.Key == Key.Enter)
            {
                BindingExpression exp = this.textBox1.GetBindingExpression(TextBox.TextProperty);
                exp.UpdateSource();
            }
        }
    }

    public class ViewModel
    {
        public string Name
        {
            set { Debug.WriteLine("setting name: " + value); }
        }
    }

    public partial class App : Application
    {
        protected override void OnStartup(StartupEventArgs e)
        {
            base.OnStartup(e);
            Window1 window = new Window1();
            window.DataContext = new ViewModel();
            window.Show();
        }
    }

I want to update the source only when the "Enter" key is pressed in the textbox. This works fine. However, the binding also updates the source at program startup. How can I avoid this? Am I missing something?

Ruby custom class to and from YAML;

- by Sanarothe

Hi. I'm having trouble deserializing a Ruby class that I wrote to YAML.

Where I want to be

I want to be able to pass one object around as a full 'question' which includes the question text, some possible answers (for multiple choice) and the correct answer. One module (the encoder) takes input, builds a 'question' class out of it and appends it to the question pool. Another module reads a question pool and builds an array of 'question' objects.

Where I am currently

Sample question pool:

    --- |
      --- !ruby/object:MultiQ
      a: "no"
      answer: "no"
      b: "no"
      c: "no"
      d: "no"
      text: "yes?"

Encoder dump to YAML file. Object is a MultiQ filled up with input. (See below.)

    def dump(file, object)
      File.open(file, 'a') do |out|
        YAML.dump(object.to_yaml, out)
      end
      object = nil
    end

MultiQ class definition:

    class MultiQ
      attr_accessor :text, :answer, :a, :b, :c, :d

      def initialize(text, answer, a, b, c, d)
        @text = text
        @answer = answer
        @a = a
        @b = b
        @c = c
        @d = d
      end
    end

The decoder (I've been trying different things, so what's here wasn't my first or best guess. But I'm at a loss and the documentation doesn't really explain things thoroughly enough.)

    File.open( "test_set.yaml" ) do |yf|
      YAML.load_documents( yf ) { |item|
        new = YAML.object_maker( MultiQ, item)
        puts new
      }
    end

Questions you can answer

How do I achieve my goal? What methods should I use, between parsing, loading files or documents, to successfully deserialize a Ruby class? I've already looked over the YAML Rdoc, and I didn't absorb very much, so please don't just link me to it.

What other methods would you suggest using? Is there a better way to store questions like this? Should I be using a document DB, a relational DB, XML? Some other format?

How to get UTF-8 working in java webapps?

- by kosoant

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ??? for special cases. My setup is the following:

Development environment: Windows XP
Production environment: Debian
Database used: MySQL 5.x

Users mainly use Firefox 2, but Opera 9.x, FF3, IE7 and Google Chrome are also used to access the site. How to achieve this?

c# Remove special chars from a File

- by jmpena

Hello, I have a problem. I'm trying to open a text file and remove all the special chars (ñ, Ñ, ', á, í, etc.). The file is a fixed layout that clients send to me; I parse it and send it to an AS400 server, but I have to remove all special chars first.

The problem is: for some files, when I open them in C#, a special char is read as two different chars, which moves the rest of the line one position to the right, so the information that has to be at a given position is no longer correct. If I open the same file in Notepad it looks OK, but in WordPad it shows 2 chars for just 1 special char.

Example: in the file I have:

    "0001 0003JUAN PEÑA33441JPENATEST"

but in C# it shows:

    "0001 0003JUAN PEï¦A33441JPENATEST"

I'm using encoding 1251. Any help?
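One character turning into two, and shifting the rest of the record by one column, is the typical sign of a UTF-8 file being read with a single-byte code page. Whether these client files really are UTF-8 is an assumption; the Python sketch below only demonstrates the shift, using the record from the question and a hypothetical helper `misread`.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text one way and decode it another, as a mismatched reader would."""
    return text.encode(written_as).decode(read_as, errors="replace")


record = "0001 0003JUAN PE\u00d1A33441JPENATEST"

wrong = misread(record, "utf-8", "cp1251")  # single-byte reader on UTF-8 data
right = misread(record, "utf-8", "utf-8")   # reader that matches the file

print(len(record), len(wrong), len(right))
print(record.index("33441"), wrong.index("33441"))  # field start moves right by one
```

If the files are UTF-8, reading them with a UTF-8 decoder first and stripping the special characters afterwards would keep the fixed-width columns aligned.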

.aspx character coding

- by kwek-kwek

I am having a problem. This is my first time working with a Windows server; do you know if there is any problem with character encoding? My document is set to content="text/html; charset=UTF-8" but it's giving me funny words... you can check it here. This site is pure HTML with a few includes, but everything else is just HTML. I could convert the characters to HTML entities, but that is basically wasting my time. I never had this problem with any website I did except for this one. Some others said "The problem seems to be that you have converted the text into utf-8 twice." But how would I have converted it twice? Dreamweaver should convert it for me, but in this case it doesn't.

Windows code pages, what are they?

- by Mike D

I'm trying to gain a basic understanding of what is meant by a Windows code page. I kind of get the feeling it's a translation between a given 8 bit value and some 'abstraction' for a given character graphic. I made the following experiment. I created a "" character literal with two versions of the letter u with an umlaut. One created using the ALT 129 (uses code page 437) value and one using the ALT 0252 (uses code page 1252) value. When I examined the literal both characters had the value 252. Is 252 the universal 8 bit abstraction for u with an umlaut? Is it the Unicode value? Aside from keyboard input are there any library routines or system calls that use code pages? For example is there a function to translate a string using a given code table (as above for the ALT 129 value)?
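The experiment can be reproduced outside the keyboard driver. A code page is a table from byte values to characters, and 252 is the Unicode code point of ü (U+00FC), which windows-1252 also happens to store as byte 252. A small Python sketch using the standard codecs; the `transcode` helper is a hypothetical name for the "translate a string using a given code table" operation the question asks about:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code page into another via Unicode."""
    return data.decode(source).encode(target)


alt_129 = bytes([129]).decode("cp437")    # ALT 129: byte 129 in code page 437
alt_0252 = bytes([252]).decode("cp1252")  # ALT 0252: byte 252 in code page 1252

print(alt_129 == alt_0252)  # both byte values name the same character
print(ord(alt_129))         # its Unicode code point
print(transcode(bytes([129]), "cp437", "cp1252"))
```

The same character therefore has different byte values in different code pages, and the Unicode value is the common reference point between them.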

Charset conversion from XXX to utf-8, command line

- by Marcin

I have a bunch of text files that are encoded in ISO-8859-2 (they have some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert them to a saner UTF-8?
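If no dedicated tool is at hand, the conversion itself is only a decode followed by an encode. A minimal Python sketch, assuming the files are ISO-8859-2 (Latin-2), the usual legacy encoding for Polish text; the function name and the choice to write to a separate output file are illustrative:

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-2") -> None:
    """Read src in source_encoding and write the same text to dst as UTF-8."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Example call from a script (paths are placeholders):
#   convert_to_utf8(Path("input.txt"), Path("output.txt"))
```

Working on bytes instead of text-mode files avoids any newline translation, so only the character encoding changes.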

How do you get the glyph for a character encoded as 'ō' from a utf-8 encoded database field usi

- by AE

I have a MySQL database table with a collation of 'utf8_general_ci' and the value in the field is: x & #299; bán yá wén (without the spaces). When this is converted (for example by StackOverflow's editor) it looks like this: xī bán yá wén where the second character looks like a lower case i with a bar over the top. In PHP, what function converts the & #299 ; entity into the ī character? I've tried using html_entity_decode($str,ENT_COMPAT,'UTF-8'), however I get characters like the following: yÄ«n wén or zhÅ•ng wén I'm pretty sure there's something I don't understand about the decoding, which is why I'm using the wrong function. Can anyone shed some light on how to get the single character glyph that's represented by the entity & #299 and similar high-number characters above 255? Many thanks, AE
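The output `yÄ«n` suggests the entity was in fact decoded correctly and the resulting UTF-8 bytes were then displayed as a single-byte encoding. That is an inference from the sample output, not something confirmed in the question. The sketch below shows both halves in Python (not PHP): decoding the numeric entity, and how the correct result looks when misread.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric HTML character references with characters."""
    return html.unescape(text)


decoded = decode_entities("x&#299; b&aacute;n y&aacute; w&eacute;n")
print(decoded)

# The same correct character, shown through the wrong charset:
i_macron = decode_entities("&#299;")
print(i_macron.encode("utf-8").decode("cp1252"))
```

If that inference holds, the decoding call is fine and the thing to check is the charset the page or terminal uses to display the result.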

how to get default character set in windows and linux

- by Anand

How is it possible to get the default character sets in Windows XP and Linux?
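The question does not name a language, so as one illustration, here is how a Python program can ask the platform for its defaults on either system. "Default character set" is taken here to mean the locale's preferred text encoding, which is an assumption about what is being asked.

```python
import locale
import sys


def default_encodings() -> dict:
    """Collect the encodings the current platform and session report."""
    return {
        "locale": locale.getpreferredencoding(False),  # e.g. a code page or UTF-8
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, encoding in default_encodings().items():
    print(f"{name}: {encoding}")
```

The values depend on the machine and session, so no fixed output is shown.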

display of umlauts in firefox

- by Mike D

I was doing some web searching and found some strange things involving umlauts. For example if you do a google or yahoo search for the word "nther" you are likely to find things like Günther which I take to be Gunther with an umlaut over the u. Now my question is what if anything can I do to cause these characters to be properly displayed by Firefox under windows XP? An amazing thing is that I had to introduce spaces in the G & # etc string otherwise it was properly displayed here as u with umlaut!

