Search Results

Search found 9074 results on 363 pages for 'audio encoding'.

  • Filtering Wikipedia's XML dump: error on some accents

    - by streetpc
    I'm trying to index Wikipedia dumps. My SAX parser makes Article objects for the XML with only the fields I care about, then sends them to my ArticleSink, which produces Lucene Documents. I want to filter special/meta pages like those prefixed with Category: or Wikipedia:, so I made an array of those prefixes and test the title of each page against this array in my ArticleSink, using article.getTitle().startsWith(prefix).

    In English, everything works fine: I get a Lucene index with all the pages except for the matching prefixes. In French, the prefixes with no accent also work (i.e. filter the corresponding pages), some of the accented prefixes don't work at all (like Catégorie:), and some work most of the time but fail on some pages (like Wikipédia:), yet I cannot see any difference between the corresponding lines (in less). I can't really inspect all the differences in the file because of its size (5 GB), but it looks like correct UTF-8 XML. If I take a portion of the file using grep or head, the accents are correct (even on the incriminated pages, the <title>Catégorie:something</title> is correctly displayed by grep). On the other hand, when I recreate a wiki XML by tail/head-cutting the original file, the same page (here Catégorie:Rock par ville) gets filtered in the small file, but not in the original. Any idea?

    Alternatives I tried. Getting the file (commented lines were tried without success):

        FileInputStream fis = new FileInputStream(new File(xmlFileName));
        // ReaderInputStream ris = ReaderInputStream.forceEncodingInputStream(fis, "UTF-8");
        // (custom function opening the stream, reading it as UTF-8 into a Reader
        // and returning another byte stream)
        // InputSource is = new InputSource(fis); is.setEncoding("UTF-8");
        parser.parse(fis, handler);

    Filtered prefixes:

        ignoredPrefix = new String[] {
            "Catégorie:", "Modèle:", "Wikipédia:",
            "Cat\uFFFDgorie:", "Mod\uFFFDle:", "Wikip\uFFFDdia:", // invalid char
            "Catégorie:", "Modèle:", "Wikipédia:",          // UTF-8 read as ISO-8859-1
            "Image:", "Portail:", "Fichier:", "Aide:", "Projet:" }; // these last always work
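
    A guess worth ruling out (the question doesn't confirm it): the dump may mix precomposed and decomposed Unicode forms, where é is either the single code point U+00E9 or an e followed by the combining accent U+0301. The two render identically in less and grep but fail startsWith, and a tail/head cut could happen to contain only pages in one form. A minimal sketch normalizing both sides to NFC with java.text.Normalizer (Java 6+):

        import java.text.Normalizer;

        // Normalize title and prefixes to NFC so that "é" (U+00E9) and
        // "e" + combining acute (U+0065 U+0301) compare equal.
        static boolean hasIgnoredPrefix(String title, String[] ignoredPrefixes) {
            String normTitle = Normalizer.normalize(title, Normalizer.Form.NFC);
            for (String prefix : ignoredPrefixes) {
                String normPrefix = Normalizer.normalize(prefix, Normalizer.Form.NFC);
                if (normTitle.startsWith(normPrefix)) {
                    return true;
                }
            }
            return false;
        }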

  • [Ruby] Why do I have to URI.encode even safe characters for Net::HTTP requests?

    - by Matthias
    I was trying to send a GET request to Twitter (user ID replaced for privacy reasons) using Net::HTTP:

        url = URI.parse("http://api.twitter.com/1/friends/ids.json?user_id=12345")
        resp = Net::HTTP.get_response(url)

    This throws an exception in Net::HTTP:

        NoMethodError: undefined method `empty?' for #<URI::HTTP:0x59f5c04>
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1470:in `initialize'

    Just by coincidence, I stumbled upon a similar code snippet which used URI.encode prior to URI.parse, so I copied that and tried again:

        url = URI.parse(URI.encode("http://api.twitter.com/1/friends/ids.json?user_id=12345"))
        resp = Net::HTTP.get_response(url)

    Now it works fine, but why? There are no reserved characters that need escaping in the URL I mentioned, so why do I have to call URI.encode for get_response to succeed?
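
    The URL as shown contains only safe characters, so URI.encode should be a no-op on it; one hedged guess is that an invisible control or non-ASCII byte rode along when the URL was pasted, which URI.encode would quietly escape. A quick Ruby 1.8 diagnostic sketch (not a fix) to check for that:

        url = "http://api.twitter.com/1/friends/ids.json?user_id=12345"
        # Print each byte; anything outside printable ASCII is invisible
        # in an editor but still confuses URI.parse / Net::HTTP.
        url.each_byte { |b| print((b < 32 || b > 126) ? "[#{b}]" : b.chr) }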

  • Audio Reminders

    - by abhishek mishra
    Hi, I am developing a reminder application. Part of it is to have voice notes as reminders. On click of the voice notes button I want to start the built-in voice recorder. How do I go about that? Also, once it starts, I want to retrieve the path where the recording gets stored, so that it can be played automatically on the day the reminder comes due. Is this possible?
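
    Assuming Android (the question doesn't say, but "inbuilt voice recorder" suggests it): the stock recorder can be launched with MediaStore's RECORD_SOUND_ACTION intent, and the content URI of the finished recording comes back in onActivityResult. A sketch, inside an Activity; note that not every device ships an app that handles this action:

        import android.content.Intent;
        import android.net.Uri;
        import android.provider.MediaStore;

        private static final int REQUEST_RECORD_AUDIO = 1;

        // Launch the built-in sound recorder, if one is installed.
        private void startVoiceNote() {
            Intent intent = new Intent(MediaStore.Audio.Media.RECORD_SOUND_ACTION);
            startActivityForResult(intent, REQUEST_RECORD_AUDIO);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_RECORD_AUDIO && resultCode == RESULT_OK) {
                Uri recordingUri = data.getData(); // content:// URI of the recording
                // Persist recordingUri with the reminder; later play it back
                // with MediaPlayer.create(context, recordingUri).
            }
        }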

  • Query MySQL with Unicode char code

    - by Ben
    Hi, I have been having trouble searching through a MySQL table, trying to find entries containing the character U+200E (the left-to-right mark) in a particular column. This character doesn't have a visible glyph, so pasting it into my search term doesn't seem to work. Is there a way to specify characters by their code point in a query instead? Thanks, -Ben
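
    One way to write the query without pasting the invisible character is to build it from its code point (a sketch; the table and column names are placeholders). U+200E is the byte sequence E2 80 8E in UTF-8:

        -- U+200E (LEFT-TO-RIGHT MARK) as its UTF-8 bytes E2 80 8E.
        SELECT *
        FROM my_table
        WHERE my_column LIKE CONCAT('%', CONVERT(UNHEX('E2808E') USING utf8), '%');

        -- Alternatively, construct it from the code point directly:
        --   CHAR(0x200E USING ucs2)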

  • .aspx character coding

    - by kwek-kwek
    I am having a problem. This is my first time working with a Windows server: is there any known issue with character encoding? My document is set to content="text/html; charset=UTF-8" but it's giving me garbled words (you can check it here). The site is pure HTML with a few includes, but everything else is plain HTML. I could convert the text to HTML entities, but that is basically wasting my time. I never had this problem with any website I did, except for this one. Someone said "The problem seems to be that you have converted the text into utf-8 twice." But how would it be converted twice, since Dreamweaver should convert it for me, but in this case it doesn't?
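
    If the text really is UTF-8 decoded twice, a usual suspect on a Windows/IIS server is a mismatch between the bytes on disk and the encoding the server assumes when reading the source files. A hedged sketch of the relevant web.config section (this assumes the site runs under ASP.NET, which .aspx implies):

        <!-- web.config: declare the source files as UTF-8 and respond in UTF-8 -->
        <configuration>
          <system.web>
            <globalization fileEncoding="utf-8"
                           requestEncoding="utf-8"
                           responseEncoding="utf-8" />
          </system.web>
        </configuration>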

  • display of umlauts in firefox

    - by Mike D
    I was doing some web searching and found some strange things involving umlauts. For example, if you do a Google or Yahoo search for the word "Günther", you are likely to find things like G&#xfc;nther, which I take to be Gunther with an umlaut over the u. Now my question is: what, if anything, can I do to cause these characters to be properly displayed by Firefox under Windows XP? An amazing thing is that I had to introduce spaces into that G&#xfc;nther string when first posting this, otherwise it was properly displayed here as u with umlaut!
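
    For what it's worth, G&#xfc;nther is an HTML numeric character reference: &#xfc; names the code point U+00FC (ü). A browser renders it as ü only when it occurs inside real HTML markup; when a search engine shows the raw source as literal text, the escape stays visible, which is what those results are doing. A minimal illustration (hypothetical page):

        <!-- Served as HTML, both lines render identically as "Günther" -->
        <p>G&#xfc;nther</p>
        <p>Günther</p>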

  • JS encodeURIComponent result different from the one created by FORM

    - by Marco Demaio
    I thought values entered in forms were properly encoded by browsers, but this simple test shows it's not true:

        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
            "http://www.w3.org/TR/html4/loose.dtd">
        <html><head>
        <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
        <title></title>
        </head><body>
        <form id="test" action="test_get_vs_encodeuri.html" method="GET"
              onsubmit="alert(encodeURIComponent(this.one.value));">
        <input name="one" type="text" value="Euro-€">
        <input type="submit" value="SUBMIT">
        </form>
        </body></html>

    When hitting the submit button, encodeURIComponent encodes the input value as "Euro-%E2%82%AC", while the browser writes only "Euro-%80" into the GET query string. Could someone explain? Or is encodeURIComponent doing unnecessary conversions?
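
    Both results are internally consistent; they just use different charsets. encodeURIComponent is defined to percent-encode the UTF-8 bytes of the string, while form submission percent-encodes using the page's own charset, here windows-1252, where € is the single byte 0x80. A sketch of the comparison:

        // encodeURIComponent always emits the UTF-8 bytes (E2 82 AC for €):
        encodeURIComponent("Euro-€");   // "Euro-%E2%82%AC"

        // A form on a windows-1252 page sends the single 1252 byte for €:
        //   one=Euro-%80
        // Declaring charset=utf-8 in the meta tag makes the two agree.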

  • Change Emacs Default Coding System

    - by Saterus
    My problem stems from Emacs inserting the coding system headers into source files containing non-ASCII characters:

        # -*- coding: utf-8 -*-

    My coworkers do not like these headers being checked into our repositories. I don't want them inserted into my files because Emacs automatically detects that the file should be UTF-8 regardless, so there doesn't seem to be any benefit to anyone. I would like to simply set Emacs to use UTF-8 automatically for all files, yet it seems to disagree with this idea. In an effort to fix this, I've added the following to my .emacs:

        (prefer-coding-system 'utf-8)
        (setq coding-system-for-read 'utf-8)
        (setq coding-system-for-write 'utf-8)

    This does not seem to solve my problem. Emacs still inserts the coding-system headers into my files. Anyone have any ideas?

    EDIT: I think this problem is specifically related to ruby-mode. I still can't turn it off, though.
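
    Since the EDIT points at ruby-mode: it is ruby-mode itself, not the coding machinery those three settings control, that writes the magic comment into Ruby buffers on save, and it has its own switch. A sketch for .emacs (the variable exists in reasonably recent ruby-mode versions; older ones may lack it):

        ;; Stop ruby-mode from adding "# -*- coding: utf-8 -*-" on save.
        (setq ruby-insert-encoding-magic-comment nil)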

  • Serializing chinese characters with Xerces 2.6

    - by Gianluca
    I have a Xerces (2.6) DOMNode object encoded in UTF-8, whose TEXT element I read like this:

        CBuffer DomNodeExtended::getText( const DOMNode* node ) const
        {
            char* p = XMLString::transcode( node->getNodeValue() );
            CBuffer xNodeText( p );
            XMLString::release( &p );  // was "delete p", which is the wrong
                                       // deallocator for transcode's result
            return xNodeText;
        }

    Here CBuffer is, well, just a buffer object which is later persisted as-is in a DB. This works as long as the TEXT contains only common ASCII characters. If it contains e.g. Chinese ones, they get lost in the transcode operation. I've googled a lot seeking a solution. It looks like with Xerces 3 the DOMWriter class should solve the problem. With Xerces 2.6 I'm trying the XMLTranscoder, but no success yet. Could anybody help?
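
    The underlying issue: XMLString::transcode converts to the local code page, and any character it cannot represent there (e.g. Chinese on a Latin-1 locale) is lost. A sketch of transcoding explicitly to UTF-8 with the XMLTranscoder API instead (hedged: written against the 2.x headers, not tested on 2.6 specifically; check the exact signatures in your version):

        #include <xercesc/util/PlatformUtils.hpp>
        #include <xercesc/util/TransService.hpp>
        #include <xercesc/util/XMLString.hpp>
        XERCES_CPP_NAMESPACE_USE

        // Transcode the UTF-16 XMLCh string to UTF-8 bytes instead of the
        // (lossy) local code page used by XMLString::transcode.
        XMLTransService::Codes res;
        XMLTranscoder* t = XMLPlatformUtils::fgTransService->
            makeNewTranscoderFor("UTF-8", res, 16 * 1024);

        const XMLCh* src = node->getNodeValue();
        unsigned int srcLen = XMLString::stringLen(src);
        XMLByte out[16 * 1024];
        unsigned int eaten = 0;
        unsigned int outLen = t->transcodeTo(src, srcLen, out, sizeof(out),
                                             eaten, XMLTranscoder::UnRep_RepChar);
        // out[0..outLen) now holds UTF-8 bytes; copy those into CBuffer.
        delete t;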

  • base 64 URL decode with Ruby/Rails?

    - by seth.vargo
    I am working with the Facebook API in Ruby on Rails and I'm trying to parse the JSON that comes back. The problem I'm running into is that Facebook base64url-encodes its data, and there is no built-in base64url decode for Ruby. For the difference between base64 and base64url encoding, see Wikipedia. How do I decode this using Ruby/Rails?

    Edit: because some people have difficulty reading: base64url is DIFFERENT from base64.
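
    A minimal sketch using only the standard library: map the URL-safe alphabet back to the standard one and restore the stripped '=' padding, then use the stock decoder. (Ruby 1.9+ ships Base64.urlsafe_decode64, which does the same thing.)

        require 'base64'

        # base64url -> base64: '-' => '+', '_' => '/', re-pad to a multiple of 4.
        def base64url_decode(str)
          str = str.tr('-_', '+/')
          str += '=' * ((4 - str.length % 4) % 4)
          Base64.decode64(str)
        end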

  • Ruby custom class to and from YAML;

    - by Sanarothe
    Hi. I'm having trouble deserializing a Ruby class that I wrote to YAML.

    Where I want to be: I want to pass one object around as a full "question", which includes the question text, some possible answers (for multiple choice) and the correct answer. One module (the encoder) takes input, builds a question object out of it and appends it to the question pool. Another module reads a question pool and builds an array of question objects.

    Where I am currently. Sample question pool:

        --- |
          --- !ruby/object:MultiQ
          a: "no"
          answer: "no"
          b: "no"
          c: "no"
          d: "no"
          text: "yes?"

    Encoder dump to YAML file (object is a MultiQ filled up with input, see below):

        def dump(file, object)
          File.open(file, 'a') do |out|
            YAML.dump(object.to_yaml, out)
          end
          object = nil
        end

    MultiQ class definition:

        class MultiQ
          attr_accessor :text, :answer, :a, :b, :c, :d

          def initialize(text, answer, a, b, c, d)
            @text = text
            @answer = answer
            @a = a
            @b = b
            @c = c
            @d = d
          end
        end

    The decoder (I've been trying different things, so what's here wasn't my first or best guess, but I'm at a loss and the documentation doesn't really explain things thoroughly enough):

        File.open("test_set.yaml") do |yf|
          YAML.load_documents(yf) do |item|
            new = YAML.object_maker(MultiQ, item)
            puts new
          end
        end

    Questions you can answer: How do I achieve my goal? What methods should I use, between parsing, loading files or documents, to successfully deserialize a Ruby class? (I've already looked over the YAML RDoc and didn't absorb very much, so please don't just link me to it.) What other methods would you suggest? Is there a better way to store questions like this: a document DB, a relational DB, XML, or some other format?
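
    One bug is visible in the dump step, and the sample pool shows its fingerprint: the file starts with "--- |", i.e. a YAML document whose content is a string that itself contains YAML. That is because YAML.dump(object.to_yaml, out) serializes twice: to_yaml already produces a YAML string, and dump then wraps that string in another document. A sketch dumping and loading the object directly (same names as above; MultiQ must be defined before loading):

        require 'yaml'

        # Dump: pass the object itself, not object.to_yaml.
        def dump(file, object)
          File.open(file, 'a') { |out| YAML.dump(object, out) }
        end

        # Load: each document comes back directly as a MultiQ instance,
        # thanks to the !ruby/object:MultiQ tag; no object_maker needed.
        questions = []
        File.open('test_set.yaml') do |yf|
          YAML.load_documents(yf) { |q| questions << q }
        end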

  • c# Remove special chars from a File

    - by jmpena
    Hello, I have a problem. I'm trying to open a text file and remove all the special chars (ñ Ñ ' á í etc.). The file is a layout that the clients send to me, and I parse it to send the file to an AS400 server, but I have to remove all special chars first.

    The problem is: in some files with some special chars, when I open the file in C# it reads each special char as two different chars, which moves the entire line one space to the right, and then the information that has to be at a fixed position is no longer correct. If I take the same file and open it in Notepad, it looks OK, but when I open it in WordPad the same single special char looks like 2 chars.

    Example: in the file I have "0001 0003JUAN PEÑA33441JPENATEST", but in C# it shows "0001 0003JUAN PEï¦A33441JPENATEST". I'm using encoding 1251. Any help?
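
    The one-char-becomes-two symptom is the classic sign of UTF-8 bytes being read with a single-byte code page. Also note 1251 is the Cyrillic code page; for Spanish text such as PEÑA, Windows-1252 is the usual choice. A hedged C# sketch (the path variable is a placeholder): read with an explicit encoding, then strip accents via Unicode decomposition:

        using System.Globalization;
        using System.IO;
        using System.Text;

        // Read the layout assuming Windows-1252 (covers ñ Ñ á í etc.).
        string text = File.ReadAllText(path, Encoding.GetEncoding(1252));

        // Strip accents: decompose (á -> a + combining mark), drop the marks.
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        string stripped = sb.ToString();   // "PEÑA" -> "PENA"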

  • How do you get the glyph for a character encoded as '&#333;' from a utf-8 encoded database field using PHP?

    - by AE
    I have a MySQL database table with a collation of 'utf8_general_ci', and the value in the field is: x&#299; bán yá wén. When this is converted (for example by StackOverflow's editor) it looks like this: xī bán yá wén, where the second character looks like a lowercase i with a bar over the top. In PHP, what function converts the &#299; entity into the ī character? I've tried using html_entity_decode($str, ENT_COMPAT, 'UTF-8'), however I get characters like the following: yÄ«n wén or zhÅ•ng wén. I'm pretty sure there's something I don't understand about the decoding, which is why I'm using the wrong function. Can anyone shed some light on how to get the single-character glyph that's represented by the entity &#299; and similar high-number characters above 255? Many thanks, AE
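
    For reference, html_entity_decode with 'UTF-8' is the right function here, and output like "yÄ«n wén" is exactly what correct UTF-8 looks like when a browser displays it as Latin-1; so the decode likely succeeded, and the missing piece is declaring UTF-8 on the page that prints it. A sketch (the sample strings are illustrative):

        <?php
        // Make sure the output is *served* as UTF-8, otherwise the browser
        // shows the UTF-8 bytes as Latin-1 ("Ä«" instead of "ī").
        header('Content-Type: text/html; charset=utf-8');

        // Decode the numeric entity into the actual UTF-8 character.
        $s = html_entity_decode('x&#299;n w&#233;n', ENT_COMPAT, 'UTF-8');
        echo $s; // xīn wén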

  • how to properly display utf encoded characters on my utf-8 encoded page?

    - by Ali
    Hi guys, I'm retrieving emails, and some of my emails have UTF-encoded text. However, even though my page is encoded as UTF-8, in some places when I try to output the text I get funny character runs like:

        =?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=
        =?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?=

    whereas in other areas of the same page text displays fine. What's going on?
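
    Those =?utf-8?B?...?= runs are not raw UTF-8 text: they are RFC 2047 MIME encoded-words (base64-wrapped header text), which mail clients decode but a web page prints verbatim. Assuming PHP with the iconv extension (the question doesn't name a language), a sketch:

        <?php
        $raw = '=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=';

        // Decode the MIME encoded-word into a plain UTF-8 string.
        $text = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
        echo $text;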

  • Windows code pages, what are they?

    - by Mike D
    I'm trying to gain a basic understanding of what is meant by a Windows code page. I kind of get the feeling it's a translation between a given 8-bit value and some 'abstraction' of a given character glyph. I made the following experiment: I created a "üü" string literal with two versions of the letter u with an umlaut, one entered using ALT 129 (which uses code page 437) and one using ALT 0252 (which uses code page 1252). When I examined the literal, both characters had the value 252. Is 252 the universal 8-bit abstraction for u with an umlaut? Is it the Unicode value? Aside from keyboard input, are there any library routines or system calls that use code pages? For example, is there a function to translate a string using a given code table (as above for the ALT 129 value)?
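
    On the last question: yes. On Win32, MultiByteToWideChar and WideCharToMultiByte translate between a chosen code page and UTF-16, which is the 'abstraction' the editor stored. 252 is ü in code page 1252, and for that range 1252 happens to coincide with the Unicode code point (U+00FC), while 129 is ü only in the old OEM code page 437. A small C sketch:

        #include <windows.h>
        #include <stdio.h>

        int main(void) {
            const char oem[] = { (char)129, 0 };   /* u-umlaut in CP437 (ALT 129) */
            wchar_t wide[8];

            /* Translate the CP437 byte to UTF-16: yields L"\u00FC". */
            MultiByteToWideChar(437, 0, oem, -1, wide, 8);
            printf("U+%04X\n", (unsigned)wide[0]);  /* prints U+00FC */
            return 0;
        }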

  • Listings in Latex with UTF-8 (or at least german umlauts)

    - by Janosch
    Trying to include a source file in my LaTeX document using the listings package, I get problems with German umlauts inside the comments in the code. Using

        \lstset{
          extendedchars=\true,
          inputencoding=utf8x
        }

    umlauts in the source files (encoded in UTF-8 without BOM) are processed, but they are somehow moved to the beginning of the word they are contained in. So

        // die Größe muss berücksichtigt werden

    in the input source file becomes

        // die ößGre muss übercksichtigt werden

    in the output. (NOTE: since I found errors in my initial setup, I have heavily edited this question.)
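
    The umlauts-jump-to-the-front effect is a known quirk of listings with multi-byte input. A common workaround (a sketch, assuming the standard listings package) is to declare each umlaut via the literate option, so listings treats it as a single unit:

        \lstset{
          literate={ö}{{\"o}}1
                   {ä}{{\"a}}1
                   {ü}{{\"u}}1
                   {Ö}{{\"O}}1
                   {Ä}{{\"A}}1
                   {Ü}{{\"U}}1
                   {ß}{{\ss}}1
        }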

  • convert special characters but not tags

    - by Tom
    I've got some text which needs converting to use HTML entities, but it also contains tags. Here's a sample:

        <p>Ofcom issued the warning to Global-owned GWR in Bristol – which is
        required to operate as a "contemporary and chart music and information
        station" – for operating outside the music </p>

    The quote and dash characters need to be converted, but the paragraph tags must remain HTML. Using something like htmlentities converts everything; how can I convert everything but the tags?
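
    One approach, assuming PHP since htmlentities is named: split the string into tag and non-tag segments and encode only the latter. A sketch; it assumes no bare < or > appear inside attribute values:

        <?php
        // Split into tags and text, keeping the tags as separate segments.
        $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as &$part) {
            if ($part === '' || $part[0] === '<') continue; // leave tags alone
            $part = htmlentities($part, ENT_QUOTES, 'UTF-8'); // encode text only
        }
        unset($part);
        $out = implode('', $parts);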

  • Parsing a UTF-16 encoded xml file in ruby

    - by Matthew Toohey
    Hello, I've been trying to parse a UTF-16 encoded XML file in Ruby (1.8.7), and I can't seem to find out how by searching (Google and Stack Overflow). Here's the XML file's URL:

        http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml?_5366

    Getting the XML string from Net::HTTP and passing it to REXML, then calling logger.info xmlDoc.inspect, produces:

        <UNDEFINED> ... </>

    Any ideas? Cheers
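
    REXML under Ruby 1.8 does not cope well with UTF-16 input. A common workaround (a sketch using the standard-library Iconv; 'UTF-16' reads the byte order from the feed's BOM) is to transcode the body to UTF-8 before parsing:

        require 'net/http'
        require 'uri'
        require 'iconv'
        require 'rexml/document'

        xml = Net::HTTP.get(URI.parse(
          'http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml'))

        # Convert UTF-16 (endianness taken from the BOM) to UTF-8 for REXML.
        utf8 = Iconv.conv('UTF-8', 'UTF-16', xml)
        doc = REXML::Document.new(utf8)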

  • Text not encoded properly.

    - by Paul Knopf
    In my master page, I have the following in the header. This allows me to put special characters into my website. The problem is that when JavaScript loads special characters on the client, I get that weird box character. Example URL:

        http://89.184.149.229/Sandportal/vinnan/trol-lna/monica-sakk--vikuskiftinum

    The text is below the 4 stars (mid left). Any help is greatly appreciated.
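
    A guess at the mechanism, since the pasted header snippet did not survive posting: text injected by an external .js file is decoded using the script file's own encoding, not the page's, so a mismatch would show up only in the JavaScript-loaded strings. Declaring the script's charset is a cheap test (the src name here is hypothetical):

        <!-- Tell the browser how the external script's bytes are encoded -->
        <script type="text/javascript" src="vinnan.js" charset="utf-8"></script>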

  • What is the proper way to URL encode Unicode characters?

    - by Josh Gibson
    I know of the non-standard %uxxxx scheme, but that doesn't seem like a wise choice since it has been rejected by the W3C. Some interesting examples:

    The heart character. If I type this into my browser:

        http://www.google.com/search?q=♥

    then copy and paste it, I see this URL:

        http://www.google.com/search?q=%E2%99%A5

    which makes it seem like Firefox (or Safari) is doing this:

        urllib.quote_plus(x.encode("latin-1"))
        '%E2%99%A5'

    which makes sense, except for things that can't be encoded in Latin-1, like the triple-dot character …. If I type the URL

        http://www.google.com/search?q=…

    into my browser, then copy and paste, I get

        http://www.google.com/search?q=%E2%80%A6

    back, which seems to be the result of doing

        urllib.quote_plus(x.encode("utf-8"))

    and makes sense, since … can't be encoded with Latin-1. But then it's not clear to me how the browser knows whether to decode with UTF-8 or Latin-1, since this seems to be ambiguous:

        In [67]: u"…".encode('utf-8').decode('latin-1')
        Out[67]: u'\xc3\xa2\xc2\x80\xc2\xa6'

    also works, so I don't know how the browser figures out which one to use. What's the right thing to be doing with the special characters I need to deal with?
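
    The behavior both examples converge on is the standard one: RFC 3986 (and the IRI-to-URI mapping of RFC 3987) says to percent-encode the UTF-8 bytes of the character, and that is what browsers emit for both ♥ and …. A Python 2 sketch matching the urllib calls above:

        import urllib

        # Percent-encode the UTF-8 bytes, per RFC 3986 / RFC 3987.
        print urllib.quote_plus(u'\u2665'.encode('utf-8'))  # %E2%99%A5 (heart)
        print urllib.quote_plus(u'\u2026'.encode('utf-8'))  # %E2%80%A6 (ellipsis)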
