Search Results

Search found 9074 results on 363 pages for 'audio encoding'.

  • Filtering Wikipedia's XML dump: error on some accents

    - by streetpc
    I'm trying to index Wikipedia dumps. My SAX parser makes Article objects for the XML with only the fields I care about, then sends them to my ArticleSink, which produces Lucene Documents. I want to filter special/meta pages like those prefixed with Category: or Wikipedia:, so I made an array of those prefixes and test the title of each page against this array in my ArticleSink, using article.getTitle().startsWith(prefix).

    In English, everything works fine: I get a Lucene index with all the pages except for the matching prefixes. In French, the prefixes with no accent also work (i.e. filter the corresponding pages), some of the accented prefixes don't work at all (like Catégorie:), and some work most of the time but fail on some pages (like Wikipédia:), yet I cannot see any difference between the corresponding lines (in less). I can't really inspect all the differences in the file because of its size (5 GB), but it looks like correct UTF-8 XML. If I take a portion of the file using grep or head, the accents are correct (even on the incriminated pages, the <title>Catégorie:something</title> is correctly displayed by grep). On the other hand, when I recreate a wiki XML by tail/head-cutting the original file, the same page (here Catégorie:Rock par ville) gets filtered in the small file, but not in the original. Any idea?

    Alternatives I tried. Getting the file (commented lines were tried without success):

        FileInputStream fis = new FileInputStream(new File(xmlFileName));
        // ReaderInputStream ris = ReaderInputStream.forceEncodingInputStream(fis, "UTF-8");
        // (custom function opening the stream, reading it as UTF-8 into a Reader
        // and returning another byte stream)
        // InputSource is = new InputSource(fis); is.setEncoding("UTF-8");
        parser.parse(fis, handler);

    Filtered prefixes:

        ignoredPrefix = new String[] {
            "Catégorie:", "Modèle:", "Wikipédia:",
            "Cat\uFFFDgorie:", "Mod\uFFFDle:", "Wikip\uFFFDdia:", // invalid char
            "Catégorie:", "Modèle:", "Wikipédia:",          // UTF-8 read as ISO-8859-1
            "Image:", "Portail:", "Fichier:", "Aide:", "Projet:" }; // these last always work
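
    A guess worth ruling out (the question doesn't confirm it): the dump may mix precomposed and decomposed Unicode forms, where é is either the single code point U+00E9 or an e followed by the combining accent U+0301. The two render identically in less and grep but fail startsWith, and a tail/head cut could happen to contain only pages in one form. A minimal sketch normalizing both sides to NFC with java.text.Normalizer (Java 6+):

        import java.text.Normalizer;

        // Normalize title and prefixes to NFC so that "é" (U+00E9) and
        // "e" + combining acute (U+0065 U+0301) compare equal.
        static boolean hasIgnoredPrefix(String title, String[] ignoredPrefixes) {
            String normTitle = Normalizer.normalize(title, Normalizer.Form.NFC);
            for (String prefix : ignoredPrefixes) {
                String normPrefix = Normalizer.normalize(prefix, Normalizer.Form.NFC);
                if (normTitle.startsWith(normPrefix)) {
                    return true;
                }
            }
            return false;
        }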

  • [Ruby] Why do I have to URI.encode even safe characters for Net::HTTP requests?

    - by Matthias
    I was trying to send a GET request to Twitter (user ID replaced for privacy reasons) using Net::HTTP:

        url = URI.parse("http://api.twitter.com/1/friends/ids.json?user_id=12345")
        resp = Net::HTTP.get_response(url)

    This throws an exception in Net::HTTP:

        NoMethodError: undefined method `empty?' for #<URI::HTTP:0x59f5c04>
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1470:in `initialize'

    Just by coincidence, I stumbled upon a similar code snippet which used URI.encode prior to URI.parse, so I copied that and tried again:

        url = URI.parse(URI.encode("http://api.twitter.com/1/friends/ids.json?user_id=12345"))
        resp = Net::HTTP.get_response(url)

    Now it works fine, but why? There are no reserved characters that need escaping in the URL I mentioned, so why do I have to call URI.encode for get_response to succeed?
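
    The URL as shown contains only safe characters, so URI.encode should be a no-op on it; one hedged guess is that an invisible control or non-ASCII byte rode along when the URL was pasted, which URI.encode would quietly escape. A quick Ruby 1.8 diagnostic sketch (not a fix) to check for that:

        url = "http://api.twitter.com/1/friends/ids.json?user_id=12345"
        # Print each byte; anything outside printable ASCII is invisible
        # in an editor but still confuses URI.parse / Net::HTTP.
        url.each_byte { |b| print((b < 32 || b > 126) ? "[#{b}]" : b.chr) }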

  • Audio Reminders

    - by abhishek mishra
    Hi, I am developing a reminder application. Part of it is to have voice notes as reminders. On click of the voice notes button I want to start the built-in voice recorder. How do I go about that? Also, once it starts, I want to retrieve the path where the recording gets stored, so that it can be played automatically on the day the reminder comes due. Is this possible?
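
    Assuming Android (the question doesn't say, but "inbuilt voice recorder" suggests it): the stock recorder can be launched with MediaStore's RECORD_SOUND_ACTION intent, and the content URI of the finished recording comes back in onActivityResult. A sketch, inside an Activity; note that not every device ships an app that handles this action:

        import android.content.Intent;
        import android.net.Uri;
        import android.provider.MediaStore;

        private static final int REQUEST_RECORD_AUDIO = 1;

        // Launch the built-in sound recorder, if one is installed.
        private void startVoiceNote() {
            Intent intent = new Intent(MediaStore.Audio.Media.RECORD_SOUND_ACTION);
            startActivityForResult(intent, REQUEST_RECORD_AUDIO);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_RECORD_AUDIO && resultCode == RESULT_OK) {
                Uri recordingUri = data.getData(); // content:// URI of the recording
                // Persist recordingUri with the reminder; later play it back
                // with MediaPlayer.create(context, recordingUri).
            }
        }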

  • Query MySQL with Unicode char code

    - by Ben
    Hi, I have been having trouble searching through a MySQL table, trying to find entries containing the character U+200E (the left-to-right mark) in a particular column. This character doesn't have a visible glyph, so pasting it into my search term doesn't seem to work. Is there a way to specify characters by their code point in a query instead? Thanks, -Ben
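
    One way to write the query without pasting the invisible character is to build it from its code point (a sketch; the table and column names are placeholders). U+200E is the byte sequence E2 80 8E in UTF-8:

        -- U+200E (LEFT-TO-RIGHT MARK) as its UTF-8 bytes E2 80 8E.
        SELECT *
        FROM my_table
        WHERE my_column LIKE CONCAT('%', CONVERT(UNHEX('E2808E') USING utf8), '%');

        -- Alternatively, construct it from the code point directly:
        --   CHAR(0x200E USING ucs2)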

  • .aspx character coding

    - by kwek-kwek
    I am having a problem. This is my first time working with a Windows server: is there any known issue with character encoding? My document is set to content="text/html; charset=UTF-8" but it's giving me garbled words (you can check it here). The site is pure HTML with a few includes, but everything else is plain HTML. I could convert the text to HTML entities, but that is basically wasting my time. I never had this problem with any website I did, except for this one. Someone said "The problem seems to be that you have converted the text into utf-8 twice." But how would it be converted twice, since Dreamweaver should convert it for me, but in this case it doesn't?
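
    If the text really is UTF-8 decoded twice, a usual suspect on a Windows/IIS server is a mismatch between the bytes on disk and the encoding the server assumes when reading the source files. A hedged sketch of the relevant web.config section (this assumes the site runs under ASP.NET, which .aspx implies):

        <!-- web.config: declare the source files as UTF-8 and respond in UTF-8 -->
        <configuration>
          <system.web>
            <globalization fileEncoding="utf-8"
                           requestEncoding="utf-8"
                           responseEncoding="utf-8" />
          </system.web>
        </configuration>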

  • display of umlauts in firefox

    - by Mike D
    I was doing some web searching and found some strange things involving umlauts. For example, if you do a Google or Yahoo search for the word "Günther", you are likely to find things like G&#xfc;nther, which I take to be Gunther with an umlaut over the u. Now my question is: what, if anything, can I do to cause these characters to be properly displayed by Firefox under Windows XP? An amazing thing is that I had to introduce spaces into that G&#xfc;nther string when first posting this, otherwise it was properly displayed here as u with umlaut!
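
    For what it's worth, G&#xfc;nther is an HTML numeric character reference: &#xfc; names the code point U+00FC (ü). A browser renders it as ü only when it occurs inside real HTML markup; when a search engine shows the raw source as literal text, the escape stays visible, which is what those results are doing. A minimal illustration (hypothetical page):

        <!-- Served as HTML, both lines render identically as "Günther" -->
        <p>G&#xfc;nther</p>
        <p>Günther</p>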

  • JS encodeURIComponent result different from the one created by FORM

    - by Marco Demaio
    I thought values entered in forms were properly encoded by browsers, but this simple test shows it's not true:

        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
            "http://www.w3.org/TR/html4/loose.dtd">
        <html><head>
        <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
        <title></title>
        </head><body>
        <form id="test" action="test_get_vs_encodeuri.html" method="GET"
              onsubmit="alert(encodeURIComponent(this.one.value));">
        <input name="one" type="text" value="Euro-€">
        <input type="submit" value="SUBMIT">
        </form>
        </body></html>

    When hitting the submit button, encodeURIComponent encodes the input value as "Euro-%E2%82%AC", while the browser writes only "Euro-%80" into the GET query string. Could someone explain? Or is encodeURIComponent doing unnecessary conversions?
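
    Both results are internally consistent; they just use different charsets. encodeURIComponent is defined to percent-encode the UTF-8 bytes of the string, while form submission percent-encodes using the page's own charset, here windows-1252, where € is the single byte 0x80. A sketch of the comparison:

        // encodeURIComponent always emits the UTF-8 bytes (E2 82 AC for €):
        encodeURIComponent("Euro-€");   // "Euro-%E2%82%AC"

        // A form on a windows-1252 page sends the single 1252 byte for €:
        //   one=Euro-%80
        // Declaring charset=utf-8 in the meta tag makes the two agree.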

  • Change Emacs Default Coding System

    - by Saterus
    My problem stems from Emacs inserting the coding system headers into source files containing non-ASCII characters:

        # -*- coding: utf-8 -*-

    My coworkers do not like these headers being checked into our repositories. I don't want them inserted into my files because Emacs automatically detects that the file should be UTF-8 regardless, so there doesn't seem to be any benefit to anyone. I would like to simply set Emacs to use UTF-8 automatically for all files, yet it seems to disagree with this idea. In an effort to fix this, I've added the following to my .emacs:

        (prefer-coding-system 'utf-8)
        (setq coding-system-for-read 'utf-8)
        (setq coding-system-for-write 'utf-8)

    This does not seem to solve my problem. Emacs still inserts the coding-system headers into my files. Anyone have any ideas?

    EDIT: I think this problem is specifically related to ruby-mode. I still can't turn it off, though.
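
    Since the EDIT points at ruby-mode: it is ruby-mode itself, not the coding machinery those three settings control, that writes the magic comment into Ruby buffers on save, and it has its own switch. A sketch for .emacs (the variable exists in reasonably recent ruby-mode versions; older ones may lack it):

        ;; Stop ruby-mode from adding "# -*- coding: utf-8 -*-" on save.
        (setq ruby-insert-encoding-magic-comment nil)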

  • Serializing chinese characters with Xerces 2.6

    - by Gianluca
    I have a Xerces (2.6) DOMNode object encoded in UTF-8, whose TEXT element I read like this:

        CBuffer DomNodeExtended::getText( const DOMNode* node ) const
        {
            char* p = XMLString::transcode( node->getNodeValue() );
            CBuffer xNodeText( p );
            XMLString::release( &p );  // was "delete p", which is the wrong
                                       // deallocator for transcode's result
            return xNodeText;
        }

    Here CBuffer is, well, just a buffer object which is later persisted as-is in a DB. This works as long as the TEXT contains only common ASCII characters. If it contains e.g. Chinese ones, they get lost in the transcode operation. I've googled a lot seeking a solution. It looks like with Xerces 3 the DOMWriter class should solve the problem. With Xerces 2.6 I'm trying the XMLTranscoder, but no success yet. Could anybody help?
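
    The underlying issue: XMLString::transcode converts to the local code page, and any character it cannot represent there (e.g. Chinese on a Latin-1 locale) is lost. A sketch of transcoding explicitly to UTF-8 with the XMLTranscoder API instead (hedged: written against the 2.x headers, not tested on 2.6 specifically; check the exact signatures in your version):

        #include <xercesc/util/PlatformUtils.hpp>
        #include <xercesc/util/TransService.hpp>
        #include <xercesc/util/XMLString.hpp>
        XERCES_CPP_NAMESPACE_USE

        // Transcode the UTF-16 XMLCh string to UTF-8 bytes instead of the
        // (lossy) local code page used by XMLString::transcode.
        XMLTransService::Codes res;
        XMLTranscoder* t = XMLPlatformUtils::fgTransService->
            makeNewTranscoderFor("UTF-8", res, 16 * 1024);

        const XMLCh* src = node->getNodeValue();
        unsigned int srcLen = XMLString::stringLen(src);
        XMLByte out[16 * 1024];
        unsigned int eaten = 0;
        unsigned int outLen = t->transcodeTo(src, srcLen, out, sizeof(out),
                                             eaten, XMLTranscoder::UnRep_RepChar);
        // out[0..outLen) now holds UTF-8 bytes; copy those into CBuffer.
        delete t;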

  • base 64 URL decode with Ruby/Rails?

    - by seth.vargo
    I am working with the Facebook API in Ruby on Rails and I'm trying to parse the JSON that comes back. The problem I'm running into is that Facebook base64url-encodes its data, and there is no built-in base64url decode for Ruby. For the difference between base64 and base64url encoding, see Wikipedia. How do I decode this using Ruby/Rails?

    Edit: because some people have difficulty reading: base64url is DIFFERENT from base64.
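
    A minimal sketch using only the standard library: map the URL-safe alphabet back to the standard one and restore the stripped '=' padding, then use the stock decoder. (Ruby 1.9+ ships Base64.urlsafe_decode64, which does the same thing.)

        require 'base64'

        # base64url -> base64: '-' => '+', '_' => '/', re-pad to a multiple of 4.
        def base64url_decode(str)
          str = str.tr('-_', '+/')
          str += '=' * ((4 - str.length % 4) % 4)
          Base64.decode64(str)
        end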

  • Ruby custom class to and from YAML;

    - by Sanarothe
    Hi. I'm having trouble deserializing a Ruby class that I wrote to YAML.

    Where I want to be: I want to pass one object around as a full "question", which includes the question text, some possible answers (for multiple choice) and the correct answer. One module (the encoder) takes input, builds a question object out of it and appends it to the question pool. Another module reads a question pool and builds an array of question objects.

    Where I am currently. Sample question pool:

        --- |
          --- !ruby/object:MultiQ
          a: "no"
          answer: "no"
          b: "no"
          c: "no"
          d: "no"
          text: "yes?"

    Encoder dump to YAML file (object is a MultiQ filled up with input, see below):

        def dump(file, object)
          File.open(file, 'a') do |out|
            YAML.dump(object.to_yaml, out)
          end
          object = nil
        end

    MultiQ class definition:

        class MultiQ
          attr_accessor :text, :answer, :a, :b, :c, :d

          def initialize(text, answer, a, b, c, d)
            @text = text
            @answer = answer
            @a = a
            @b = b
            @c = c
            @d = d
          end
        end

    The decoder (I've been trying different things, so what's here wasn't my first or best guess, but I'm at a loss and the documentation doesn't really explain things thoroughly enough):

        File.open("test_set.yaml") do |yf|
          YAML.load_documents(yf) do |item|
            new = YAML.object_maker(MultiQ, item)
            puts new
          end
        end

    Questions you can answer: How do I achieve my goal? What methods should I use, between parsing, loading files or documents, to successfully deserialize a Ruby class? (I've already looked over the YAML RDoc and didn't absorb very much, so please don't just link me to it.) What other methods would you suggest? Is there a better way to store questions like this: a document DB, a relational DB, XML, or some other format?
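
    One bug is visible in the dump step, and the sample pool shows its fingerprint: the file starts with "--- |", i.e. a YAML document whose content is a string that itself contains YAML. That is because YAML.dump(object.to_yaml, out) serializes twice: to_yaml already produces a YAML string, and dump then wraps that string in another document. A sketch dumping and loading the object directly (same names as above; MultiQ must be defined before loading):

        require 'yaml'

        # Dump: pass the object itself, not object.to_yaml.
        def dump(file, object)
          File.open(file, 'a') { |out| YAML.dump(object, out) }
        end

        # Load: each document comes back directly as a MultiQ instance,
        # thanks to the !ruby/object:MultiQ tag; no object_maker needed.
        questions = []
        File.open('test_set.yaml') do |yf|
          YAML.load_documents(yf) { |q| questions << q }
        end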

  • c# Remove special chars from a File

    - by jmpena
    Hello, I have a problem. I'm trying to open a text file and remove all the special chars (ñ Ñ ' á í etc.). The file is a layout that the clients send to me, and I parse it to send the file to an AS400 server, but I have to remove all special chars first.

    The problem is: in some files with some special chars, when I open the file in C# it reads each special char as two different chars, which moves the entire line one space to the right, and then the information that has to be at a fixed position is no longer correct. If I take the same file and open it in Notepad, it looks OK, but when I open it in WordPad the same single special char looks like 2 chars.

    Example: in the file I have "0001 0003JUAN PEÑA33441JPENATEST", but in C# it shows "0001 0003JUAN PEï¦A33441JPENATEST". I'm using encoding 1251. Any help?
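
    The one-char-becomes-two symptom is the classic sign of UTF-8 bytes being read with a single-byte code page. Also note 1251 is the Cyrillic code page; for Spanish text such as PEÑA, Windows-1252 is the usual choice. A hedged C# sketch (the path variable is a placeholder): read with an explicit encoding, then strip accents via Unicode decomposition:

        using System.Globalization;
        using System.IO;
        using System.Text;

        // Read the layout assuming Windows-1252 (covers ñ Ñ á í etc.).
        string text = File.ReadAllText(path, Encoding.GetEncoding(1252));

        // Strip accents: decompose (á -> a + combining mark), drop the marks.
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        string stripped = sb.ToString();   // "PEÑA" -> "PENA"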

  • How do you get the glyph for a character encoded as '&#333;' from a utf-8 encoded database field using PHP?

    - by AE
    I have a MySQL database table with a collation of 'utf8_general_ci', and the value in the field is: x&#299; bán yá wén. When this is converted (for example by StackOverflow's editor) it looks like this: xī bán yá wén, where the second character looks like a lowercase i with a bar over the top. In PHP, what function converts the &#299; entity into the ī character? I've tried using html_entity_decode($str, ENT_COMPAT, 'UTF-8'), however I get characters like the following: yÄ«n wén or zhÅ•ng wén. I'm pretty sure there's something I don't understand about the decoding, which is why I'm using the wrong function. Can anyone shed some light on how to get the single-character glyph that's represented by the entity &#299; and similar high-number characters above 255? Many thanks, AE
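
    For reference, html_entity_decode with 'UTF-8' is the right function here, and output like "yÄ«n wén" is exactly what correct UTF-8 looks like when a browser displays it as Latin-1; so the decode likely succeeded, and the missing piece is declaring UTF-8 on the page that prints it. A sketch (the sample strings are illustrative):

        <?php
        // Make sure the output is *served* as UTF-8, otherwise the browser
        // shows the UTF-8 bytes as Latin-1 ("Ä«" instead of "ī").
        header('Content-Type: text/html; charset=utf-8');

        // Decode the numeric entity into the actual UTF-8 character.
        $s = html_entity_decode('x&#299;n w&#233;n', ENT_COMPAT, 'UTF-8');
        echo $s; // xīn wén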

  • how to properly display utf encoded characters on my utf-8 encoded page?

    - by Ali
    Hi guys, I'm retrieving emails, and some of my emails have UTF-encoded text. However, even though my page is encoded as UTF-8, in some places when I try to output the text I get funny character runs like:

        =?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=
        =?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?=

    whereas in other areas of the same page text displays fine. What's going on?
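
    Those =?utf-8?B?...?= runs are not raw UTF-8 text: they are RFC 2047 MIME encoded-words (base64-wrapped header text), which mail clients decode but a web page prints verbatim. Assuming PHP with the iconv extension (the question doesn't name a language), a sketch:

        <?php
        $raw = '=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=';

        // Decode the MIME encoded-word into a plain UTF-8 string.
        $text = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
        echo $text;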

  • Windows code pages, what are they?

    - by Mike D
    I'm trying to gain a basic understanding of what is meant by a Windows code page. I kind of get the feeling it's a translation between a given 8-bit value and some 'abstraction' of a given character glyph. I made the following experiment: I created a "üü" string literal with two versions of the letter u with an umlaut, one entered using ALT 129 (which uses code page 437) and one using ALT 0252 (which uses code page 1252). When I examined the literal, both characters had the value 252. Is 252 the universal 8-bit abstraction for u with an umlaut? Is it the Unicode value? Aside from keyboard input, are there any library routines or system calls that use code pages? For example, is there a function to translate a string using a given code table (as above for the ALT 129 value)?
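
    On the last question: yes. On Win32, MultiByteToWideChar and WideCharToMultiByte translate between a chosen code page and UTF-16, which is the 'abstraction' the editor stored. 252 is ü in code page 1252, and for that range 1252 happens to coincide with the Unicode code point (U+00FC), while 129 is ü only in the old OEM code page 437. A small C sketch:

        #include <windows.h>
        #include <stdio.h>

        int main(void) {
            const char oem[] = { (char)129, 0 };   /* u-umlaut in CP437 (ALT 129) */
            wchar_t wide[8];

            /* Translate the CP437 byte to UTF-16: yields L"\u00FC". */
            MultiByteToWideChar(437, 0, oem, -1, wide, 8);
            printf("U+%04X\n", (unsigned)wide[0]);  /* prints U+00FC */
            return 0;
        }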

  • Listings in Latex with UTF-8 (or at least german umlauts)

    - by Janosch
    Trying to include a source file in my LaTeX document using the listings package, I get problems with German umlauts inside the comments in the code. Using

        \lstset{
          extendedchars=\true,
          inputencoding=utf8x
        }

    umlauts in the source files (encoded in UTF-8 without BOM) are processed, but they are somehow moved to the beginning of the word they are contained in. So

        // die Größe muss berücksichtigt werden

    in the input source file becomes

        // die ößGre muss übercksichtigt werden

    in the output. (NOTE: since I found errors in my initial setup, I have heavily edited this question.)
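
    The umlauts-jump-to-the-front effect is a known quirk of listings with multi-byte input. A common workaround (a sketch, assuming the standard listings package) is to declare each umlaut via the literate option, so listings treats it as a single unit:

        \lstset{
          literate={ö}{{\"o}}1
                   {ä}{{\"a}}1
                   {ü}{{\"u}}1
                   {Ö}{{\"O}}1
                   {Ä}{{\"A}}1
                   {Ü}{{\"U}}1
                   {ß}{{\ss}}1
        }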

  • convert special characters but not tags

    - by Tom
    I've got some text which needs converting to use HTML entities, but it also contains tags. Here's a sample:

        <p>Ofcom issued the warning to Global-owned GWR in Bristol – which is
        required to operate as a "contemporary and chart music and information
        station" – for operating outside the music </p>

    The quote and dash characters need to be converted, but the paragraph tags must remain HTML. Using something like htmlentities converts everything; how can I convert everything but the tags?
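
    One approach, assuming PHP since htmlentities is named: split the string into tag and non-tag segments and encode only the latter. A sketch; it assumes no bare < or > appear inside attribute values:

        <?php
        // Split into tags and text, keeping the tags as separate segments.
        $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as &$part) {
            if ($part === '' || $part[0] === '<') continue; // leave tags alone
            $part = htmlentities($part, ENT_QUOTES, 'UTF-8'); // encode text only
        }
        unset($part);
        $out = implode('', $parts);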

  • Parsing a UTF-16 encoded xml file in ruby

    - by Matthew Toohey
    Hello, I've been trying to parse a UTF-16 encoded XML file in Ruby (1.8.7), and I can't seem to find out how by searching (Google and Stack Overflow). Here's the XML file's URL:

        http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml?_5366

    Getting the XML string from Net::HTTP and passing it to REXML, then calling logger.info xmlDoc.inspect, produces:

        <UNDEFINED> ... </>

    Any ideas? Cheers
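
    REXML under Ruby 1.8 does not cope well with UTF-16 input. A common workaround (a sketch using the standard-library Iconv; 'UTF-16' reads the byte order from the feed's BOM) is to transcode the body to UTF-8 before parsing:

        require 'net/http'
        require 'uri'
        require 'iconv'
        require 'rexml/document'

        xml = Net::HTTP.get(URI.parse(
          'http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml'))

        # Convert UTF-16 (endianness taken from the BOM) to UTF-8 for REXML.
        utf8 = Iconv.conv('UTF-8', 'UTF-16', xml)
        doc = REXML::Document.new(utf8)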

  • Text not encoded properly.

    - by Paul Knopf
    In my master page, I have the following in the header. This allows me to put special characters into my website. The problem is that when JavaScript loads special characters on the client, I get that weird box character. Example URL:

        http://89.184.149.229/Sandportal/vinnan/trol-lna/monica-sakk--vikuskiftinum

    The text is below the 4 stars (mid left). Any help is greatly appreciated.
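
    A guess at the mechanism, since the pasted header snippet did not survive posting: text injected by an external .js file is decoded using the script file's own encoding, not the page's, so a mismatch would show up only in the JavaScript-loaded strings. Declaring the script's charset is a cheap test (the src name here is hypothetical):

        <!-- Tell the browser how the external script's bytes are encoded -->
        <script type="text/javascript" src="vinnan.js" charset="utf-8"></script>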

  • What is the proper way to URL encode Unicode characters?

    - by Josh Gibson
    I know of the non-standard %uxxxx scheme, but that doesn't seem like a wise choice since it has been rejected by the W3C. Some interesting examples:

    The heart character. If I type this into my browser:

        http://www.google.com/search?q=♥

    then copy and paste it, I see this URL:

        http://www.google.com/search?q=%E2%99%A5

    which makes it seem like Firefox (or Safari) is doing this:

        urllib.quote_plus(x.encode("latin-1"))
        '%E2%99%A5'

    which makes sense, except for things that can't be encoded in Latin-1, like the triple-dot character …. If I type the URL

        http://www.google.com/search?q=…

    into my browser, then copy and paste, I get

        http://www.google.com/search?q=%E2%80%A6

    back, which seems to be the result of doing

        urllib.quote_plus(x.encode("utf-8"))

    and makes sense, since … can't be encoded with Latin-1. But then it's not clear to me how the browser knows whether to decode with UTF-8 or Latin-1, since this seems to be ambiguous:

        In [67]: u"…".encode('utf-8').decode('latin-1')
        Out[67]: u'\xc3\xa2\xc2\x80\xc2\xa6'

    also works, so I don't know how the browser figures out which one to use. What's the right thing to be doing with the special characters I need to deal with?
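
    The behavior both examples converge on is the standard one: RFC 3986 (and the IRI-to-URI mapping of RFC 3987) says to percent-encode the UTF-8 bytes of the character, and that is what browsers emit for both ♥ and …. A Python 2 sketch matching the urllib calls above:

        import urllib

        # Percent-encode the UTF-8 bytes, per RFC 3986 / RFC 3987.
        print urllib.quote_plus(u'\u2665'.encode('utf-8'))  # %E2%99%A5 (heart)
        print urllib.quote_plus(u'\u2026'.encode('utf-8'))  # %E2%80%A6 (ellipsis)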
