Search Results

Search found 37381 results on 1496 pages for 'string parsing'.

  • Python + Expat: Error on &#0; entities

    - by clacke
    I have written a small function that uses ElementTree and XPath to extract the text contents of certain elements in an XML file:

        #!/usr/bin/env python2.5
        import doctest
        from xml.etree import ElementTree
        from StringIO import StringIO

        def parse_xml_etree(sin, xpath):
            """
            Takes as input a stream containing XML and an XPath expression.
            Applies the XPath expression to the XML and returns a generator
            yielding the text contents of each element returned.

            >>> parse_xml_etree(
            ...     StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'),
            ...     '//elem1').next()
            'one'
            >>> parse_xml_etree(
            ...     StringIO('<test><elem1>one</elem1><elem2>two</elem2></test>'),
            ...     '//elem2').next()
            'two'
            >>> parse_xml_etree(
            ...     StringIO('<test><null>&#0;</null><elem3>three</elem3></test>'),
            ...     '//elem3').next()
            'three'
            """
            tree = ElementTree.parse(sin)
            for element in tree.findall(xpath):
                yield element.text

        if __name__ == '__main__':
            doctest.testmod(verbose=True)

    The third test fails with the following exception:

        ExpatError: reference to invalid character number: line 1, column 13

    Is the &#0; entity illegal XML? Regardless of whether it is or not, the files I want to parse contain it, and I need some way to parse them. Any suggestions for a parser other than Expat, or settings for Expat, that would allow me to do that?

  • strange characters at beginning of file

    - by luca
    There are strange characters at the beginning of a file I'm editing (using TextMate). I don't know when they appeared; they're invisible in TextMate, but my script that reads the file goes crazy. These are the first few characters in the file (as seen with the od command):

        0000000 177377 000120 000105 000117 000120 000114 000105 000072

    The first 2 shouldn't be there, I think. Maybe they were caused by some strange Dropbox sync, or something else, but they tend to reappear (I don't yet know when). My question: what is that 177377, and what is a simple way to remove it in my Ruby script? Thanks.
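
For context, a sketch in Python (not the Ruby the question asks about): octal 177377 is the 16-bit word 0xFEFF. Assuming od ran on a little-endian machine, the file starts with the bytes FF FE, which is the UTF-16 little-endian byte order mark, and the remaining words spell "PEOPLE:" in UTF-16. The `sample` bytes below are reconstructed from the dump under that assumption, and `strip_utf16_bom` is a hypothetical helper name.

```python
import codecs


def strip_utf16_bom(raw):
    """Return raw bytes without a leading UTF-16 byte order mark, if present."""
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if raw.startswith(bom):
            return raw[len(bom):]
    return raw


# Bytes reconstructed from the od dump, assuming little-endian 16-bit words.
sample = b"\xff\xfeP\x00E\x00O\x00P\x00L\x00E\x00:\x00"

# The generic "utf-16" codec reads the BOM to pick the byte order and drops it.
text = sample.decode("utf-16")
```

If the file really is UTF-16, decoding it with the right encoding is usually a better fix than deleting two bytes, since every character in it occupies at least two bytes.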

  • how to find if groovy args contains a particular string

    - by groovynoob
    This script:

        println args
        println args.size()
        println args.each{arg-> println arg}
        println args.class
        if (args.contains("Hello"))
            println "Found Hello"

    when run, gives the following output and error:

        [hello, somethingelse]
        2
        hello
        somethingelse
        [hello, somethingelse]
        class [Ljava.lang.String;
        Caught: groovy.lang.MissingMethodException: No signature of method: [Ljava.lang.String;.contains() is applicable for argument types: (java.lang.String) values: [Hello]

    Why can I not call contains?

  • Parsing HTML to find specific links (Without Keywords)

    - by Brett Powell
    I posted about this earlier, but I am not sure how to post back to my original question, as I can only comment or answer my own question. Anyway, I need to get 4 links from a website within my C++ application: the latest stable build links for Windows and Linux, and the latest development build links for Windows and Linux (4 links total). I can download the page (http://www.sourcemod.net/snapshots.php) with LibCURL, which is already implemented in the project, but after that I am not sure. I was looking at parsers, but I can't think of how I am going to discern link from link. Obviously, using a parser I could get the first link from each table, but this does not seem efficient and would only provide me with the links to the Windows builds. It looks like the links I need will be in the fourth in both tables, but I am just not very familiar with a good way to go about this, so any help would be appreciated.

  • input URL, output contents of "view page source", i.e. after javascript / etc, library or command-li

    - by Ryan Berckmans
    I need a scalable, automated method of dumping the contents of "view page source" (the DOM) to a file. Programs such as wget or curl will non-interactively retrieve a set of URLs, but do not execute JavaScript or any of that 'fancy stuff'.

    My ideal solution looks like either of the following (fantasy solutions):

        cat urls.txt | google-chrome --quiet --no-gui \
            --output-sources-directory=~/urls-source

    (fantasy command line, no idea if flags like these exist)

    or

        cat urls.txt | python -c "import some-library; \
            ... use some-library to process urls.txt ; output sources to ~/urls-source"

    As a secondary concern, I also need to:

      - dump all included JavaScript source to file (a la Firebug)
      - dump a PDF/image of the page to file (print to file)

  • Split string into sentences based on periods

    - by rookie
    Hi all, I have written this piece of code that splits a string and stores it in a string array:

        String[] sSentence = sResult.split("[a-z]\\.\\s+");

    I added the [a-z] because I wanted to deal with some of the abbreviation problem. But then my result shows up like so:

        Furthermore when Everett tried to instruct them in basic mathematics they proved unresponsiv

    I see that I lose the characters matched by the pattern given to split. It's okay for me to lose the period, but losing the last letter of the word disturbs its meaning. Could someone help me with this, and also with handling abbreviations? Because I split the string on periods, I do not want to lose the abbreviations. Thanks in advance.
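
The lost letter comes from `[a-z]` being part of the match that split discards. One way around it, sketched in Python, is to move the letter into a lookbehind so it is checked but not consumed. The names `SENTENCE_BREAK` and `split_sentences` are illustrative.

```python
import re

# Split on a period plus whitespace, but only when a lowercase letter precedes
# the period. The lookbehind tests that letter without consuming it, so the
# letter stays attached to the sentence.
SENTENCE_BREAK = re.compile(r"(?<=[a-z])\.\s+")


def split_sentences(text):
    """Split text into sentences at 'lowercase letter, period, whitespace'."""
    return SENTENCE_BREAK.split(text)
```

Java regular expressions support the same lookbehind syntax, so the pattern should carry over to `String.split`. It is still only a heuristic: an abbreviation ending in a capital letter, such as "U.S.", is left alone, while one ending in a lowercase letter, such as "Mr.", is still treated as a sentence end.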

  • parse a special xml in python

    - by zhaojing
    I have a special XML file like the one below:

        <alarm-dictionary source="DDD" type="ProxyComponent">
          <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
            <message>Database memory usage low threshold crossed</message>
            <description>dnKinds = database
            type = quality_of_service
            perceived_severity = minor
            probable_cause = thresholdCrossed
            additional_text = Database memory usage low threshold crossed
            </description>
          </alarm>
          ...
        </alarm-dictionary>

    I know that in Python I can get the "code" and "severity" attributes of the alarm tag with:

        for alarm_tag in dom.getElementsByTagName('alarm'):
            if alarm_tag.hasAttribute('code'):
                alarmcode = str(alarm_tag.getAttribute('code'))

    And I can get the text in the message tag like this:

        for messages_tag in dom.getElementsByTagName('message'):
            messages = ""
            for message_tag in messages_tag.childNodes:
                if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
                    messages += message_tag.data

    But I also want to get the values inside the description tag, such as dnKinds (database), type (quality_of_service), perceived_severity (minor), probable_cause (thresholdCrossed), and additional_text (Database memory usage low threshold crossed). That is, I also want to parse the content of the tag itself. Could anyone help me with this? Thanks a lot!
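
A minimal sketch of one approach, assuming each `key = value` pair in the description sits on its own line in the real file (the listing above does not preserve the original line breaks). `SAMPLE`, `parse_description`, and `node_text` are names made up for this illustration.

```python
from xml.dom import minidom

# Sample input, laid out under the one-pair-per-line assumption.
SAMPLE = """<alarm-dictionary source="DDD" type="ProxyComponent">
  <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
type = quality_of_service
perceived_severity = minor
probable_cause = thresholdCrossed
additional_text = Database memory usage low threshold crossed
    </description>
  </alarm>
</alarm-dictionary>"""


def parse_description(text):
    """Turn 'key = value' lines into a dict, skipping lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def node_text(node):
    """Concatenate the text and CDATA children of a DOM node."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE))


dom = minidom.parseString(SAMPLE)
descriptions = [parse_description(node_text(tag))
                for tag in dom.getElementsByTagName("description")]
```

Splitting on the first `=` only means a value that itself contains `=` is kept intact.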

  • .Net Round-trip Types

    - by Fujiy
    I am writing a method that generates a unique string key for a set of parameters, and that returns the same key when called with the same values. I accept only primitive types, string, DateTime, Guid, and Nullable (since I append the type names together, I can distinguish an int from an int?), because I can convert all of these to strings without losing values or precision (for float and double I use ToString("R"); for DateTime, ToString("O")). Is there an easy way to know which types I can convert to strings without conflicts? And how should each conversion be done (as I said above, float, double, and DateTime need specific formats)? Thanks.

  • Client side page call/scrape?

    - by Silvre
    Here is the problem: I have a web application, a frequently changing notification system, that runs on a series of local computers. The application refreshes every couple of seconds to display the new information. The computers only display info and do not have keyboards or ANY input device. The issue is that if the connection to the server is lost (say updates are installed and a server must be rebooted), a page not found error is displayed. We must then either reboot all computers that are running this app, OR add a keyboard and refresh the browser, OR try to access each computer remotely and refresh the browser. None of these are good options, and they result in a lot of frustration. I cannot change the actual application OR the server environment. So what I need is some way to test the call to the application and, if an error is returned or it times out, continue trying every minute or so until the connection is reestablished. My idea is to create a client-side page scraper that makes a JS request to the application (which displays basic HTML) and can run locally on the machine, no server required. If the scrape returns the correct content, it displays it. If not, it continues to request the page until the actual page content is returned. Is this possible? What is the best way to do it?

  • Compute hex color code for an arbitrary string

    - by user222164
    Is there a way to map an arbitrary string to a hex color code? I tried to compute a hex number for the string using the string's hash code. Now I need to convert this hex number to six digits that are in the hex color code range. Any suggestions?

        String[] programs = {"XYZ", "TEST1", "TEST2", "TEST3", "SDFSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS"};
        for (int i = 0; i < programs.length; i++) {
            System.out.println(programs[i] + " -- " + Integer.toHexString(programs[i].hashCode()));
        }
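
One simple option, sketched in Python: keep only the low 24 bits of a hash and zero-pad the result to six hex digits. CRC-32 is used here purely as a convenient deterministic hash, and `string_to_hex_color` is a made-up name.

```python
import zlib


def string_to_hex_color(text):
    """Map a string to a stable '#rrggbb' color code."""
    # Keep the low 24 bits of the hash: 8 bits each for red, green, and blue.
    value = zlib.crc32(text.encode("utf-8")) & 0xFFFFFF
    # Zero-pad to six digits so small values still form a valid color code.
    return "#{:06x}".format(value)
```

The same idea should apply to the Java code in the question by masking `hashCode()` with `0xFFFFFF` before formatting. Nothing here guarantees that different strings get visibly different colors.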

  • Bit convector : Get byte array from string

    - by nCdy
    When I have a string like "0xd8 0xff 0xe0", I do:

        Text.Split(' ').Select(part => byte.Parse(part, System.Globalization.NumberStyles.HexNumber)).ToArray();

    But if I get a string like "0xd8ffe0", I don't know what to do. I'm also open to recommendations on how to write a byte array as one string.
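
The underlying idea, sketched in Python rather than C#: strip the `0x` prefix from each whitespace-separated part, join the remaining digits, and read them two at a time. `parse_hex_bytes` and `format_hex_bytes` are illustrative names.

```python
def parse_hex_bytes(text):
    """Convert '0xd8 0xff 0xe0' or '0xd8ffe0' into a bytes object."""
    digits = "".join(
        part[2:] if part.lower().startswith("0x") else part
        for part in text.split()
    )
    # bytes.fromhex consumes two hex digits per byte and raises ValueError
    # if the number of digits is odd.
    return bytes.fromhex(digits)


def format_hex_bytes(data):
    """Render bytes as a single '0x...' string."""
    return "0x" + data.hex()
```

Both input styles from the question go through the same function, and `format_hex_bytes` covers the second request of writing the array back out as one string.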

  • Writing String into a File

    - by Halo
    I'm implementing a WebScript using Alfresco's JavaScript. I'm trying to insert a string into a file, but I can't do it. When I write another file's content, like:

        file.properties.content.write(content);

    it works, and that file's contents are copied into my file. But I can't insert a string directly, like:

        file.properties.content.write("Stuff like that");

    This throws an exception. How can I write a string into this file?

  • In Jeditable, how do I make it so that when I click the div to edit, the text box content has initial value that is processed?

    - by TIMEX
    When the user clicks on the div, jeditable will make a text box. However, I want the initial text to be done with function stripTags(), instead of what's on the page. The reason is that I'm using some URL techniques to turn plain text links into URLs. When the user clicks on the div, jeditable is turning them into <a href=>..</a> Is there a "beforeSubmit" option in jeditable? http://www.appelsiini.net/projects/jeditable

  • XPathDocument behavior with DOCTYPE declaration

    - by gliderkite
    I use XPathDocument to parse an XML file. If the file contains a DOCTYPE declaration, then when I initialize a new instance of the XPathDocument class by passing the path of the file to its constructor, my application tries to connect to the internet (probably to verify the correctness of the XML data) and remains blocked for a long period of time. This does not occur if I delete the DOCTYPE declaration from the XML file. The XmlDocument.Load method has the same behavior. How can I fix this problem? Thanks.

  • How to use C# to parse a glossary into database?

    - by Yaaqov
    This should be a simple one, but I'm a beginner with C#. Given a glossary list in the following format:

        aptitude    ability, skill, gift, talent
        aqueous     watery
        arguably    maybe, perhaps, possibly, could be

    How can I parse this and insert it into a database table in this format?

        TABLE: Term_Glossary
        ==================================================
        Term_Name | Term_Definition                      |
        ==================================================
        aptitude  | ability, skill, gift, talent         |
        --------------------------------------------------
        aqueous   | watery                               |
        --------------------------------------------------
        arguably  | maybe, perhaps, possibly, could be   |
        ==================================================

    Any help would be appreciated - thanks.
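
A rough sketch of the parsing step, written in Python with an in-memory SQLite database rather than C#. It assumes one glossary entry per line, with the term being a single word separated from its definition by whitespace; `GLOSSARY` and `parse_glossary` are names invented for the example.

```python
import sqlite3

# Sample input, assuming one "term definition" entry per line.
GLOSSARY = """aptitude ability, skill, gift, talent
aqueous watery
arguably maybe, perhaps, possibly, could be"""


def parse_glossary(text):
    """Split each line into (term, definition) at the first run of whitespace."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # skip blank lines and terms with no definition
            entries.append((parts[0], parts[1]))
    return entries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Term_Glossary (Term_Name TEXT, Term_Definition TEXT)")
# Parameterized inserts keep commas and quotes in definitions from breaking the SQL.
conn.executemany("INSERT INTO Term_Glossary VALUES (?, ?)", parse_glossary(GLOSSARY))
rows = conn.execute(
    "SELECT Term_Name, Term_Definition FROM Term_Glossary ORDER BY Term_Name"
).fetchall()
```

Multi-word terms would need a stricter delimiter, such as a tab, between term and definition.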

  • How to parse the file name and rename in Matlab

    - by Paul
    I am reading a .xls file, processing it, and then rewriting it at the end of my program. I was wondering if someone could help me parse the dates: my input file name looks like file_1_2010_03_03.csv, and I want my output file to be newfile_2010_03_03.xls. Is there a way to incorporate this into my MATLAB program so that I do not have to manually write the command xlswrite('newfile_2010_03_03.xls', M); every time and change the dates as I input files with different dates, like file_2_2010_03_04.csv? Thanks.
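
The renaming logic amounts to capturing the date part of the input name with a regular expression and reusing it. Here is a sketch in Python rather than MATLAB; `INPUT_NAME` and `output_name` are hypothetical names, and the pattern assumes input names always follow the `file_<n>_<yyyy>_<mm>_<dd>.csv` shape shown in the question.

```python
import re

# Matches names like 'file_1_2010_03_03.csv' and captures the date part.
INPUT_NAME = re.compile(r"^file_\d+_(\d{4}_\d{2}_\d{2})\.csv$")


def output_name(input_name):
    """Build 'newfile_<date>.xls' from 'file_<n>_<date>.csv'."""
    match = INPUT_NAME.match(input_name)
    if match is None:
        raise ValueError("unexpected file name: " + input_name)
    return "newfile_" + match.group(1) + ".xls"
```

MATLAB's own regular expression functions should be able to express the same capture, with the resulting string passed to xlswrite in place of the hard-coded name.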

  • Jsoup to get data on <b> block

    - by Poh Sun
    I'm new to Jsoup on Java and would like to ask a few questions. The HTML code of the page I would like to read is this:

        <td width="70%" class="row1">
            <b>4</b>
            <br />( 0 posts per day / 0.00% of total forum posts )
        </td>

    I want to get the value 4, but the output I get is:

        4 ( 0 posts per day / 0.00% of total forum posts )

    Here is my Java code:

        Iterator<Element> element = totalPost.select("td[width=70%][class=row1]").iterator();
        System.out.println(element.next().text());

    Sorry if my question is not clear enough.

  • Get the subdomain from a URL

    - by jb
    Getting the subdomain from a URL sounds easy at first.

        http://www.domain.example

    Scan for the first period, then return whatever came after the "http://" ...

    Then you remember

        http://super.duper.domain.example

    Oh. So then you think, okay, find the last period, go back a word and get everything before!

    Then you remember

        http://super.duper.domain.co.uk

    And you're back to square one. Anyone have any great ideas besides storing a list of all TLDs?

    John
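
No purely syntactic rule separates `domain.co.uk` from `duper.domain.example`, so some table of suffixes seems hard to avoid. The sketch below shows the longest-suffix matching that such a table enables. `KNOWN_SUFFIXES` is a tiny made-up sample, not a real suffix list, and `split_host` is an illustrative name.

```python
from urllib.parse import urlsplit

# Tiny illustrative sample; a real suffix list is far longer.
KNOWN_SUFFIXES = {"example", "com", "uk", "co.uk"}


def split_host(url, suffixes=KNOWN_SUFFIXES):
    """Return (subdomain, registered_domain) using the longest matching suffix."""
    labels = urlsplit(url).hostname.split(".")
    # Candidates run from longest to shortest, so the first hit is the longest.
    for start in range(len(labels)):
        if ".".join(labels[start:]) in suffixes:
            if start == 0:
                return "", ".".join(labels)  # the host is itself a bare suffix
            return ".".join(labels[:start - 1]), ".".join(labels[start - 1:])
    raise ValueError("no known suffix in " + url)
```

With this sample table, the three URLs from the question split into `www`, `super.duper`, and `super.duper` respectively, with `domain.co.uk` recognized as the registered domain in the last case.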

  • Intelligent search and generation of Java code, preferrably using Python?

    - by Ipsquiggle
    Basically, I do lots of one-off code generation, large-scale refactorings, etc. in Java. My tool language of choice is Python, but I'll take whatever solutions you can offer. Here is a simplified illustration of what I would like, in pseudocode.

    Generating an implementation for an interface:

        search within my project:
            for each Interface as iName:
                write class(name=iName+"Impl", implements=iName)
                search within the body of iName:
                    for each Method as mName:
                        write method(name=mName, body="// TODO implement this...")

    Basically, the tool I'm searching for would allow me to:

      - parse files according to their Java structure ("search for interfaces")
      - search for words contextualized by language elements and types ("variables of type SomeClass", "doStuff() method calls on SomeClass instances")
      - run searches with structural context ("within the body of the current result")
      - easily replace or generate code (with helpers to generate, as above, or functions for replacing: "rename the interface to Foo", "insert the line Blah.Blah()", etc.)

    The point is, I don't want to spend a lot of time writing these things, as they are usually throwaway. But sometimes I need something just a little smarter than what grep offers. It wouldn't be too hard to write up a simplistic version of this, but if I'm going to use something like this at all, I'd expect it to be robust. Any suggestions of a tool/library that will help me accomplish this?

  • what is the most elegant way in ruby to remove a parameter from url?

    - by dimus
    I would like to remove a parameter from a URL by its name, without knowing whether it is the first, middle, or last parameter, and then reassemble the URL. I guess it is not that hard to write something on my own using CGI or URI, but I imagine such functionality exists already. Any suggestions?

        in:  http://example.com/path?param1=one&param2=2&param3=something3
        out: http://example.com/path?param2=2&param3=something3
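
The general shape of a solution, sketched in Python rather than Ruby: split the URL, parse the query into key/value pairs, drop the unwanted key, and reassemble. `remove_query_param` is an illustrative name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    # parse_qsl preserves parameter order, so position in the query is irrelevant.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the query string is decoded and re-encoded, percent-escapes in the remaining parameters may come out normalized rather than byte-for-byte identical.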
