Search Results

Search found 15004 results on 601 pages for 'date parsing'.

  • scraping text from multiple html files into a single csv file

    - by Lulu
    I have just over 1500 HTML pages (1.html to 1500.html). I have written code using Beautiful Soup that extracts most of the data I need, but it misses some of the data within the table; for example, the Address1 and Address2 data at the start of the table never come out. My input is, e.g., file 1500.html. My code:

        #!/usr/bin/env python
        import glob
        import codecs
        from BeautifulSoup import BeautifulSoup

        with codecs.open('dump2.csv', "w", encoding="utf-8") as csvfile:
            for file in glob.glob('*html*'):
                print 'Processing', file
                soup = BeautifulSoup(open(file).read())
                rows = soup.findAll('tr')
                for tr in rows:
                    cols = tr.findAll('td')
                    #print >> csvfile, "#".join(col.string for col in cols)
                    #print >> csvfile, "#".join(td.find(text=True))
                    for col in cols:
                        print >> csvfile, col.string
                    print >> csvfile, "==="
                print >> csvfile, "***"

    Output: one CSV file, with 1500 lines of text and columns of data. I modified the code to put in the *** and === separators, and I then use Perl to produce a clean CSV file; unfortunately I'm not sure how to change my code to get all the data I'm looking for.
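
    A hedged guess at the cause, based on BeautifulSoup 3's documented behavior rather than the actual 1500.html: col.string returns None whenever a cell contains anything besides a single text node (a <br/>, a nested <span>, ...), which is exactly how multi-line address cells are often marked up. A minimal sketch of the inner loop with that case handled, in the same Python 2 / BeautifulSoup 3 setup as above:

        for col in cols:
            # findAll(text=True) gathers every text fragment, even beneath
            # nested tags, where col.string would be None
            print >> csvfile, u''.join(col.findAll(text=True)).strip()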

  • Scrape HTML tables from a given URL into CSV

    - by dreeves
    I seek a tool that can be run on the command line like so:

        tablescrape 'http://someURL.foo.com' [n]

    If n is not specified and there's more than one HTML table on the page, it should summarize them (header row, total number of rows) in a numbered list. If n is specified, or if there's only one table, it should parse the table and spit it to stdout as CSV or TSV. Potential additional features:

    - To be really fancy you could parse a table within a table, but for my purposes -- fetching data from Wikipedia pages and the like -- that's overkill. The Perl module HTML::TableExtract can do this and may be a good place to start for writing the tool I have in mind.
    - An option to asciify any Unicode.
    - An option to apply an arbitrary regex substitution for fixing weirdnesses in the parsed table.

    Related questions:

    - http://stackoverflow.com/questions/259091/how-can-i-scrape-an-html-table-to-csv
    - http://stackoverflow.com/questions/1403087/how-can-i-convert-an-html-table-to-csv
    - http://stackoverflow.com/questions/2861/options-for-html-scraping
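
    As a sketch of how little code the core needs, assuming Python with pandas and lxml installed (pandas.read_html does the heavy lifting; argument parsing and the TSV option are left out):

        import sys
        import pandas as pd

        def tablescrape(url, n=None):
            tables = pd.read_html(url)  # parses every <table> on the page
            if n is None and len(tables) > 1:
                # summarize: index, header row, total number of rows
                for i, t in enumerate(tables):
                    print(i, list(t.columns), len(t))
            else:
                tables[n or 0].to_csv(sys.stdout, index=False)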

  • Library issue: How do I set up QtWebKit to parse HTML?

    - by user560106
    Nick Presta showed that you can parse HTML with Qt here: Library Recommendation: C++ HTML Parser. However, when I attempt to build this, I get an access violation on the "QWebFrame* frame = page.mainFrame();" line. What am I doing wrong?

        #include <QtWebKit\QWebElement>
        #include <QtWebKit\QWebView>
        #include <QtWebKit\QWebFrame>
        #include <QtWebKit\QWebPage>
        #include <iostream>

        int main() {
            QWebPage page;
            QWebFrame* frame = page.mainFrame();
            frame->setHtml( "<html><head></head><body></body></html>" );
            QWebElement document = frame->documentElement();
            return 0;
        }
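
    One hedged guess, not a verified diagnosis: Qt generally requires a QApplication to exist before GUI-adjacent classes like QWebPage are constructed. A sketch of the same call sequence with that ordering, written in Python/PyQt4 (assumed installed) only because the ordering is what matters here:

        import sys
        from PyQt4.QtGui import QApplication
        from PyQt4.QtWebKit import QWebPage

        app = QApplication(sys.argv)  # must exist before any QWebPage is created
        page = QWebPage()
        frame = page.mainFrame()
        frame.setHtml("<html><head></head><body></body></html>")
        doc = frame.documentElement()
        print(doc.toOuterXml())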

  • I18N: Does Time always come after Date?

    - by Ian Boyd
    Does the time always come after the date, with a space in between, in every culture on earth? I see that the Microsoft FCL assumes that it does:

        public string get_FullDateTimePattern() {
            if (this.fullDateTimePattern == null) {
                this.fullDateTimePattern = this.LongDatePattern + " " + this.LongTimePattern;
            }
            return this.fullDateTimePattern;
        }

    Is this an assumption I can make in every development language for every culture?
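
    One way to probe this empirically, sketched in Python: nl_langinfo exposes each locale's combined date-time pattern (a POSIX-only API, and it only covers locales installed on the host, so this yields evidence rather than proof):

        import locale

        for loc in ("en_US.UTF-8", "ja_JP.UTF-8", "de_DE.UTF-8"):
            try:
                locale.setlocale(locale.LC_TIME, loc)
            except locale.Error:
                continue  # locale not installed on this machine
            # D_T_FMT is the locale's combined date-and-time pattern
            print(loc, repr(locale.nl_langinfo(locale.D_T_FMT)))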

  • Extracting URIs from RDF web page in Java using Jena Library

    - by Prannoy Mittal
    I have written the following code for extracting URIs from a web page with content type application/rdf+xml, for a Linked Data application:

        public static void test(String url) {
            try {
                Model read = ModelFactory.createDefaultModel().read(url);
                StmtIterator si = read.listStatements();
                while (si.hasNext()) {
                    Statement s = si.nextStatement();
                    Resource r = s.getSubject();
                    Property p = s.getPredicate();
                    RDFNode o = s.getObject();
                    System.out.println(r.getURI());
                    System.out.println(p.getURI());
                    System.out.println(o.asResource().getURI());
                }
            } catch (JenaException | NoSuchElementException c) {
            }
        }

    But the above code is not extracting all URIs; it provides only a few of them. Please guide me on where I went wrong. For example, for this XML file:

        <?xml version="1.0"?>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:ex="http://example.org/stuff/1.0/">
          <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
                           dc:title="RDF/XML Syntax Specification (Revised)">
            <ex:editor>
              <rdf:Description ex:fullName="Dave Beckett">
                <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
              </rdf:Description>
            </ex:editor>
          </rdf:Description>
        </rdf:RDF>

    the output is:

        Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
        Predicate URI is http://example.org/stuff/1.0/editor
        Object URI is null
        Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
        Predicate URI is http://purl.org/dc/elements/1.1/title
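
    One plausible explanation for the missing URIs, offered as a guess: asResource() throws for literal objects (a JenaException), and the empty catch block swallows it, ending the loop early. For what it's worth, here is the same traversal sketched in Python with rdflib (an assumption -- the question uses Jena) with that case guarded:

        from rdflib import Graph, URIRef

        g = Graph()
        g.parse("input.rdf", format="xml")  # the RDF/XML shown above, saved locally
        for s, p, o in g:
            # objects may be literals; only subjects and predicates are always URIs
            print(s, p, o if isinstance(o, URIRef) else "(literal)")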

  • MySQL UTC Date format

    - by Btibert3
    Hi everyone, I am pulling data from Twitter's API, and the returned date is UTC in the following form:

        Sat Jan 24 22:14:29 +0000 2009

    Can MySQL handle this format directly, or do I need to transform it? I am pulling the data using Python. Thanks, Brock
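
    MySQL's DATETIME will not accept that string as-is, so the usual route is to reformat it before the INSERT. A minimal sketch for Python 3 (Python 2's strptime does not support %z, so there you would strip the offset by hand); note that DATETIME stores no offset, which is harmless here since +0000 is already UTC:

        from datetime import datetime

        raw = "Sat Jan 24 22:14:29 +0000 2009"
        dt = datetime.strptime(raw, "%a %b %d %H:%M:%S %z %Y")
        print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2009-01-24 22:14:29, MySQL-ready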

  • RSpec and stubbing parameters for a named scope

    - by Andy Waite
    I'm trying to write a spec for a named scope which is date dependent. The spec:

        it "should return 6 months of documents" do
          Date.stub!(:today).and_return(Date.new(2005, 03, 03))
          doc_1 = Factory.create(:document, :date => '2005-01-01')
          Document.past_six_months.should == [doc_1]
        end

    The named scope in the Document model:

        named_scope :past_six_months,
          :conditions => ['date > ? AND date < ?', Date.today - 6.months, Date.today]

    The spec fails with an empty array, and the query in test.log shows why:

        SELECT * FROM "documents" WHERE (date > '2009-11-11' AND date < '2010-05-11')

    i.e. it appears to be ignoring my stubbed Date method. However, if I use a class method instead of a named scope, it passes:

        def self.past_six_months
          find(:all, :conditions => ['date > ? AND date < ?', Date.today - 6.months, Date.today])
        end

    I would rather use the named scope approach, but I don't understand why it isn't working.

  • Haskell Parsec Numeration

    - by Martin
    I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:

        - First type A\n
        -- First type B\n
        - Second type A\n
        -- First type B\n
        -- Second type B\n

    And my output should be:

        <h1>1 First type A\n</h1>
        <h2>1.1 First type B\n</h2>
        <h1>2 Second type A\n</h1>
        <h2>2.1 First type B\n</h2>
        <h2>2.2 Second type B\n</h2>

    I have come to this part, but I cannot get any further:

        title1 = do { count 1 (char '-')
                    ; s <- many1 anyChar
                    ; newline
                    ; return (h1 << s)
                    }

        title2 = do { count 2 (char '-')
                    ; s <- many1 anyChar
                    ; newline
                    ; return (h2 << s)
                    }

        text = do { many (choice [try title1, try title2]) }

        main :: IO ()
        main = do
          t <- getContents
          case parse text "" t of
            Left err -> do putStr "Error: "
                           print err
            Right x  -> putStrLn $ prettyHtml x

    This is OK, but it does not include the numbering. Any ideas? Thanks!
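
    The missing piece is state: one counter per heading level, with the deeper counter reset whenever a top-level heading appears; in Parsec that state is typically threaded through user state or applied in a post-pass over the parsed list. A sketch of just the numbering logic, in Python for brevity rather than Haskell:

        lines = ["- First type A", "-- First type B", "- Second type A",
                 "-- First type B", "-- Second type B"]
        h1 = h2 = 0
        for line in lines:
            if line.startswith("--"):
                h2 += 1
                print("<h2>%d.%d %s</h2>" % (h1, h2, line[2:].strip()))
            else:
                h1, h2 = h1 + 1, 0  # new top-level heading resets the sub-counter
                print("<h1>%d %s</h1>" % (h1, line[1:].strip()))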

  • jquery selector logical AND?

    - by taber
    In jQuery I'm trying to select only mount nodes where a and b's text values are 64 and "test" respectively. I'd also like to fall back to 32 if no 64-and-"test" pair exists. What I'm seeing with the code below, though, is that the 32 mount is being returned instead of the 64. The XML:

        <thingses>
          <thing>
            <a>32</a>  <!-- note, a here is 32 and not 64 -->
            <other>...</other>
            <mount>sample 1</mount>
            <b>test</b>
          </thing>
          <thing>
            <a>64</a>
            <other>...</other>
            <mount>sample 2</mount>
            <b>test</b>
          </thing>
          <thing>
            <a>64</a>
            <other>...</other>
            <mount>sample 3</mount>
            <b>unrelated</b>
          </thing>
          <thing>
            <a>128</a>
            <other>...</other>
            <mount>sample 4</mount>
            <b>unrelated</b>
          </thing>
        </thingses>

    And unfortunately I don't have control over the XML as it comes from somewhere else. What I'm doing now is:

        var ret_val = '';
        $data.find('thingses thing').each(function(i, node) {
            var $node = $(node),
                found_node = $node.find('b:first:is(test), a:first:is(64)')
                                  .end().find('mount:first').text();
            if (found_node) { ret_val = found_node; return; }
            found_node = $node.find('b:first:is(test), a:first:is(32)')
                              .end().find('mount:first').text();
            if (found_node) { ret_val = found_node; return; }
            ret_val = 'not found';
        });
        // expected result is "sample 2", but if sample 2's parent "thing" was
        // missing, the result would be "sample 1"
        alert(ret_val);

    For my ":is" selector I'm using:

        if (jQuery) {
            jQuery.expr[":"].is = function(obj, index, meta, stack) {
                return (obj.textContent || obj.innerText || $(obj).text() || "")
                           .toLowerCase() == meta[3].toLowerCase();
            };
        }

    There has to be a better way than how I'm doing it. I wish I could replace the "," with "AND" or something. :) Any help would be much appreciated. Thanks!

  • JavaScript parser in JavaScript

    - by emk
    I need to add some lightweight syntactic sugar to JavaScript source code, and process it using a JavaScript-based build system. Are there any open source JavaScript parsers written in JavaScript? And are they reasonably fast when run on top of V8 or a similar high-performance JavaScript implementation? Thank you for any pointers you can provide!

  • What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?

    - by Jason Baker
    I've been reading a bit about how interpreters/compilers work, and one area where I'm getting confused is the difference between an AST and a CST. My understanding is that the parser makes a CST, hands it to the semantic analyzer which turns it into an AST. However, my understanding is that the semantic analyzer simply ensures that rules are followed. I don't really understand why it would actually make any changes to make it abstract rather than concrete. Is there something that I'm missing about the semantic analyzer, or is the difference between an AST and CST somewhat artificial?
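
    A concrete illustration of the "abstract" half, using Python's ast module (one example among many): the concrete syntax tree still records every token, but after abstraction the parentheses below survive only as tree shape, not as nodes.

        import ast

        # grouping parens are concrete syntax only: once precedence is encoded
        # in the tree's shape, the AST has no parenthesis nodes left to show
        print(ast.dump(ast.parse("(1 + 2) * 3", mode="eval")))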

  • How do I get 3 lines of text from a paragraph

    - by Keltex
    I'm trying to create a "snippet" from a paragraph. I have a long paragraph of text with a word highlighted in the middle. I want to get the line containing the word, the line before that line, and the line after that line. I have the following pieces of information:

    - The text (in a string)
    - The lines are delimited by a NEWLINE character \n
    - I have the index into the string of the text I want to highlight

    A couple of other criteria:

    - If my word falls on the first line of the paragraph, it should show the first 3 lines
    - If my word falls on the last line of the paragraph, it should show the last 3 lines
    - It should show the entire paragraph in the degenerate cases (the paragraph only has 1 or 2 lines)

    Here's an example:

        This is the 1st line of CAT text in the paragraph
        This is the 2nd line of BIRD text in the paragraph
        This is the 3rd line of MOUSE text in the paragraph
        This is the 4th line of DOG text in the paragraph
        This is the 5th line of RABBIT text in the paragraph

    For example, if my index points to BIRD, it should show lines 1, 2, and 3 as one complete string:

        This is the 1st line of CAT text in the paragraph
        This is the 2nd line of BIRD text in the paragraph
        This is the 3rd line of MOUSE text in the paragraph

    If my index points to DOG, it should show lines 3, 4, and 5 as one complete string:

        This is the 3rd line of MOUSE text in the paragraph
        This is the 4th line of DOG text in the paragraph
        This is the 5th line of RABBIT text in the paragraph

    etc. Anybody want to help tackle this?
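
    A minimal sketch of those rules in Python (the question names no language, and snippet is a hypothetical helper name):

        def snippet(text, index):
            lines = text.split("\n")
            # locate the line containing the character at `index`
            pos = 0
            for i, line in enumerate(lines):
                if index < pos + len(line) + 1:  # +1 for the '\n'
                    break
                pos += len(line) + 1
            if len(lines) <= 3:
                return text  # degenerate cases: return the whole paragraph
            # clamp the 3-line window for first-line and last-line hits
            start = min(max(i - 1, 0), len(lines) - 3)
            return "\n".join(lines[start:start + 3])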

  • Extracting a URL in Python

    - by Kyle Hayes
    In regards to http://stackoverflow.com/questions/720113/find-hyperlinks-in-text-using-python-twitter-related: how can I extract just the URL so I can put it into a list/array?

    Edit: Let me clarify. I don't want to parse the URL into pieces. I want to extract the URL from the text of the string in order to put it into an array. Thanks!
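
    A minimal sketch, assuming plain http(s) links and a deliberately loose pattern (real-world URL boundaries are messier than \S+ admits):

        import re

        text = "new blog post http://example.com/post/1 check it out"
        urls = re.findall(r"https?://\S+", text)
        print(urls)  # ['http://example.com/post/1']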

  • Boolean logic parser for SQL

    - by d03boy
    This is going to sound crazy, but does anyone have techniques that would allow me to parse boolean logic strings in SQL Server 2005 without extraordinary/ridiculous effort? Here is an example:

        (SOMEVAR=4 OR SOMEVAR=5) AND (NOT OTHERVAR=Y)

    I feel like recursion would help a lot if that were possible in SQL, but I'm not really sure how to go about that sort of thing. If not, maybe there's a way to attach an external system to do the recursion for me? Don't worry, I'm not getting my hopes up.
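
    For a sense of how small the recursive part is outside of SQL, here is a hedged sketch of a recursive-descent evaluator in Python (inside SQL Server 2005 the equivalent would likely be a CLR function, since T-SQL recursion is awkward for this). SOMEVAR and OTHERVAR are the question's example names; the tokenizer is deliberately naive:

        import re

        def evaluate(expr, env):
            tokens = re.findall(r"\(|\)|AND|OR|NOT|\w+=\w+", expr)
            pos = [0]
            def peek():
                return tokens[pos[0]] if pos[0] < len(tokens) else None
            def eat():
                pos[0] += 1
                return tokens[pos[0] - 1]
            def atom():
                if peek() == "(":
                    eat(); v = disj(); eat()  # consume '(' expr ')'
                    return v
                if peek() == "NOT":
                    eat()
                    return not atom()
                var, val = eat().split("=")
                return str(env.get(var)) == val
            def conj():
                v = atom()
                while peek() == "AND":
                    eat(); v = atom() and v  # AND binds tighter than OR
                return v
            def disj():
                v = conj()
                while peek() == "OR":
                    eat(); v = conj() or v
                return v
            return disj()

        print(evaluate("(SOMEVAR=4 OR SOMEVAR=5) AND (NOT OTHERVAR=Y)",
                       {"SOMEVAR": 4, "OTHERVAR": "N"}))  # True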

  • How to parse AMF data in Ruby?

    - by Matchu
    So I see that there are a few Rails plugins for serving AMF. However, is there a library that I can use in a Ruby environment to act as an AMF client: to read AMF data, and deserialize it into a Ruby object? If not, how could I best go about using tools built in other languages? I suppose I could write something in Python or Java or whatever, and call it from Ruby directly via backticks... but I'd first like to ensure that there isn't really any better option. Thanks!

  • Simplest way to add HTML as a String to a new Nokogiri HTML document body?

    - by viatropos
    I have a bunch of content from the body of one HTML file. How do I put that into the body of a new blank-slate HTML document using Nokogiri? Something like this, but with Nokogiri:

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <title>Default Title</title>
          </head>
          <body class='default-class'>
            <%= yield :body %>
          </body>
        </html>

  • Extract / Parse Tags from Mixed Content String

    - by Andreas
    Hello, I want to parse tags from a mixed-content string. The string goes like this:

        "<PERSON>yasir arafat</PERSON> , the president of the <LOCATION>palestinian authority</LOCATION> , on the defensive , mr . sharon believes , a government official"

    I only want to use JAXP. Does anybody have an idea how to do this? Maybe there is an easy way with regular expressions, but I need the element names as well. Best regards, Andreas
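
    The question asks for JAXP, but if regular expressions are acceptable the core is one pattern; a sketch in Python (the same regex carries over to java.util.regex), where the backreference keeps open and close tags paired and the element name is captured alongside the text:

        import re

        s = ("<PERSON>yasir arafat</PERSON> , the president of the "
             "<LOCATION>palestinian authority</LOCATION>")
        # each match yields (element name, enclosed text); \1 ties the closing
        # tag to its opening tag
        print(re.findall(r"<(\w+)>(.*?)</\1>", s))
        # [('PERSON', 'yasir arafat'), ('LOCATION', 'palestinian authority')]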

  • Element Based XML Parsing

    - by demos
    I have an XML document which reads like this:

        <xml>
          <web:Web>
            <web:Total>4000</web:Total>
            <web:Offset>0</web:Offset>
          </web:Web>
        </xml>

    My question is: how do I access these values using a library like BeautifulSoup in Python? xmlDom.web["Web"].Total does not work.
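
    A hedged sketch with Python's standard library instead of BeautifulSoup. One caveat: the web: prefix is never declared in the sample, so a namespace-aware parser rejects the document as-is; the sketch drops the prefix, and with a properly bound namespace you would search for the {uri}Total form instead:

        import xml.etree.ElementTree as ET

        doc = ET.fromstring("""
        <xml>
          <Web>
            <Total>4000</Total>
            <Offset>0</Offset>
          </Web>
        </xml>""")
        print(doc.find("Web/Total").text)   # 4000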

  • Java .split() Method To Split XML Parameters

    - by Buzz Lightyear
    I have this line from an XML document:

        <?xml version="1.0" encoding="UTF-8"?>
        <svg xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg"
             contentScriptType="text/ecmascript" width="1024" zoomAndPan="magnify"
             contentStyleType="text/css" viewBox="0 0 1024 768" height="768"
             preserveAspectRatio="xMidYMid meet" version="1.0">

    I want to be able to split it up using the split method, saving each parameter into a String array. For example, I'd like contentScriptType="text/ecmascript", width="1024", zoomAndPan="magnify", contentStyleType="text/css", viewBox="0 0 1024 768", height="768", and so on each to be saved into a string array. Is there any way to do this using the split method, or can anybody suggest an easier, more efficient way?
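
    One caution before reaching for split: viewBox="0 0 1024 768" contains spaces inside its quotes, so splitting on whitespace tears it apart; matching attributes is safer than splitting between them. A sketch in Python -- the same pattern works with java.util.regex's Pattern and Matcher:

        import re

        tag = ('contentScriptType="text/ecmascript" width="1024" '
               'zoomAndPan="magnify" viewBox="0 0 1024 768" height="768"')
        # match name="value" pairs; quoted values may contain spaces
        print(re.findall(r'[\w:]+="[^"]*"', tag))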

  • abap date formatting

    - by ben
    Hi there, I'm new to ABAP and am just writing my first application. I have a date of type SYDATUM and I'm wondering how to format it in a format like m/d/y. I've found some snippets on the web, but they were not really helpful. Thanks for your help.

  • Custom whiteSpace using Haskell Parsec

    - by fryguybob
    I would like to use Parsec's makeTokenParser to build my parser, but I want to use my own definition of whiteSpace. Doing the following replaces whiteSpace with my definition, but all the lexeme parsers still use the old definition (e.g. P.identifier lexer will use the old whiteSpace):

        ...
        lexer :: P.TokenParser ()
        lexer = l { P.whiteSpace = myWhiteSpace }
          where l = P.makeTokenParser myLanguageDef
        ...

    Looking at the code for makeTokenParser, I think I understand why it works this way. I want to know if there are any workarounds that avoid completely duplicating the code for makeTokenParser.

  • XML parse node value as string

    - by bharathi
    My XML file looks like this; I want to get the text content of each value node, in the order they appear in the file:

        <property regex=".*" xpath=".*">
          <value>
            127.0.0.1
          </value>
        </property>
        <property regex=".*" xpath=".*">
          <value>
            val
            <![CDATA[
              <Valve className="org.tomcat.AccessLogValve" exclude="PASSWORD,pwd,pWord,ticket" enabled="true" serviceName="zohocrm" logDir="../logs" fileName="access" format="URI,&quot;PARAM&quot;,&quot;REFERRER&quot;,TIME_TAKEN,BYTES_OUT,STATUS,TIMESTAMP,METHOD,SESSION_ID,REMOTE_IP,&quot;INTERNAL_IP&quot;,&quot;USER_AGENT&quot;,PROTOCOL,SERVER_NAME,SERVER_PORT,BYTES_IN,ZUID,TICKET_DIGEST,THREAD_ID,REQ_ID"/>
            ]]>
            test
          </value>
        </property>

    Here is my Java code:

        Document doc = parseDocument("properties.xml");
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0, len = properties.getLength(); i < len; i++) {
            Element property = (Element) properties.item(i);
            // How can I proceed further?
        }

    Expected output:

        Node 1 : 127.0.0.1
        Node 2 : val <Valve className="org.tomcat.AccessLogValve" exclude="PASSWORD,pwd,pWord,ticket" enabled="true" serviceName="zohocrm" logDir="../logs" fileName="access" format="URI,&quot;PARAM&quot;,&quot;REFERRER&quot;,TIME_TAKEN,BYTES_OUT,STATUS,TIMESTAMP,METHOD,SESSION_ID,REMOTE_IP,&quot;INTERNAL_IP&quot;,&quot;USER_AGENT&quot;,PROTOCOL,SERVER_NAME,SERVER_PORT,BYTES_IN,ZUID,TICKET_DIGEST,THREAD_ID,REQ_ID"/> test

    Please suggest your views.
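
    A hedged sketch of the remaining step, in Python's minidom rather than Java DOM since the traversal has the same shape (in Java, getElementsByTagName("value") plus getTextContent() on each node is the analogous route). It assumes the property elements sit beneath a single document root:

        import xml.dom.minidom as minidom

        doc = minidom.parse("properties.xml")
        for i, value in enumerate(doc.getElementsByTagName("value"), 1):
            # plain text nodes and CDATA sections both contribute to the text
            text = "".join(n.data for n in value.childNodes
                           if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE))
            print("Node %d : %s" % (i, " ".join(text.split())))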

  • Group multiple media queries formed as output of LESS css

    - by Goje87
    I was planning to use LESS CSS in my project (PHP), in particular its nested @media query feature. I find that it fails to group the multiple media queries in the output CSS it generates. For example:

        // LESS
        .header {
          @media all and (min-width: 240px) and (max-width: 319px) { font-size: 12px; }
          @media all and (min-width: 320px) and (max-width: 479px) { font-size: 16px; font-weight: bold; }
        }
        .body {
          @media all and (min-width: 240px) and (max-width: 319px) { font-size: 10px; }
          @media all and (min-width: 320px) and (max-width: 479px) { font-size: 12px; }
        }

        // output CSS
        @media all and (min-width: 240px) and (max-width: 319px) {
          .header { font-size: 12px; }
        }
        @media all and (min-width: 320px) and (max-width: 479px) {
          .header { font-size: 16px; font-weight: bold; }
        }
        @media all and (min-width: 240px) and (max-width: 319px) {
          .body { font-size: 10px; }
        }
        @media all and (min-width: 320px) and (max-width: 479px) {
          .body { font-size: 12px; }
        }

    My expected output is (@media queries grouped):

        @media all and (min-width: 240px) and (max-width: 319px) {
          .header { font-size: 12px; }
          .body { font-size: 10px; }
        }
        @media all and (min-width: 320px) and (max-width: 479px) {
          .header { font-size: 16px; font-weight: bold; }
          .body { font-size: 12px; }
        }

    I would like to know if this can be done in LESS itself, or if there is any simple CSS parser I can use to post-process the output CSS and group the @media queries.
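
    If it has to be a post-processing step, here is a hedged sketch of the grouping pass in Python; group_media is a hypothetical helper, and the regex assumes exactly the flat, non-nested @media output shown above (a real CSS parser would be sturdier):

        import re
        from collections import OrderedDict

        def group_media(css):
            groups = OrderedDict()
            # one entry per distinct @media prelude, in first-seen order
            for prelude, body in re.findall(
                    r"(@media[^{]+)\{(.*?)\}\s*(?=@media|\Z)", css, re.S):
                groups.setdefault(prelude.strip(), []).append(body.strip())
            return "\n".join("%s {\n  %s\n}" % (k, "\n  ".join(v))
                             for k, v in groups.items())

        # group_media(open("out.css").read()) -> the grouped CSS shown above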
