Search Results

Search found 3176 results on 128 pages for 'parsing'.

Page 44/128 | < Previous Page | 40 41 42 43 44 45 46 47 48 49 50 51  | Next Page >

  • Any Java library for address extraction from emails?

    - by Hans Klock
    I'm looking for an Java open-source library which is able to extract address information from a (German) email (signature). The library should find name street city, city code/postal code email tel/fax address-parser.com is an commercial product, but a free (albeit simple) library would be great. stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string is asking for something similar, but my problem is broader because the address information is hidden in a complete email. And there isn't a solution either... Any ideas?

    Read the article

  • making arrays from tab-delimited text file column

    - by absolutenewbie
    I was wondering if anyone could help a desperate newbie with perl with the following question. I've been trying all day but with my perl book at work, I can't seem to anything relevant in google...or maybe am genuinely stupid with this. I have a file that looks something like the following: Bob April Bob April Bob March Mary August Robin December Robin April The output file I'm after is: Bob April April March Mary August Robin December April So that it lists each month in the order that it appears for each person. I tried making it into a hash but of course it wouldn't let me have duplicates so I thought I would like to have arrays for each name (in this example, Bob, Mary and Robin). I'm afraid to upload the code I've been trying to tweak because I know it'll be horribly wrong. I think I need to define(?) the array. Would this be correct? Any help would be greatly appreciated and I promise I will be studying more about perl in the meantime. Thank you for your time, patience and help. #!/usr/bin/perl -w while (<>) { chomp; if (defined $old_name) { $name=$1; $month=$2; if ($name eq $old_name) { $array{$month}++; } else { print "$old_name"; foreach (@array) { push (@array, $month); print "\t@array"; } print "\n"; @array=(); $array{$month}++; } } else { $name=$1; $month=$2; $array{month}++; } $old_name=$name; } print "$old_name"; foreach (@array) { push (@array, $month); print "\t@array"; } print "\n";

    Read the article

  • Objective-C Implementation Pointers

    - by Dwaine Bailey
    Hi, I am currently writing an XML parser that parses a lot of data, with a lot of different nodes (the XML isn't designed by me, and I have no control over the content...) Anyway, it currently takes an unacceptably long time to download and read in (about 13 seconds) and so I'm looking for ways to increase the efficiency of the read. I've written a function to create hash values, so that the program no longer has to do a lot of string comparison (just NSUInteger comparison), but this still isn't reducing the complexity of the read in... So I thought maybe I could create an array of IMPs so that, I could then go something like: for(int i = 0; i < [hashValues count]; i ++) { if(currHash == [[hashValues objectAtIndex:i] unsignedIntValue]) { [impArray objectAtIndex:i]; } } Or something like that. The only problem is that I don't know how to actually make the call to the IMP function? I've read that I perform the selector that an IMP defines by going IMP tImp = [impArray objectAtIndex:i]; tImp(self, @selector(methodName)); But, if I need to know the name of the selector anyway, what's the point? Can anybody help me out with what I want to do? Or even just some more ways to increase the efficiency of the parser...

    Read the article

  • parse search string

    - by Benjamin Ortuzar
    I have search strings, similar to the one bellow: energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters: AllWords array AnyWords array NotWords array These are the rules i have set: If it has OR before or after the word or quoted words if belongs to AnyWord. If it has a NOT before word or quoted words it belongs to NotWords If it has 0 or more more spaces before the word or quoted phrase it belongs to AllWords. So the end result should be something similar to: AllWords: (energy, food, "olympics 2010") AnyWords: (terrorism, "government", cups) NotWords: (Transport) What would be a good way to do this?

    Read the article

  • How to get entire input string in Lex and Yacc?

    - by DevDevDev
    OK, so here is the deal. In my language I have some commands, say XYZ 3 5 GGB 8 9 HDH 8783 33 And in my Lex file XYZ { return XYZ; } GGB { return GGB; } HDH { return HDH; } [0-9]+ { yylval.ival = atoi(yytext); return NUMBER; } \n { return EOL; } In my yacc file start : commands ; commands : command | command EOL commands ; command : xyz | ggb | hdh ; xyz : XYZ NUMBER NUMBER { /* Do something with the numbers */ } ; etc. etc. etc. etc. My question is, how can I get the entire text XYZ 3 5 GGB 8 9 HDH 8783 33 Into commands while still returning the NUMBERs? Also when my Lex returns a STRING [0-9a-zA-Z]+, and I want to do verification on it's length, should I do it like rule: STRING STRING { if (strlen($1) < 5 ) /* Do some shit else error */ } or actually have a token in my Lex that returns different tokens depending on length?

    Read the article

  • Extracting email addresses in an html block in ruby/rails

    - by corroded
    I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it) I've tried regexes and so far this has been successful: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i problem is, i need to ignore all email addresses with mailto hrefs. for example: <a href="mailto:[email protected]">[email protected]</a> should only return the second email add. To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this: <a href="mailto:[email protected]">moc.liam@tset</a> problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance! Here were my references btw: so.com/questions/504860/extract-email-addresses-from-a-block-of-text so.com/questions/1376149/regexp-for-extracting-a-mailto-address im also testing using this: http://rubular.com/ edit here's my current helper code: def email_obfuscator(text) text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" } end which results in this: <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>

    Read the article

  • Problem with eastern european characters when scraping data from the European Parliaments Website

    - by Thomas Jensen
    Dear Experts I am trying to scrape a lot of data from the European Parliament website for a research project. Ther first step is the create a list if all parliamentarians, however due to the many Eastern European names and the accents they use i get a lot of missing entries. Here is an example of what is giving me troubles (notice the accents at the end of the family name): ANDRIKIENE, Laima Liucija Group of the European People's Party (Christian Democrats) So far I have been using PyParser and the following code: parser_names name = Word(alphanums + alphas8bit) begin, end = map(Suppress, "<") names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end for name in names.searchString(page): print(name) However this does not catch the name from the html above. Any advice in how to proceed? Best, Thomas

    Read the article

  • Getting content of the node which has childs via DOMDocument

    - by altern
    I have following html: <html ><body >Body text <div >div content</div></body></html> How could I get content of body without nested <div>? I need to get 'Body text', but do not have a clue how to do this. result of running $domhtml = DOMDocument::loadHTML($html); print $domhtml->getElementsByTagName('body')->item(0)->nodeValue; is 'Body textdiv content', which is not exactly what I want to get

    Read the article

  • why does b'(and sometimes b' ') show up when I split some HTML source[Python]

    - by Oliver
    I'm fairly new to Python and programming in general. I have done a few tutorials and am about 2/3 through a pretty good book. That being said I've been trying to get more comfortable with Python and proggramming by just trying things in the std lib out. that being said I have recently run into a wierd quirk that I'm sure is the result of my own incorrect or un-"pythonic" use of the urllib module(with Python 3.2.2) import urllib.request HTML_source = urllib.request.urlopen(www.somelink.com).read() print(HTML_source) when this bit is run through the active interpreter it returns the HTML source of somelink, however it prefixes it with b' for example b'<HTML>\r\n<HEAD> (etc). . . . if I split the string into a list by whitespace it prefixes every item with the b' I'm not really trying to accomplish something specific just trying to familiarize myself with the std lib. I would like to know why this b' is getting prefixed also bonus -- Is there a better way to get HTML source WITHOUT using a third party module. I know all that jazz about not reinventing the wheel and what not but I'm trying to learn by "building my own tools" Thanks in Advance!

    Read the article

  • boost::Spirit Grammar for unsorted schema

    - by Hassan Syed
    I have a section of a schema for a model that I need to parse. Lets say it looks like the following. { type = "Standard"; hostname="x.y.z"; port="123"; } The properties are: The elements may appear unordered. All elements that are part of the schema must appear, and no other. All of the elements' synthesised attributes go into a struct. (optional) The schema might in the future depend on the type field -- i.e., different fields based on type -- however I am not concerned about this at the moment.

    Read the article

  • Accessing items separated by -componentsSeparatedByString

    - by Graeme
    Hi, I have an array gathered by componentsSeparatedByString: that looks like the following when I use po in the GDB after the array has gone through componentsSeparatedByString: "\n\t\t <b>Suburb, </b> BAIRNSDALE", "\n\t\t <b>Address, </b> 15K NW BAIRNSDALE", "\n\t\t <b>Reference, </b> MELWOOD/SCHOOL ROAD", "\n\t\t <b>Last Changed, </b> 09/04/10 05, 29, 00 PM", "\n\t\t <b>Type, </b> HOME", "\n\t\t <b>Status, </b> BUILT", "\n\t\t <b>Property Size, </b> 2.00 HA.", "\n\t\t <b>Residents, </b> 2", "\n\t\t <b>First Added Date/Time, </b> 09/04/10 03, 15, 00 PM", "\n\t\t\t" Only problem is, I now can't figure out where to go from here. I need to be able to access each of these items (i.e. type, status, property size) separately rather than just calling the entire array (i.e. currentProperty.status). How do I do this? Also what's with all the n\t\t\t things - how do I get rid of them? Thanks.

    Read the article

  • CSSOMParser in gwt client side

    - by Zoja
    What i would like to do is to read an css file from a GET request on the client side, and then i would like to parse it to check all the classes. The problem is that I need to implement CSSOMParser for that, and here are the imports import org.w3c.dom.css.CSSRule; import org.w3c.dom.css.CSSRuleList; import org.w3c.dom.css.CSSStyleRule; import org.w3c.dom.css.CSSStyleSheet; import com.steadystate.css.parser.CSSOMParser; the problem is that none of those classes ale probably javascript compilant, so they don't want to compile if they're on the client side. Is there a way to get it done ?

    Read the article

  • getElementsByClassName not working on parsed html data in greasemonkey

    - by Sid
    Hi my code is as such var xhReq = new XMLHttpRequest(); xhReq.open("GET", linksRaw, false); xhReq.send(null); var serverResponse = xhReq.responseText; var tempDiv = document.createElement('div'); tempDiv.innerHTML = serverResponse.replace(/<script(.|\s)*?\/script>/g, ''); var plzWork = tempDiv.getElementsByClassName('organizationID').innerHTML; console.log(plzWork); The value of 'plzWork' :-) which is logged to the firebug console is always 'undefined' while the link code is <a class="organisationID" href="orglists.htm">Partner Organisations</a> I'm writing this script in the latest versions of Greasemonkey and FF 3.6 Thanks

    Read the article

  • How to extract paragaph and selected lines with Perl

    - by neversaint
    I have a text that looks like this. What I want to do is to extract the whole paragraph under the section "Aceview summary" until the line that starts with "Please quote". extract the line that starts with "The closest human gene". And store them into array with two elements. However I am stuck with the following script logic. What's the right way to achieve that? #!/usr/bin/perl -w my $INFILE_file_name = $file; # input file name open ( INFILE, '<', $INFILE_file_name ) or croak "$0 : failed to open input file $INFILE_file_name : $!\n"; my @allsum; while ( <INFILE> ) { chomp; my $line = $_; my @temp1 = (); if ( $line =~ /^ AceView summary/ ) { print "$line\n"; push @temp1, $line; } elsif( $line =~ /Please quote/) { push @allsum, [@temp1]; @temp1 = (); } } close ( INFILE ); # close input file

    Read the article

  • Extracting a table row with a particular attribute,using HTMLAGILITY pack

    - by Soham
    Consider this piece of code: <tr> <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td> <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td> I want to write a piece of code using HTMLAgility pack which would extract the link in the first line.

    Read the article

  • PHP Simple_html_dom issue

    - by stef
    The snippet below loops through some web pages, grabs the html and then looks for table.results and gets the plaintext out of the tags contained in each . $results is ok. Now I'm trying to get the href value of an tag that is found in the second of each . I'd like to include this in the $results array, but I'm not sure how to do this. The third foreach statement gets them but then I need to merge $links with $results. Ideally I'd also get the links in the second foreach statement. Does anyone know how? $i = 0; foreach( $urls as $u ) { $html = file_get_html($u); foreach($html->find('.results tbody tr') as $element) { $result[$i] = $this->extract($element->plaintext); $i++; } foreach($html->find('.results tbody tr a') as $element) { $links[$i] = $element->href; $i++; } } print_r($result); print_r($links); die;

    Read the article

  • libxml2 on iPhone

    - by mellkord
    I'm trying to parse HTML file with libxml2. Usually this works fine, but not in this case: <p> <b>Titles</b> (Some Text) <table> <tr> <td valign="top"> …Something1... </td> <td align="right" valign="top"> …Something2... </td> </tr> </table> </p> I do this query to get the first <td> //p[b='Titles']/table/tr/td[0] but nothing is returned because libxml think that <table> tag is not a child of a tag <p> and following him. And finally the question WHY?

    Read the article

  • How do Relational Databases Work Under the Hood?

    - by Pierreten
    I've always been interested in how you can throw some SQL at at database, and it nearly instantaneously returns your results in an orderly manner without thinking about it as anything other than a black box. What is really going on? I'm pretty sure it has something to do with how values are laid out regularly in memory, similar to an array; but aside from that, I don't know much else. How is SQL parsed in a manner to facilitate all of this?

    Read the article

  • how to read an address in multiple formats like google maps

    - by ratan
    notice that on google maps you can input the address any way you like. as long as it is a valid address...google maps will read it. In some ruby book I had seen code snippet for something like this, but with phone numbers. Any ideas how this could be done for addresses? in language of your choice. EDIT: i dont care about a "valid" address. I just want to parse an address. so that 123 fake street, WA, 34223 would be an address and so will 123 fake street WA 34223

    Read the article

< Previous Page | 40 41 42 43 44 45 46 47 48 49 50 51  | Next Page >