Search Results

Search found 54956 results on 2199 pages for 'parsing error'.

Page 155/2199

  • Any Java library for address extraction from emails?

    - by Hans Klock
    I'm looking for an open-source Java library that can extract address information from a (German) email (signature). The library should find the name, street, city, postal code, email address, and telephone/fax numbers. address-parser.com is a commercial product, but a free (albeit simple) library would be great. stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string asks for something similar, but my problem is broader because the address information is hidden in a complete email. And there isn't a solution there either... Any ideas?


  • Xerces SAX parser ignores the xmlns:xsi attribute as an attribute of an element

    - by user603301
    Hi, using the Xerces SAX parser I am trying to retrieve all elements and their attributes from this XML file:

        <?xml version="1.0" encoding="UTF-8"?>
        <invoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="my.xsd">
        <parties>
        (...)

    When getting the attributes for the element 'invoice', Xerces++ does not insert the 'xmlns:xsi' attribute in the list of 'Attributes' for the element 'invoice'. However, the attribute 'xsi:noNamespaceSchemaLocation' is inserted in the list. Why? Is there a specific reason from an XML standard point of view? Is there a way to configure the Xerces++ SAX parser so that it inserts this attribute as well? (The documentation on setting the parser properties does not say how.) Thanks for your help.


  • making arrays from tab-delimited text file column

    - by absolutenewbie
    I was wondering if anyone could help a desperate Perl newbie with the following question. I've been trying all day, but with my Perl book at work I can't seem to find anything relevant on Google... or maybe I'm genuinely stupid with this. I have a file that looks something like the following:

        Bob     April
        Bob     April
        Bob     March
        Mary    August
        Robin   December
        Robin   April

    The output file I'm after is:

        Bob     April   April   March
        Mary    August
        Robin   December        April

    So that it lists each month in the order that it appears for each person. I tried making it into a hash, but of course it wouldn't let me have duplicates, so I thought I would like to have arrays for each name (in this example, Bob, Mary and Robin). I'm afraid to upload the code I've been trying to tweak because I know it'll be horribly wrong. I think I need to define(?) the array. Would this be correct? Any help would be greatly appreciated, and I promise I will be studying more about Perl in the meantime. Thank you for your time, patience and help.

        #!/usr/bin/perl -w
        while (<>) {
            chomp;
            if (defined $old_name) {
                $name=$1;
                $month=$2;
                if ($name eq $old_name) {
                    $array{$month}++;
                }
                else {
                    print "$old_name";
                    foreach (@array) {
                        push (@array, $month);
                        print "\t@array";
                    }
                    print "\n";
                    @array=();
                    $array{$month}++;
                }
            }
            else {
                $name=$1;
                $month=$2;
                $array{month}++;
            }
            $old_name=$name;
        }
        print "$old_name";
        foreach (@array) {
            push (@array, $month);
            print "\t@array";
        }
        print "\n";
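
    The grouping asked for above can be sketched as follows. This is a Python illustration of the logic rather than a fix of the Perl script, and it assumes each input line holds exactly a name and a month separated by a tab; the function names `group_months` and `format_groups` are made up for the example.

```python
def group_months(lines):
    """Collect months per name, keeping duplicates and first-seen order."""
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, month = line.split("\t", 1)
        # A list per name allows repeated months, unlike a hash keyed by month.
        groups.setdefault(name, []).append(month)
    return groups


def format_groups(groups):
    """Render one tab-separated output line per name."""
    return ["\t".join([name] + months) for name, months in groups.items()]


sample = [
    "Bob\tApril", "Bob\tApril", "Bob\tMarch",
    "Mary\tAugust", "Robin\tDecember", "Robin\tApril",
]
for row in format_groups(group_months(sample)):
    print(row)
```

    The key design choice is a mapping from name to a list of months, which is the same idea as a Perl hash of arrays.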


  • Objective-C Implementation Pointers

    - by Dwaine Bailey
    Hi, I am currently writing an XML parser that parses a lot of data, with a lot of different nodes (the XML isn't designed by me, and I have no control over the content...). Anyway, it currently takes an unacceptably long time to download and read in (about 13 seconds), so I'm looking for ways to increase the efficiency of the read. I've written a function to create hash values, so that the program no longer has to do a lot of string comparison (just NSUInteger comparison), but this still isn't reducing the complexity of the read... So I thought maybe I could create an array of IMPs, so that I could then do something like:

        for(int i = 0; i < [hashValues count]; i ++) {
            if(currHash == [[hashValues objectAtIndex:i] unsignedIntValue]) {
                [impArray objectAtIndex:i];
            }
        }

    Or something like that. The only problem is that I don't know how to actually make the call to the IMP function. I've read that I perform the selector that an IMP defines by going

        IMP tImp = [impArray objectAtIndex:i];
        tImp(self, @selector(methodName));

    But if I need to know the name of the selector anyway, what's the point? Can anybody help me out with what I want to do? Or even just suggest some more ways to increase the efficiency of the parser...


  • svnlook always returns an error and no output

    - by Pierre-Alain Vigeant
    I'm running this small C# test program, launched from a pre-commit batch file:

        private static int Test(string[] args)
        {
            var processStartInfo = new ProcessStartInfo
            {
                FileName = "svnlook.exe",
                UseShellExecute = false,
                ErrorDialog = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                Arguments = "help"
            };
            using (var svnlook = Process.Start(processStartInfo))
            {
                string output = svnlook.StandardOutput.ReadToEnd();
                svnlook.WaitForExit();
                Console.Error.WriteLine("svnlook exited with error 0x{0}.", svnlook.ExitCode.ToString("X"));
                Console.Error.WriteLine("Current output is: {0}", string.IsNullOrEmpty(output) ? "empty" : output);
                return 1;
            }
        }

    I am deliberately calling svnlook help and forcing an error so I can see what is going on when committing. When this program runs, SVN displays

        svnlook exited with error 0xC0000135.
        Current output is: empty

    I looked up the error 0xC0000135 and it means "App failed to initialize properly", although it wasn't specific to svnlook. Why is svnlook help not returning anything? Does it fail when executed through another process?


  • parse search string

    - by Benjamin Ortuzar
    I have search strings similar to the one below:

        energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport

    and I need to parse it with PHP5 to detect whether the content belongs to any of the following clusters:

        AllWords array
        AnyWords array
        NotWords array

    These are the rules I have set:

        If it has OR before or after the word or quoted words, it belongs to AnyWords.
        If it has a NOT before the word or quoted words, it belongs to NotWords.
        If it has 0 or more spaces before the word or quoted phrase, it belongs to AllWords.

    So the end result should be something similar to:

        AllWords: (energy, food, "olympics 2010")
        AnyWords: (terrorism, "government", cups)
        NotWords: (Transport)

    What would be a good way to do this?
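
    The three rules above can be sketched as a small tokenizer plus a neighbour check. This is written in Python rather than PHP5, purely to show the logic; it assumes OR and NOT are always upper case, that quotes are balanced, and the names `TOKEN` and `classify` are invented for the example.

```python
import re

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def classify(query):
    """Sort the words of a search string into the three clusters."""
    tokens = TOKEN.findall(query)
    clusters = {"AllWords": [], "AnyWords": [], "NotWords": []}
    for i, tok in enumerate(tokens):
        if tok in ("OR", "NOT"):
            continue  # operators are not words themselves
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        word = tok.strip('"')  # drop the surrounding quotes of a phrase
        if prev_tok == "NOT":
            clusters["NotWords"].append(word)
        elif prev_tok == "OR" or next_tok == "OR":
            clusters["AnyWords"].append(word)
        else:
            clusters["AllWords"].append(word)
    return clusters


print(classify('energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport'))
```

    Checking NOT before OR means a word such as `cups NOT transport` puts `transport` in NotWords even though an OR appears earlier in the string. The sketch keeps the original letter case; lower-casing would be a separate step.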


  • Problem with Eastern European characters when scraping data from the European Parliament's website

    - by Thomas Jensen
    Dear Experts, I am trying to scrape a lot of data from the European Parliament website for a research project. The first step is to create a list of all parliamentarians; however, due to the many Eastern European names and the accents they use, I get a lot of missing entries. Here is an example of what is giving me trouble (notice the accent at the end of the family name):

        ANDRIKIENĖ, Laima Liucija Group of the European People's Party (Christian Democrats)

    So far I have been using pyparsing and the following code:

        parser_names
        name = Word(alphanums + alphas8bit)
        begin, end = map(Suppress, "<")
        names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end
        for name in names.searchString(page):
            print(name)

    However, this does not catch the name from the HTML above. Any advice on how to proceed? Best, Thomas
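
    One way to match accented names is to rely on the Unicode-aware character classes of Python 3's `re` module, where `\w` covers letters from any script for `str` patterns. The sketch below is a standard-library alternative, not the pyparsing approach from the question. It assumes each name sits directly between a `>` and a `<` in the page, and the sample markup is hypothetical.

```python
import re

# [^\W\d_] reads as "a word character that is not a digit or underscore",
# which in Python 3 means any Unicode letter, accented or not.
LETTER = r"[^\W\d_]"
NAME = rf"{LETTER}+(?:[ '\-]{LETTER}+)*"

# Family name, a comma, then given names, delimited by surrounding tags.
PATTERN = re.compile(rf">\s*({NAME}),\s+({NAME})\s*<")


def find_names(page):
    """Return (family_name, given_names) tuples found in the page text."""
    return PATTERN.findall(page)


page = "<td>ANDRIKIENĖ, Laima Liucija</td>"  # hypothetical markup
print(find_names(page))
```

    The page must be decoded to `str` with the right encoding before matching; if it is still `bytes`, or was decoded with the wrong codec, accented letters will not match.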


  • Extracting email addresses in an html block in ruby/rails

    - by corroded
    I am creating a parser that guards against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have HTML tags in it). I've tried regexes, and so far this one has been successful:

        /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

    The problem is, I need to ignore all email addresses inside mailto hrefs. For example:

        <a href="mailto:test@mail.com">test@mail.com</a>

    should only return the second email address. To give some background on what I'm doing: I'm reversing the email addresses in a block, so the above example would look like this:

        <a href="mailto:test@mail.com">moc.liam@tset</a>

    The problem with my current regex is that it also replaces the one in the href. Is there a way for me to do this with a single regex? Or do I have to check for one and then the other? Is there a way for me to do this just by using gsub, or do I have to use some nokogiri/hpricot magic to parse the mailtos? Thanks in advance! Here are my references, by the way:

        so.com/questions/504860/extract-email-addresses-from-a-block-of-text
        so.com/questions/1376149/regexp-for-extracting-a-mailto-address

    I'm also testing using this: http://rubular.com/

    Edit: here's my current helper code:

        def email_obfuscator(text)
          text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m| m = "<span class='anti-spam'>#{m.reverse}</span>" }
        end

    which results in this:

        <a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>
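
    A single-regex approach is possible if the pattern also consumes an optional `mailto:` prefix and the replacement callback leaves those matches untouched. The sketch below shows the idea in Python rather than Ruby. The address `test@mail.com` is an assumed sample, and `obfuscate` is a made-up name.

```python
import re

EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
# Group 1 captures an optional "mailto:" directly in front of the address.
PATTERN = re.compile(r"(mailto:)?" + EMAIL, re.IGNORECASE)


def obfuscate(text):
    """Reverse visible email addresses, leaving mailto: hrefs unchanged."""
    def replace(match):
        if match.group(1):
            # Address belongs to a mailto: link, so return it as it was.
            return match.group(0)
        return "<span class='anti-spam'>" + match.group(0)[::-1] + "</span>"
    return PATTERN.sub(replace, text)


print(obfuscate('<a href="mailto:test@mail.com">test@mail.com</a>'))
```

    Because the `mailto:` address is consumed as one match, the scanner cannot restart in the middle of it and rewrite part of the href. Ruby's `gsub` with a block allows the same pattern, since the block can return the match unchanged.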


  • Getting the content of a node that has children via DOMDocument

    - by altern
    I have the following HTML:

        <html ><body >Body text <div >div content</div></body></html>

    How can I get the content of body without the nested <div>? I need to get 'Body text', but do not have a clue how to do this. The result of running

        $domhtml = DOMDocument::loadHTML($html);
        print $domhtml->getElementsByTagName('body')->item(0)->nodeValue;

    is 'Body textdiv content', which is not exactly what I want to get.
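
    The usual DOM technique is to walk the direct children of the node and keep only the text nodes, instead of reading the node's combined value. The sketch below shows this with Python's standard-library `xml.dom.minidom` rather than PHP's DOMDocument. It works here only because the sample happens to be well-formed XML, and `own_text` is a made-up helper name.

```python
from xml.dom import minidom


def own_text(element):
    """Concatenate the text nodes that are direct children of element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


html = "<html ><body >Body text <div >div content</div></body></html>"
body = minidom.parseString(html).getElementsByTagName("body")[0]
print(own_text(body).strip())
```

    PHP's DOM extension exposes the same concepts (`childNodes`, `nodeType`, `XML_TEXT_NODE`), so the loop should translate fairly directly.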


  • Accessing items separated by -componentsSeparatedByString

    - by Graeme
    Hi, I have an array produced by componentsSeparatedByString: that looks like the following when I use po in GDB:

        "\n\t\t <b>Suburb, </b> BAIRNSDALE",
        "\n\t\t <b>Address, </b> 15K NW BAIRNSDALE",
        "\n\t\t <b>Reference, </b> MELWOOD/SCHOOL ROAD",
        "\n\t\t <b>Last Changed, </b> 09/04/10 05, 29, 00 PM",
        "\n\t\t <b>Type, </b> HOME",
        "\n\t\t <b>Status, </b> BUILT",
        "\n\t\t <b>Property Size, </b> 2.00 HA.",
        "\n\t\t <b>Residents, </b> 2",
        "\n\t\t <b>First Added Date/Time, </b> 09/04/10 03, 15, 00 PM",
        "\n\t\t\t"

    The only problem is, I now can't figure out where to go from here. I need to be able to access each of these items (i.e. type, status, property size) separately rather than just calling the entire array (i.e. currentProperty.status). How do I do this? Also, what's with all the \n\t\t things, and how do I get rid of them? Thanks.
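
    The `\n` and `\t` sequences are the newline and tab characters from the original markup, so trimming whitespace removes them. Once trimmed, each item can be split into a label and a value and stored in a dictionary. The sketch below shows that logic in Python rather than Objective-C; it assumes every useful item has the shape `<b>Label, </b> value`, and `to_fields` is a made-up name.

```python
import re

# Label is everything inside <b>...</b> up to the last comma before </b>;
# the value is whatever follows the closing tag.
FIELD = re.compile(r"<b>(.*?),\s*</b>\s*(.*)", re.DOTALL)


def to_fields(items):
    """Turn the separated strings into a label -> value dictionary."""
    fields = {}
    for item in items:
        match = FIELD.search(item.strip())  # strip() drops \n and \t padding
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


items = [
    "\n\t\t <b>Type, </b> HOME",
    "\n\t\t <b>Status, </b> BUILT",
    "\n\t\t <b>Property Size, </b> 2.00 HA.",
    "\n\t\t\t",
]
fields = to_fields(items)
print(fields["Status"])
```

    In Objective-C the equivalent container would be an NSDictionary keyed by label, with whitespace removed via a whitespace-and-newline character set.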


  • boost::Spirit Grammar for unsorted schema

    - by Hassan Syed
    I have a section of a schema for a model that I need to parse. Let's say it looks like the following:

        {
          type = "Standard";
          hostname="x.y.z";
          port="123";
        }

    The properties are:

        The elements may appear unordered.
        All elements that are part of the schema must appear, and no others.
        All of the elements' synthesised attributes go into a struct.
        (optional) The schema might in the future depend on the type field, i.e., different fields based on type; however, I am not concerned about this at the moment.
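
    Independently of Boost.Spirit, the three requirements boil down to: parse `key = "value";` entries in any order, reject duplicates and unknown input, and check that the set of keys equals the required set before filling the struct. The sketch below shows only that validation logic in plain Python. It is not a Spirit grammar, and `Model`, `REQUIRED` and `parse_model` are names invented for the example.

```python
import re
from dataclasses import dataclass

REQUIRED = {"type", "hostname", "port"}
# One entry: identifier, '=', double-quoted value, ';' (whitespace is free).
ENTRY = re.compile(r'\s*(\w+)\s*=\s*"([^"]*)"\s*;')


@dataclass
class Model:
    type: str
    hostname: str
    port: str


def parse_model(text):
    """Parse a braced block whose fields may come in any order."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a block wrapped in braces")
    body = body[1:-1]
    fields, pos = {}, 0
    while True:
        match = ENTRY.match(body, pos)
        if not match:
            break
        key, value = match.groups()
        if key in fields:
            raise ValueError(f"duplicate field: {key}")
        fields[key] = value
        pos = match.end()
    if body[pos:].strip():
        raise ValueError(f"unexpected input: {body[pos:].strip()!r}")
    if set(fields) != REQUIRED:
        raise ValueError(f"expected exactly {sorted(REQUIRED)}, got {sorted(fields)}")
    return Model(**fields)


print(parse_model('{ type = "Standard"; hostname="x.y.z"; port="123"; }'))
```

    Collecting into a map first and validating the key set afterwards keeps the "any order, all required, nothing extra" rule separate from the per-entry syntax.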


  • Out of memory error while using clusterdata in MATLAB

    - by Hossein
    Hi, I am trying to cluster a matrix (size: 20057x2):

        T = clusterdata(X,cutoff);

    but I get this error:

        ??? Error using ==> pdistmex
        Out of memory. Type HELP MEMORY for your options.

        Error in ==> pdist at 211
        Y = pdistmex(X',dist,additionalArg);

        Error in ==> linkage at 139
        Z = linkagemex(Y,method,pdistArg);

        Error in ==> clusterdata at 88
        Z = linkage(X,linkageargs{1},pdistargs);

        Error in ==> kmeansTest at 2
        T = clusterdata(X,1);

    Can someone help me? I have 4GB of RAM, but I think that the problem is from somewhere else.
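
    A quick back-of-the-envelope check suggests memory may well be the issue. The traceback goes through `pdist`, which returns one distance for every pair of rows, that is n(n-1)/2 values stored as 8-byte doubles. The sketch below just does that arithmetic in Python; `pdist_bytes` is a made-up helper, and the estimate ignores any temporary copies made by `linkage`.

```python
def pdist_bytes(n_points, bytes_per_value=8):
    """Size of a condensed pairwise-distance vector for n_points rows."""
    pairs = n_points * (n_points - 1) // 2
    return pairs, pairs * bytes_per_value


pairs, size = pdist_bytes(20057)
print(pairs, "distances,", round(size / 2**30, 2), "GiB")
```

    For 20057 rows this comes to roughly 201 million distances, about 1.5 GiB in a single contiguous array. That can fail on a 4GB machine, especially in a 32-bit process.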


  • Why does b' (and sometimes b' ') show up when I split some HTML source [Python]

    - by Oliver
    I'm fairly new to Python and programming in general. I have done a few tutorials and am about 2/3 through a pretty good book. That being said, I've been trying to get more comfortable with Python and programming by just trying things out in the std lib. I have recently run into a weird quirk that I'm sure is the result of my own incorrect or un-"pythonic" use of the urllib module (with Python 3.2.2):

        import urllib.request
        HTML_source = urllib.request.urlopen("http://www.somelink.com").read()
        print(HTML_source)

    When this bit is run through the active interpreter it returns the HTML source of somelink; however, it prefixes it with b', for example b'<HTML>\r\n<HEAD> (etc). If I split the string into a list by whitespace, it prefixes every item with the b'. I'm not really trying to accomplish something specific, just trying to familiarize myself with the std lib. I would like to know why this b' is getting prefixed. Also, as a bonus: is there a better way to get HTML source WITHOUT using a third-party module? I know all that jazz about not reinventing the wheel and whatnot, but I'm trying to learn by "building my own tools". Thanks in advance!
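
    The `b'` prefix is how Python 3 prints a `bytes` object, which is what `read()` returns for an HTTP response, and splitting bytes yields more bytes. The sketch below shows the difference without touching the network. The sample bytes and the choice of UTF-8 are assumptions, since a real page may declare another encoding.

```python
raw = b"<HTML>\r\n<HEAD>"           # stands in for urlopen(...).read()
print(type(raw).__name__)            # the object is bytes, not str
print(raw.split())                   # every piece is still bytes

text = raw.decode("utf-8")           # assumed encoding; converts bytes to str
print(type(text).__name__)
print(text.split())                  # plain strings, no b prefix
```

    With a real response object, the charset announced by the server is usually available via `response.headers.get_content_charset()`, which can be passed to `decode` instead of a hard-coded value.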


  • CSSOMParser in gwt client side

    - by Zoja
    What I would like to do is read a CSS file via a GET request on the client side and then parse it to check all the classes. The problem is that I need CSSOMParser for that, and here are the imports:

        import org.w3c.dom.css.CSSRule;
        import org.w3c.dom.css.CSSRuleList;
        import org.w3c.dom.css.CSSStyleRule;
        import org.w3c.dom.css.CSSStyleSheet;
        import com.steadystate.css.parser.CSSOMParser;

    The problem is that probably none of those classes are JavaScript compliant, so they won't compile if they're on the client side. Is there a way to get this done?


  • getElementsByClassName not working on parsed html data in greasemonkey

    - by Sid
    Hi, my code is as follows:

        var xhReq = new XMLHttpRequest();
        xhReq.open("GET", linksRaw, false);
        xhReq.send(null);
        var serverResponse = xhReq.responseText;
        var tempDiv = document.createElement('div');
        tempDiv.innerHTML = serverResponse.replace(/<script(.|\s)*?\/script>/g, '');
        var plzWork = tempDiv.getElementsByClassName('organizationID').innerHTML;
        console.log(plzWork);

    The value of 'plzWork' :-) that is logged to the Firebug console is always 'undefined', while the link code is

        <a class="organisationID" href="orglists.htm">Partner Organisations</a>

    I'm writing this script in the latest versions of Greasemonkey and FF 3.6. Thanks


  • How to extract paragraph and selected lines with Perl

    - by neversaint
    I have a text that looks like this. What I want to do is:

        extract the whole paragraph under the section "AceView summary" until the line that starts with "Please quote".
        extract the line that starts with "The closest human gene".

    And store them in an array with two elements. However, I am stuck with the following script logic. What's the right way to achieve that?

        #!/usr/bin/perl -w
        my $INFILE_file_name = $file;    # input file name
        open ( INFILE, '<', $INFILE_file_name )
            or croak "$0 : failed to open input file $INFILE_file_name : $!\n";
        my @allsum;
        while ( <INFILE> ) {
            chomp;
            my $line = $_;
            my @temp1 = ();
            if ( $line =~ /^ AceView summary/ ) {
                print "$line\n";
                push @temp1, $line;
            }
            elsif( $line =~ /Please quote/) {
                push @allsum, [@temp1];
                @temp1 = ();
            }
        }
        close ( INFILE );    # close input file
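
    The extraction described above is a small state machine: switch a flag on at the "AceView summary" heading, collect lines while it is on, and switch it off at "Please quote". The state has to live outside the loop body, otherwise it is reset on every line. The sketch below shows this in Python rather than Perl; the sample lines are invented, since the original text is not shown, and `extract_sections` is a made-up name.

```python
def extract_sections(lines):
    """Return [summary paragraph, 'closest human gene' line]."""
    summary = []          # lives outside the loop so it survives iterations
    closest_gene = None
    in_summary = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("AceView summary"):
            in_summary = True          # start collecting after the heading
            continue
        if line.startswith("Please quote"):
            in_summary = False         # stop at the citation line
            continue
        if in_summary and line:
            summary.append(line)
        if line.startswith("The closest human gene"):
            closest_gene = line
    return [" ".join(summary), closest_gene]


sample = [
    "Some header",
    " AceView summary",
    "First sentence.",
    "Second sentence.",
    "Please quote: (hypothetical citation)",
    "The closest human gene is XYZ (hypothetical).",
]
print(extract_sections(sample))
```

    In the Perl script, the equivalent change would be to declare the collecting array and a flag before the `while` loop and to push every line seen while the flag is set.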


  • PHP Simple_html_dom issue

    - by stef
    The snippet below loops through some web pages, grabs the HTML, then looks for table.results and gets the plaintext out of the tags contained in each <tr>. $results is OK. Now I'm trying to get the href value of an <a> tag that is found in the second <td> of each <tr>. I'd like to include this in the $results array, but I'm not sure how to do this. The third foreach statement gets them, but then I need to merge $links with $results. Ideally I'd also get the links in the second foreach statement. Does anyone know how?

        $i = 0;
        foreach( $urls as $u ) {
            $html = file_get_html($u);
            foreach($html->find('.results tbody tr') as $element) {
                $result[$i] = $this->extract($element->plaintext);
                $i++;
            }
            foreach($html->find('.results tbody tr a') as $element) {
                $links[$i] = $element->href;
                $i++;
            }
        }
        print_r($result);
        print_r($links);
        die;


  • Extracting a table row with a particular attribute, using HTML Agility Pack

    - by Soham
    Consider this piece of code:

        <tr>
        <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>
        <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>

    I want to write a piece of code using HTML Agility Pack that extracts the link in the first line.
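
    The selection logic is "the first <a> inside a <td> with class tim_new". The sketch below expresses that with Python's standard-library `html.parser`, not with HTML Agility Pack, so it only illustrates the traversal. `FirstLinkInCell` and `first_link` are made-up names, and the class name is taken from the markup above.

```python
from html.parser import HTMLParser


class FirstLinkInCell(HTMLParser):
    """Remember the href of the first <a> found inside a matching <td>."""

    def __init__(self, cell_class):
        super().__init__()
        self.cell_class = cell_class
        self.in_cell = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            self.in_cell = attrs.get("class") == self.cell_class
        elif tag == "a" and self.in_cell and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False


def first_link(html, cell_class="tim_new"):
    parser = FirstLinkInCell(cell_class)
    parser.feed(html)
    return parser.href


row = ('<tr> <td valign=top class="tim_new">'
       '<a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>')
print(first_link(row))
```

    With a DOM-style library the same selection is typically written as an XPath expression such as `//td[@class='tim_new']/a[1]` followed by reading the `href` attribute.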


  • libxml2 on iPhone

    - by mellkord
    I'm trying to parse an HTML file with libxml2. Usually this works fine, but not in this case:

        <p>
          <b>Titles</b> (Some Text)
          <table>
            <tr>
              <td valign="top"> …Something1... </td>
              <td align="right" valign="top"> …Something2... </td>
            </tr>
          </table>
        </p>

    I run this query to get the first <td>:

        //p[b='Titles']/table/tr/td[0]

    but nothing is returned, because libxml thinks that the <table> tag is not a child of the <p> tag but follows it. And finally the question: WHY?
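
    Two things are worth checking here. First, XPath positions start at 1, so `td[0]` can never match and the first cell is `td[1]`. Second, in HTML a <p> cannot contain a <table>, so an HTML parser may close the paragraph when the table starts, which would make the table a sibling of <p>. The sketch below illustrates only the first point, using Python's standard-library ElementTree as a strict XML parser that keeps the nesting as written. The sample document is a simplified, assumed version of the markup above.

```python
import xml.etree.ElementTree as ET

doc = """<root><p><b>Titles</b> (Some Text)
<table><tr>
<td valign="top">Something1</td>
<td align="right" valign="top">Something2</td>
</tr></table></p></root>"""

root = ET.fromstring(doc)
# td[1] selects the first <td>; XPath position predicates are 1-based.
cells = root.findall(".//p[b='Titles']/table/tr/td[1]")
print([cell.text for cell in cells])
```

    If the HTML parser has indeed moved the table out of the paragraph, a query along the lines of `//p[b='Titles']/following-sibling::table[1]/tr/td[1]` may be closer to the tree that was actually built.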

