Search Results

Search found 13659 results on 547 pages for 'email parsing'.


  • Killing HTML nodes from shell

    - by hendry
    Need a solution to kill nodes like <footer>foobar</footer> and <div class="nav"></div> from several HTML files. I want to dump a site to disk without the menus, footers and whatnot. Ideally I would accomplish this task using basic unix tools like sed. Since it's not XML I can't use xmlstarlet. Could anyone please suggest recipes, so I can ideally run a script like kill-node.sh 'div class="toplinks"' *.html to prune the bits I don't want. Thank you,
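
    One workable approach, if a small Python helper is acceptable in place of pure sed, is to let lxml's lenient HTML parser drop the offending nodes. A sketch only: the selectors are hard-coded examples, and the kill-node.sh argument syntax above is not implemented.

        # strip_nodes.py -- drop unwanted elements from HTML files (sketch)
        import sys
        from lxml import html

        XPATHS = ['//footer', '//div[@class="nav"]', '//div[@class="toplinks"]']

        for path in sys.argv[1:]:
            tree = html.parse(path)          # lenient parser, copes with non-XML markup
            for xp in XPATHS:
                for node in tree.xpath(xp):
                    node.getparent().remove(node)
            tree.write(path, method="html")  # rewrites the file in place

    Run it as python strip_nodes.py *.html; back the files up first, since it overwrites them in place.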

    Read the article

  • extract specific element from nested elements using lxml html

    - by Dan.StackOverflow
    Hi all, I am having some problems that I think can be attributed to my xpath. I am using the html module from the lxml package to try and get at some data. I am providing the most simplified situation below, but keep in mind the html I am working with is much uglier.

        <table>
          <tr>
            <td>
              <table>
                <tr><td></td></tr>
                <tr><td>
                  <table>
                    <tr><td><u><b>Header1</b></u></td></tr>
                    <tr><td>Data</td></tr>
                  </table>
                </td></tr>
              </table>
            </td>
          </tr>
        </table>

    What I really want is the deeply nested table, because it has the header text "Header1". I am trying like so:

        from lxml import html
        page = '...'
        tree = html.fromstring(page)
        print tree.xpath('//table[//*[contains(text(), "Header1")]]')

    but that gives me all of the table elements. I just want the one table that contains this text. I understand what is going on but am having a hard time figuring out how to do this besides breaking out some nasty regex. Any thoughts?
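
    One way to select just the innermost table, rather than every ancestor table, is to locate the element holding the text and step back up to its nearest table. A rough sketch, untested against the real, uglier markup:

        from lxml import html

        tree = html.fromstring(page)
        # find the element whose text contains "Header1", then take its closest <table> ancestor
        inner_tables = tree.xpath('//*[contains(text(), "Header1")]/ancestor::table[1]')
        print(inner_tables)   # only the deeply nested table

    The original expression matches every table because the // inside the predicate restarts the search from the document root instead of from the table being tested.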

    Read the article

  • How to Extract data associated with attribute of XML file using python 3.2

    - by user1460383
    I have this xml format:

        <event timestamp="0.447463" bustype="LIN" channel="LIN 1">
          <col name="Time"/>
          <col name="Start of Frame">0.440708</col>
          <col name="Channel">LIN 1</col>
          <col name="Dir">Tx</col>
          <col name="Event Type">LIN Frame (Diagnostic Request)</col>
          <col name="Frame Name">MasterReq_DB</col>
          <col name="Id">3C</col>
          <col name="Data">81 06 04 04 FF FF 50 4C</col>
          <col name="Publisher">TestMaster (simulated)</col>
          <col name="Checksum">D3 &quot;Classic&quot;</col>
          <col name="Header Duration">2.090 ms (40.1 bits)</col>
          <col name="Resp. Duration">4.688 ms (90.0 bits)</col>
          <col name="Time difference">0.049987</col>
          <empty/>
        </event>

    In the above xml, I need to extract the data associated with the attribute 'name'. I am able to get all the names but am unable to fetch the MasterReq_DB field. Please help me ... Thanks in advance
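
    A minimal ElementTree sketch (the file name is hypothetical, and it assumes the <event> element shown is present in the parsed document) that maps each col's name attribute to its text:

        import xml.etree.ElementTree as ET

        tree = ET.parse("trace.xml")            # hypothetical file name
        root = tree.getroot()

        for event in root.iter("event"):
            # build a {name attribute: element text} mapping for this event
            cols = {col.get("name"): col.text for col in event.findall("col")}
            print(cols.get("Frame Name"))       # -> MasterReq_DB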

    Read the article

  • Best way to replace XML Text

    - by ryudice
    Hi, I have a web service which returns the following XML:

        <Validacion>
          <Es_Valido>NK7+22XrSgJout+ZeCq5IA==</Es_Valido>
        </Validacion>
        <Estatus>
          <Estatus>dqrQ7VtQmNFXmXmWlZTL7A==</Estatus>
        </Estatus>
        <Generales>
          <Nombre>V4wb2/tq9tEHW80tFkS3knO8i4yTpJzh7Jqi9MxpVVE=</Nombre>
          <Apellido>jXyRpjDQvsnzZz+wsq6b42amyTg2np0wckLmQjQx1rCJc8d3dDg6toSdSX200eGi</Apellido>
          <Ident_Clie>IYbofEiD+wOCJ+ujYTUxgsWJTnGfVU+jcQyhzgQralM=</Ident_Clie>
          <Fec_Creacion>hMI2YyE5h2JVp8CupWfjLy24W7LstxgmlGoDYjPev0r8TUf2Tav9MBmA2Xd9Pe8c</Fec_Creacion>
          <Nom_Asoc>CF/KXngDNY+nT99n1ITBJJDb08/wdou3e9znoVaCU3dlTQi/6EmceDHUbvAAvxsKH9MUeLtbCIzqpJq74e QfpA==</Nom_Asoc>
          <Fec_Defuncion />
        </Generales>

    The text inside the tags is encrypted and I need to decrypt it. I've come up with a regular expressions solution but I don't think it's very optimal; is there a better way to do this? thanks!
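
    Rather than regex, a proper XML parser can walk every element and rewrite its text in place. A sketch of that idea in Python (decrypt() stands in for the real decryption routine, and the fragment is wrapped in a single root so it parses); the same approach maps onto XDocument in .NET:

        import xml.etree.ElementTree as ET

        def decrypt(ciphertext):
            ...                                  # placeholder for the actual decryption

        root = ET.fromstring("<response>" + xml_fragment + "</response>")
        for elem in root.iter():
            if elem.text and elem.text.strip():  # only elements that carry text
                elem.text = decrypt(elem.text)

        decrypted_xml = ET.tostring(root, encoding="unicode")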

    Read the article

  • I need to Split a string based on a complex delimiter

    - by Jason
    In C# I need to split a string (a log4j log file) into array elements based on a particular sequence of characters, namely "nnnn-nn-nn nn:nn:nn INFO". I'm currently splitting this log file up by newlines, which is fine except when the log statements themselves contain newlines. I don't control the input (the log file) so escaping them somehow is not an option. It seems like I should be able to use a comparator or a regex to identify the strings, but String.Split does not have an option like that. Am I stuck rolling my own, or is there a pattern or framework component that can be of help here?
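
    In C#, Regex.Split accepts a pattern where String.Split does not; splitting on a zero-width lookahead keeps each timestamp header with its entry. A rough sketch of the idea in Python (the exact pattern is an assumption about the log4j layout, and the file name is made up):

        import re

        # assumed entry header: "2010-04-05 10:28:58 INFO ..."
        ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} INFO)")

        with open("app.log") as f:               # hypothetical file name
            # splitting on a zero-width lookahead needs Python 3.7+
            entries = [e for e in ENTRY_START.split(f.read()) if e.strip()]

        # each entry now starts with its own timestamp, even if its message spans newlines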

    Read the article

  • Generic type parameters using out

    - by Mikael
    I'm trying to make a universal parser using generic type parameters, but I can't grasp the concept 100%.

        private bool TryParse<T>(XElement element, string attributeName, out T value) where T : struct
        {
            if (element.Attribute(attributeName) != null && !string.IsNullOrEmpty(element.Attribute(attributeName).Value))
            {
                string valueString = element.Attribute(attributeName).Value;
                if (typeof(T) == typeof(int))
                {
                    int valueInt;
                    if (int.TryParse(valueString, out valueInt))
                    {
                        value = valueInt;
                        return true;
                    }
                }
                else if (typeof(T) == typeof(bool))
                {
                    bool valueBool;
                    if (bool.TryParse(valueString, out valueBool))
                    {
                        value = valueBool;
                        return true;
                    }
                }
                else
                {
                    value = valueString;
                    return true;
                }
            }
            return false;
        }

    As you might guess, the code doesn't compile, since I can't convert int|bool|string to T (eg. value = valueInt). Thankful for feedback, it might not even be possible the way I'm doing it. Using .NET 3.5

    Read the article

  • Comprehensive and well maintained wiki syntax Parser for PHP

    - by Rowan
    I'm looking for a comprehensive and well maintained wiki syntax Parser for PHP, does anybody know of one? I can find some really good parsers for markdown and bbcode but am having trouble with finding a decent wiki parser. I prefer markdown myself, but I'm writing post functions for a CMS and I'd like to give end-users a choice. I thought about downloading a copy of MediaWiki and seeing how they do it, thoughts on this as an option?

    Read the article

  • Load XML to DataFrame in R

    - by Rohit Kandhal
    I am new to R programming and trying to load a simple XML in RStudio. I tried using XMLToDataFrame but got this error:

        XML content does not seem to be XML: 'temp.xml'

    XML Schema:

        <root>
          <row Id="1" UserId="1" Name="Rohit" Date="2009-06-29T10:28:58.013" />
          <row Id="2" UserId="3" Name="Rohit" Date="2009-06-29T10:28:58.030" />
        </root>

    Please provide me some direction on which function I should use here.

    Read the article

  • Using regex to extract variables from a plain-text form letter?

    - by Yaaqov
    Hi - I'm looking for a good example of using Regular Expressions in PHP to "reverse engineer" a form letter (with a known format, of course) that has been pasted into a multiline textbox and sent to a script for processing. So, for example, let's assume this is the original plain-text input (taken from a USDA press release): WASHINGTON, April 5, 2010 - North American Bison Co-Op, a New Rockford, N.D., establishment is recalling approximately 25,000 pounds of whole beef heads containing tongues that may not have had the tonsils completely removed, which is not compliant with regulations that require the removal of tonsils from cattle of all ages, the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS) announced today. For clarity, the fields that are variables are highlighted below: [pr_city=]WASHINGTON, [pr_date=]April 5, 2010 - [corp_name=]North American Bison Co-Op, a [corp_city=]New Rockford, [corp_state=]N.D., establishment is recalling approximately [amount=]25,000 pounds of [product=]whole beef heads containing tongues that may not have had the tonsils completely removed, which is not compliant with regulations that require [reason=]the removal of tonsils from cattle of all ages, the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS) announced today. How could I efficiently extract the contents of the pr_city pr_date corp_name corp_city corp_state amount product reason fields from my example? Any help would be appreciated, thanks.
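
    One hedged sketch of the idea, shown here in Python but using the same (?P<name>...) named groups that PHP's preg_match understands; the pattern is an assumption fitted only to the sample letter above and would need to be hardened against real press releases:

        import re

        PATTERN = re.compile(
            r"(?P<pr_city>[A-Z][A-Z ]+), (?P<pr_date>\w+ \d{1,2}, \d{4}) - "
            r"(?P<corp_name>.+?), a (?P<corp_city>.+?), (?P<corp_state>[^,]+), establishment "
            r"is recalling approximately (?P<amount>[\d,]+ pounds) of (?P<product>.+?), "
            r"which is not compliant with regulations that require (?P<reason>.+?), "
            r"the U\.S\. Department of Agriculture",
            re.S,
        )

        match = PATTERN.search(letter_text)      # letter_text: the pasted form letter
        if match:
            fields = match.groupdict()           # {'pr_city': 'WASHINGTON', 'amount': '25,000 pounds', ...}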

    Read the article

  • Incorrectly formatted html inconsistencies between DOM and what's displayed in firefox plugin

    - by deadalnix
    I'm currently developing a firefox plugin. This plugin has to handle very crappy website that is really incorrectly formatted. I cannot modify these websites, so I have to handle them. I reduced the bug I'm facing to a short sample of html (if this appellation is appropriate for an horror like this) : <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>Some title.</title> <!-- Oh fuck yes ! --> <div style="visability:hidden;"> <a href="//example.com"> </a> </div> <!-- If meta are reduced, then the bug disapears ! --> <meta name="description" content="Homepage of Company.com, Company's corporate Web site" /> <meta name="keywords" content="Company, Company & Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla" /> <meta http-equiv="Content-Language" content="en-US" /> <meta http-equiv="content-type" content="text/html; charset=utf-8"/> </head> <body class="homePage"> <div class="globalWrapper"><a href="/page.html">My gorgeous link !</a></div> </body> </html> When opening the webpage, « My gorgeous link ! » if displayed and clickable. However, when I'm exploring the DOM with Javascript into my plugin, everything behaves (DOM exploration and innerHTML property) like the code was this one : <html> <head> <title>Some title.</title> <!-- Oh fuck yes ! --> </head><body><div style="visability:hidden;"> <a href="//example.com"> </a> </div> <!-- If meta are reduced, then the bug disapears ! --> <meta name="description" content="Homepage of Company.com, Company's corporate Web site"> <meta name="keywords" content="Company, Company &amp; Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla"> <meta http-equiv="Content-Language" content="en-US"> </body> </html> So, when exploring the DOM within the plugin, the document is somehow fixed by firefox. But this fixed DOM is inconsistent with what is in the webpage. Thus, my plugin doesn't behave as expected. I'm really puzzled with that issue. The problem exists in both firefox 3.6 and firefox 4 (didn't tested firefox 5 yet). For example, reducing the meta, will fix the issue. Where does this discrepancy come from ? How can I handle it ? EDIT: With the answer I get, I think I should be a little more precise. 
    I do know what Firefox is doing when modifying the webpage in the second code snippet. The problem is the following: in the fixed DOM that I get in my plugin, the gorgeous link doesn't appear anywhere, but this link is actually visible on the webpage, and works. So the DOM I'm manipulating and the DOM in the webpage are different - they are fixed in different ways. So where does the difference in the fixing behaviour come from, and how can I handle it? In other terms, how can I be aware, in my plugin, of the existence of the gorgeous link?

    Read the article

  • Why is Swing Parser's handleText not handling nested tags?

    - by Jim P
    I need to transform some HTML text that has nested tags to decorate 'matches' with a css attribute to highlight it (like firefox search). I can't just do a simple replace (think if user searched for "img" for example), so I'm trying to just do the replace within the body text (not on tag attributes). I have a pretty straightforward HTML parser that I think should do this: final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE); Matcher m = pat.matcher(output); if (m.find()) { final StringBuffer ret = new StringBuffer(output.length()+100); lastPos=0; try { new ParserDelegator().parse(new StringReader(output.toString()), new HTMLEditorKit.ParserCallback () { public void handleText(char[] data, int pos) { ret.append(output.subSequence(lastPos, pos)); Matcher m = pat.matcher(new String(data)); ret.append(m.replaceAll("<span class=\"search\">$0</span>")); lastPos=pos+data.length; } }, false); ret.append(output.subSequence(lastPos, output.length())); return ret; } catch (Exception e) { return output; } } return output; My problem is, when I debug this, the handleText is getting called with text that includes tags! It's like it's only going one level deep. Anyone know why? Is there some simple thing I need to do to HTMLParser (haven't used it much) to enable 'proper' behavior of nested tags? PS - I figured it out myself - see answer below. Short answer is, it works fine if you pass it HTML, not pre-escaped HTML. Doh! Hope this helps someone else. <span>example with <a href="#">nested</a> <p>more nesting</p> </span> <!-- all this gets thrown together -->

    Read the article

  • Parse string with bash and extract number

    - by cleg
    Hello, I've got supervisor's status output, looking like this:

        frontend    RUNNING    pid 16652, uptime 2:11:17
        nginx       RUNNING    pid 16651, uptime 2:11:17
        redis       RUNNING    pid 16607, uptime 2:11:32

    I need to extract nginx's PID. I've done it via the grep -P command, but on the remote machine grep is built without perl regular expression support. It looks like sed or awk is exactly what I need, but I'm not familiar with them. Please help me find a way to do it, thanks in advance.

    Read the article

  • Regex to extract portions of file name

    - by jakesankey
    I have text files formatted as such: R156484COMP_004A7001_20100104_065119.txt. I need to consistently extract the R****COMP, the 004A7001 number and 20100104 (the date), and don't care about the 065119 number. The problem is that not ALL of the files being parsed have the exact naming convention. Some may be like this: R168166CRIT_156B2075_SU2_20091223_123456.txt or R285476COMP_SU1_125A6025_20100407_123456.txt. So how could I use regex instead of split to ensure I am always getting that serial (ex. 004A7001), the date (ex. 20100104), and the R****COMP (or CRIT)??? Here is what I do now, but it only gets the files formatted like my first example:

        if (file.Count(c => c == '_') != 3) continue;

    and further down in the code I have:

        string RNumber = Path.GetFileNameWithoutExtension(file);
        string RNumberE = RNumber.Split('_')[0];
        string RNumberD = RNumber.Split('_')[1];
        string RNumberDate = RNumber.Split('_')[2];
        DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
        string cmmDate = dateTime.ToString("dd-MMM-yyyy");

    UPDATE: This is where I am now -- I get an error when parsing RNumberDate to an actual date format: "Cannot implicitly convert type 'RegularExpressions.Match' to 'string'".

        string RNumber = Path.GetFileNameWithoutExtension(file);
        Match RNumberE = Regex.Match(RNumber, @"^(R|L)\d{6}(COMP|CRIT|TEST|SU[1-9])(?=_)", RegexOptions.IgnoreCase);
        Match RNumberD = Regex.Match(RNumber, @"(?<=_)\d{3}[A-Z]\d{4}(?=_)", RegexOptions.IgnoreCase);
        Match RNumberDate = Regex.Match(RNumber, @"(?<=_)\d{8}(?=_)", RegexOptions.IgnoreCase);
        DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
        string cmmDate = dateTime.ToString("dd-MMM-yyyy")

    Read the article

  • Splitting a string according to a delimiter when elements in the string can contain the delimiter

    - by Vivin Paliath
    I have a string that looks like this:

        "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"

    I'd like to get back:

        [#Text(), #SomeMoreText(), #TextThatContainsDelimiter(#blah), #SomethingElse()]

    One way I thought about doing this was to require that the # be escaped into \#, which makes the input string:

        "#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"

    I can then split it using /[^\\]#/ which gives me:

        [#Text(), SomeMoreText, TextThatContainsDelimiter(\#blah), SomethingElse()]

    The first element will contain # but I can strip it out. However, is there a cleaner way to do this without having to escape the #, and which ensures that the first element will not contain a #? Basically I'd like it to split by # only if the # is not enclosed by parentheses. My hunch is that since the # is context-sensitive and regular expressions are only suited for context-free strings, this may not be the right tool. If so, would I have to write a grammar for this and roll my own parser/lexer?
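
    Since each token here has the fixed shape #name(args) and the sample has no nested parentheses, matching the tokens instead of splitting between them sidesteps the escaping problem entirely. A sketch in Python; the pattern assumes the arguments never contain a closing parenthesis:

        import re

        s = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
        tokens = re.findall(r"#\w+\([^()]*\)", s)
        # ['#Text()', '#SomeMoreText()', '#TextThatContainsDelimiter(#blah)', '#SomethingElse()']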

    Read the article

  • Best way to parse float?

    - by boris callens
    What is the best way to parse a float in CSharp? I know about TryParse, but what I'm particularly wondering about is dots, commas etc. I'm having problems with my website. On my dev server, the ',' is for decimals, the '.' for separator. On the prod server though, it is the other way round. How can I best capture this?

    Read the article

  • UNIX Parse HTML Page Display Contents of a Tag - One Liner?

    - by NJTechie
    I have an HTML file and I am interested in the data enclosed by <pre> </pre> tags. Is there a one-liner that can achieve this? Sample file:

        <html>
        <title> Hello There! </title>
        <body>
        <pre>
        John    Working
        Kathy   Working
        Mary    Working
        Kim     N/A
        </pre>
        </body>
        </html>

    Output should be:

        John
        Kathy
        Mary
        Kim

    Much appreciated guys, thank you!
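
    Not the sed/awk one-liner asked for, but a short Python filter sketching the same extraction; it assumes a single <pre> block and that the wanted value is the first field on each line:

        import re, sys

        text = sys.stdin.read()
        body = re.search(r"<pre>(.*?)</pre>", text, re.S | re.I).group(1)
        for line in body.splitlines():
            if line.strip():
                print(line.split()[0])      # John, Kathy, Mary, Kim

    It reads the page on stdin, e.g. python extract_pre.py < page.html.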

    Read the article

  • What is the best way to parse python script file in C/C++ code

    - by alexpov
    I am embedding python in a C/C++ program. What I am trying to do is to parse the python script file from the C/C++ program and break the file into "blocks" so that each "block" is a valid command in python code. Each block I need to put into a std::string. For example:

        #PythonScript.py
        import math
        print "Hello Python"
        i = 0;
        while (i < 10):
            print "i = " , i;
            i = i + 1;
        print "GoodBye Python"

    In this script there are 5 different "blocks": the first one is import math; the second is print "Hello Python"; the third is i = 0; the fourth is while (i < 10):\n\tprint "i = " , i;\n\ti = i + 1; and the fifth is print "GoodBye Python". My knowledge of python is very basic and I am not familiar with the python syntax. What is the best way to do this, and is there any Python C/C++ API function that supports it? Why I need it: for GUI purposes. My program, which is written in C, uses python to make some calculations. From the C code, using the python C API, I run a python script, and what I need is a way to capture python's output in my program. I catch it and everything is ok; the problem is when the script involves user input. What happens is that I capture python's output after the script is finished, therefore, when there is an input command in the script I get a black screen ... I need to get all the printing that happens before the input command. The first solution I tried is to parse the script into valid commands and run each command, one after the other, separately ... for this I need to parse the script and decide what is a command and what is not ... The question is: what is the best way to do this, and is there something that already does it?
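
    One hedged sketch of the splitting step, done with Python's own ast module rather than hand-rolled C parsing (the embedding C code could run a helper like this and collect the resulting blocks); it assumes the target script parses under the interpreter doing the splitting, and end_lineno needs Python 3.8+:

        import ast

        def split_blocks(source):
            """Return one source string per top-level statement."""
            lines = source.splitlines()
            blocks = []
            for node in ast.parse(source).body:              # one node per top-level statement
                end = getattr(node, "end_lineno", node.lineno)
                blocks.append("\n".join(lines[node.lineno - 1:end]))
            return blocks

        with open("PythonScript.py") as f:
            for block in split_blocks(f.read()):
                print(repr(block))    # the import, the prints, the assignment, and the whole while loop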

    Read the article

  • BioPython: extracting sequence IDs from a Blast output file

    - by Jon
    Hi, I have a BLAST output file in XML format. It has 22 query sequences with 50 hits reported for each sequence, and I want to extract all of the 50x22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.

        from Bio.Blast import NCBIXML
        blast_records = NCBIXML.parse(result_handle)
        blast_record = blast_records.next()
        save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                save_file.write('>%s\n' % (alignment.title,))
        save_file.close()

    Does anybody have suggestions on how to extract all the hits? I guess I have to use something other than alignments. Hope this was clear. Thanks! Jon
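
    A minimal sketch of one likely fix: loop over every record that NCBIXML.parse yields instead of pulling only the first with next(). The path and variable names follow the question; untested against the real file:

        from Bio.Blast import NCBIXML

        blast_records = NCBIXML.parse(result_handle)   # result_handle: the open XML file
        with open("/Users/jonbra/Desktop/my_fasta_seq.fasta", "w") as save_file:
            for blast_record in blast_records:         # all 22 queries, not just the first
                for alignment in blast_record.alignments:
                    for hsp in alignment.hsps:
                        save_file.write('>%s\n' % (alignment.title,))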

    Read the article

  • What is the best way to process XML sent to WCF 3.5

    - by CRM Junkie
    I have to develop a WCF application in 3.5. The input will be sent in the form of XML and the response would be sent in the form of XML as well. A ASP.NET application will be consuming the WCF and sending/receiving data in XML format. Now, as per my understanding, when consuming WCF from an ASP.NET application, we just add a reference to the service, create an object of the service, pack all the necessary data(Data Members in WCF) into the input object (object of the Data Contract) and call the necessary function. It happens that the ASP.NET application is being developed by a separate party and they are hell bent on receiving and sending data in XML format. What I can perceive from this is that the WCF will take the XML string (a single Data Member string type) as input and send out a XML string (again a single Data Member string type) as output. I have created WCF applications earlier where requests and responses were sent out in XML/JSON format when it was consumed by jQuery ajax calls. In those cases, the XML tags were automatically mapped to the different Data Members defined. What approach should I take in this case? Should I just take a string as input (basically the XML string) or is there any way WCF/.NET 3.5 will automatically map the XML tags with the Data Members for requests and responses and I would not need to parse the XML string separately?

    Read the article

  • Comparing datafeeds from different networks (Affiliate Marketing)

    - by Logistetica
    Hi, I am working on integrating affiliate sales into a few existing sites. We are using a few merchants who work via different networks (cj, shareasale, linkshare, avantlink). Now my observation is that all these networks provide data feeds in different formats, but that's not a big problem. My main concern is actually merchants using different titles for the same products. I don't want to run into these situations: a) two listings of the SAME product from N merchants (if titles are just a bit different); b) one listing of N different products from merchants (if we don't use a strict comparison algorithm). We want to automate everything as much as possible and avoid operators manually reviewing questionable listings all the time. How is this problem typically handled?
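
    There is no single standard answer, but a common first pass is to normalize the feed titles and score their similarity before deciding whether two listings are the same product. A hedged sketch (the normalization rules and the 0.85 threshold are guesses to tune, not established values):

        import re
        from difflib import SequenceMatcher

        def normalize(title):
            # lowercase, strip punctuation, collapse whitespace
            return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", title.lower())).strip()

        def same_product(title_a, title_b, threshold=0.85):
            ratio = SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()
            return ratio >= threshold

        # borderline scores (say 0.6-0.85) could be queued for a human operator instead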

    Read the article
