I'm currently using Magpie RSS but it sometimes falls over when the RSS or Atom feed isn't well formed. Are there any other options for parsing RSS and Atom feeds with PHP?
How can I use HTML Tidy only to insert closing tags where they are missing in an HTML file?
I am parsing tabular information from HTML using the HTML Agility Pack. Where closing tags are missing, extraction with the HTML Agility Pack doesn't work well. If I write the closing tags in manually, I can extract the information perfectly. So I want to insert the closing tags where they are missing, so that the HTML Agility Pack extracts the information correctly.
How can one insert a CSS string containing Unicode escapes using CleverCSS?
In particular, how could one produce the following CSS using CleverCSS:
li:after {
    content: "\00BB \0020";
}
I've figured out CleverCSS's parsing rules, but suffice it to say that the permutations I thought sensible have failed, for example:
li:
    content: "\\00BB \\0020" // becomes content: 'BB 0'
EDIT: My other examples and the rest of my post weren't saved. Suffice it to say that I had a longer list of examples that also failed, as did my closing, which was something like:
I'd be grateful for any thoughts and input.
Brian
In the streams I am parsing, I need to extract something from this pattern:
<b>PaintTitle</b></td><td class=detail valign="top" align=left><div align=left><font size=small><b>The new great album by Pet Shop Boys</b>
How would I get the string "The new great album by Pet Shop Boys", given that <b>PaintTitle</b> is guaranteed to appear once per album?
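A minimal sketch, assuming the markup always has the shape shown above: the PaintTitle label, then the first <b>...</b> after it holds the title. It uses a Java regular expression rather than a real HTML parser, so it is brittle if the markup changes; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PaintTitleExtractor {
    // Match the PaintTitle label, lazily skip to the next <b>...</b>, and capture
    // its text. DOTALL lets ".*?" span line breaks in the stream.
    private static final Pattern TITLE = Pattern.compile(
            "<b>PaintTitle</b>.*?<b>(.*?)</b>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the album title, or null if the pattern is not found. */
    public static String extract(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<b>PaintTitle</b></td><td class=detail valign=\"top\" align=left>"
                + "<div align=left><font size=small><b>The new great album by Pet Shop Boys</b>";
        System.out.println(extract(html)); // The new great album by Pet Shop Boys
    }
}
```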
I've been up and down the Wikipedia API, but I can't figure out if there's a nice way to fetch the excerpt of an article (usually the first paragraph). It would be nice to get the HTML formatting of that paragraph, too.
The only way I currently see of getting something that resembles a snippet is by performing a fulltext search (example), but that's not really what I want (too short).
Is there any other way to fetch the first paragraph of a Wikipedia article than barbarically parsing HTML/WikiText?
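One option worth checking is the MediaWiki API's `prop=extracts` module, provided by the TextExtracts extension (installed on Wikipedia, as far as I know). With `exintro` it returns only the content before the first section heading, as limited HTML by default, or as plain text if `explaintext` is added. The sketch below only builds the request URL; the class name is hypothetical, and fetching the URL and reading the JSON response are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WikiExtractUrl {
    private static final String API = "https://en.wikipedia.org/w/api.php";

    /**
     * Builds a query for the intro of one article. exintro limits the extract
     * to the text before the first section; plainText=true adds explaintext,
     * otherwise the extract comes back as limited HTML.
     */
    public static String introUrl(String title, boolean plainText) {
        StringBuilder sb = new StringBuilder(API)
                .append("?action=query&format=json&prop=extracts&exintro=1")
                .append("&titles=")
                .append(URLEncoder.encode(title, StandardCharsets.UTF_8));
        if (plainText) {
            sb.append("&explaintext=1");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(introUrl("Pet Shop Boys", false));
    }
}
```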
Parsing binary sums/products is easy, but I'm having trouble defining a grammar that parses
a + b * c + d + e
as
sum(a, prod(b, c), d, e)
My initial (naive) attempt generated 61 shift/reduce conflicts.
I'm using Java CUP (but I suppose a solution for any other parser generator could easily be translated).
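A common way to get flat n-ary nodes is to write each precedence level with repetition instead of binary recursion (EBNF `sum := prod ('+' prod)*` and `prod := atom ('*' atom)*`), collect that level's operands into a list, and only wrap the level in a node when it has two or more operands. The sketch below is a hand-written recursive-descent parser in Java, not a Java CUP grammar, so it only illustrates the tree shape; it accepts single-letter operands, `+`, `*`, and parentheses. In an LR generator like CUP, the analogous idea (as far as I know) is a stratified, left-recursive grammar such as `sum ::= sum PLUS prod | prod` whose semantic actions append to a list rather than build binary nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class NaryExprParser {
    private final String src;
    private int pos = 0;

    private NaryExprParser(String src) {
        this.src = src.replaceAll("\\s+", ""); // whitespace is insignificant
    }

    /** Parses e.g. "a + b * c + d + e" into "sum(a, prod(b, c), d, e)". */
    public static String parse(String input) {
        NaryExprParser p = new NaryExprParser(input);
        String result = p.sum();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return result;
    }

    // sum := prod ('+' prod)*
    private String sum() {
        List<String> terms = new ArrayList<>();
        terms.add(prod());
        while (peek() == '+') {
            pos++;
            terms.add(prod());
        }
        return node("sum", terms);
    }

    // prod := atom ('*' atom)*
    private String prod() {
        List<String> factors = new ArrayList<>();
        factors.add(atom());
        while (peek() == '*') {
            pos++;
            factors.add(atom());
        }
        return node("prod", factors);
    }

    // atom := letter | '(' sum ')'
    private String atom() {
        char c = peek();
        if (c == '(') {
            pos++;
            String inner = sum();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at " + pos);
            }
            pos++;
            return inner;
        }
        if (Character.isLetter(c)) {
            pos++;
            return String.valueOf(c);
        }
        throw new IllegalArgumentException("Unexpected character at " + pos);
    }

    // A single operand needs no wrapper; two or more become one flat node.
    private static String node(String op, List<String> operands) {
        return operands.size() == 1 ? operands.get(0)
                : op + "(" + String.join(", ", operands) + ")";
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("a + b * c + d + e")); // sum(a, prod(b, c), d, e)
    }
}
```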
A quick, fun question: what is the difference between a function definition in C/C++ and an else-if statement block, from a purely parsing standpoint?
void function_name(arguments) {
    [statement-block]
}
else if(arguments) {
    [statement-block]
}
Looking for the best solution! =)
Hi
I'm having an issue parsing XHTML that has a DOCTYPE declaration using a DOM parser:
Error: java.io.IOException: Server returned HTTP response code: 503 for URL:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20
Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE?
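A minimal sketch, assuming the goal is simply to stop the parser from downloading the DTD: install an `EntityResolver` that hands back an empty input for every external entity. (The W3C is known to throttle requests for these DTDs, which is one plausible source of the 503.) The trade-off is that named entities the DTD would define, such as `&nbsp;`, will then be reported as undefined, so this only suits documents that don't use them. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NoDtdParser {
    /** Parses XML/XHTML without fetching the DTD named in its DOCTYPE. */
    public static Document parseIgnoringDtd(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // no DTD validation (also the default)
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Resolve every external entity, including the DOCTYPE's DTD URL, to an
        // empty document, so nothing is downloaded from www.w3.org.
        db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return db.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>";
        Document doc = parseIgnoringDtd(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTagName()); // html
    }
}
```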
Hey guys, maybe someone can help:
I have the following .gpx data from Wikipedia:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" creator="byHand" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<wpt lat="39.921055008" lon="3.054223107">
<ele>12.863281</ele>
<time>2005-05-16T11:49:06Z</time>
<name>Cala Sant Vicenç - Mallorca</name>
<sym>City</sym>
</wpt>
</gpx>
When I call my parsing method, I get an exception (see below). The call looks like this:
Document tmpDoc = getParsedXML(currentGPX);
My parsing method looks like this (standard parsing code, nothing exciting):
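import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public static Document getParsedXML(String fileWithPath){
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db;
    Document doc = null;
    try {
        db = dbf.newDocumentBuilder();
        doc = db.parse(new File(fileWithPath));
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return doc;
}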
This simple code throws the following exception:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at Zeugs.getParsedXML(Zeugs.java:38)
at Zeugs.main(Zeugs.java:25)
I guess the error lies in the format of the file, but I don't know where exactly.
Can you please give me a hint?
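A likely explanation (an assumption, since the raw bytes aren't shown): the XML declaration says UTF-8, but the file was saved as ISO-8859-1 or Windows-1252. In those encodings the `ç` in `Vicenç` is the single byte 0xE7, which a UTF-8 decoder treats as the first byte of a 3-byte sequence; the following space (0x20) is not a valid continuation byte, which matches "Invalid byte 2 of 3-byte UTF-8 sequence". The simplest fix is to re-save the file as UTF-8. If you can't change the file, you can decode it yourself and give the parser a character stream: per the `InputSource` documentation, the parser then reads that stream directly and disregards the encoding declaration. The sketch assumes ISO-8859-1; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CharsetOverrideParser {
    /**
     * Parses XML whose bytes are really in `actual`, regardless of the encoding
     * named in the XML declaration: we decode the bytes ourselves and pass the
     * parser a Reader, so it never does its own byte decoding.
     */
    public static Document parse(InputStream bytes, Charset actual)
            throws ParserConfigurationException, SAXException, IOException {
        try (Reader reader = new InputStreamReader(bytes, actual)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the .gpx file was saved as ISO-8859-1, not UTF-8.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Document doc = parse(in, StandardCharsets.ISO_8859_1);
            System.out.println(doc.getElementsByTagName("name").item(0).getTextContent());
        }
    }
}
```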
I didn't find anything about parsing HTML in the XML::LibXML::Reader documentation, and when I tried to parse an HTML page it didn't work.
Am I right to conclude that XML::LibXML::Reader doesn't work with HTML?
In choosing an editor for my wiki-like site, I'm debating whether to allow HTML or a custom alternate markup (maybe like wikipedia/wikimedia's or BBCode).
HTML benefits:
Easy for users to deal with (copying and pasting, learning)
Somewhat future proof
Many more editing tools available, usually WYSIWYG too
Alternate markup benefits:
On the server side I don't have to worry about parsing malicious javascript or styles or HTML that I don't allow
Can be easy to learn
Can be easier to decipher if not HTML-savvy
Am I missing something? What's the best solution?
I am new to iPhone development. I am able to parse an XML file at a URL and retrieve its contents from particular nodes.
For parsing at a URL I use:
NSString * path = @"xxxxxxxxxxxxxxxxxxxxxx";
[self parseXMLFileAtURL:path];
For retrieving the data I use NSXMLParser. How can I achieve the same thing if the file at my URL is HTML (i.e. the page's source code is HTML)? Please help me out. Thanks.
Generally I use lxml for my HTML parsing needs, but that isn't available on Google App Engine. The obvious alternative is BeautifulSoup, but I find it chokes too easily on malformed HTML. Currently I am testing libxml2dom and have been getting better results.
Which pure Python HTML parser have you found performs best? My priority is the ability to handle bad HTML over speed.
The user can enter a math problem like 5 + 654, 6 ^ 24, 2!, sqrt(543), log(54), sin 5, or sin(50). After some reformatting (e.g. changing sin 5 into sin(5)) and an eval, PHP gives me the right result. However, this is quite unsafe. Can anyone point me in the right direction for parsing and solving math expressions like the examples above safely? Thanks.
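The usual safe alternative to eval is a small parser that only understands the operators and functions you whitelist, so anything else is rejected instead of executed. The sketch below is a recursive-descent evaluator in Java (the question is about PHP, but the structure carries over). It assumes `^` is right-associative exponentiation, `!` is a postfix factorial on non-negative integers, `log` is the natural logarithm, and trig functions take radians; it accepts `sin 5` as well as `sin(5)`. The class name is hypothetical.

```java
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

public class SafeMath {
    // Whitelisted functions; any other identifier is rejected.
    private static final Map<String, DoubleUnaryOperator> FUNCS =
            Map.<String, DoubleUnaryOperator>of("sqrt", Math::sqrt, "log", Math::log,
                    "sin", Math::sin, "cos", Math::cos, "tan", Math::tan);

    private final String s;
    private int pos = 0;

    private SafeMath(String s) { this.s = s; }

    public static double eval(String input) {
        SafeMath p = new SafeMath(input);
        double v = p.expr();
        p.skipSpaces();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return v;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        for (;;) {
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }

    // term := unary (('*' | '/') unary)*
    private double term() {
        double v = unary();
        for (;;) {
            if (eat('*')) v *= unary();
            else if (eat('/')) v /= unary();
            else return v;
        }
    }

    // unary := '-' unary | power
    private double unary() {
        return eat('-') ? -unary() : power();
    }

    // power := postfix ('^' unary)?   (right-associative, so 2^3^2 = 2^9)
    private double power() {
        double base = postfix();
        return eat('^') ? Math.pow(base, unary()) : base;
    }

    // postfix := primary '!'*
    private double postfix() {
        double v = primary();
        while (eat('!')) v = factorial(v);
        return v;
    }

    // primary := '(' expr ')' | function postfix | number
    private double primary() {
        if (eat('(')) {
            double v = expr();
            if (!eat(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return v;
        }
        int start = pos;
        while (pos < s.length() && Character.isLetter(s.charAt(pos))) pos++;
        if (pos > start) {
            String name = s.substring(start, pos);
            DoubleUnaryOperator f = FUNCS.get(name);
            if (f == null) throw new IllegalArgumentException("Unknown name " + name);
            return f.applyAsDouble(postfix()); // accepts both "sin 5" and "sin(5)"
        }
        while (pos < s.length() && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
        if (pos == start) throw new IllegalArgumentException("Expected a number at " + pos);
        return Double.parseDouble(s.substring(start, pos));
    }

    private static double factorial(double n) {
        if (n < 0 || n > 170 || n != Math.rint(n)) {
            throw new IllegalArgumentException("Factorial needs an integer in 0..170");
        }
        double r = 1;
        for (int i = 2; i <= n; i++) r *= i;
        return r;
    }

    // Skips whitespace, then consumes c if it is the next character.
    private boolean eat(char c) {
        skipSpaces();
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }
}
```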
What open source (preferably gem-based) parser-generator options do I have in Ruby?
I've used (flex&bison)|(lex&yacc) from C in the past, and I'm comfortable with BNF-style specifications.
I've heard of treetop, but it looks a bit alien and verbose compared to yacc...
Purpose: I want to convert my text markup language to a BNF and generate the parsing code. I think it's a better strategy than my first-order solution: http://github.com/dafydd/semantictext/blob/master/lib/semantictext/rich_text_parser.rb
We are working on a hiring application and need the ability to easily parse resumes. Before trying to build a parser ourselves, I was wondering what resume parsing tools are available and which is the best, in your opinion. We need to be able to parse both Word and TXT files.
I haven't found many ways to increase the performance of a Java application that does intensive XML processing other than to leverage hardware such as Tarari or Datapower. Does anyone know of any open source ways to accelerate XML parsing?
Hi,
In the Android application I am building, I want to communicate with a local server developed in Django (basically a login page and a home page populated with posts and images from users). Do I need an XML parser to parse the responses from the Django server, or can the server respond with strings that can be used directly? And what about images?
Regards,
Primal
I'm having trouble with text parsing in PHP.
I have a txt file which has this kind of information:
sometext::sometext.0 = INTEGER: 254
What I need is to get the last value (254 here) into a variable in PHP.
In this txt file the last value can range from 0 to 255.
The "sometext::sometext.0 = INTEGER: " part doesn't change at all.
It has a length of 36 characters, so I need PHP to put whatever comes after the 36th character into a variable.
Thank you.
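Rather than relying on a fixed offset, it may be more robust to take whatever follows the last `INTEGER: ` marker and check that it falls in the 0 to 255 range. (The placeholder prefix as shown above is 32 characters long, not 36, so a hard-coded offset is easy to get wrong; the real prefix presumably differs.) The sketch is in Java; in PHP, `strrpos` plus `substr` express the same idea. The class and method names are hypothetical.

```java
public class LastValue {
    private static final String MARKER = "INTEGER: ";

    /** Returns the integer after the last "INTEGER: " marker, checked to be 0..255. */
    public static int lastValue(String line) {
        int idx = line.lastIndexOf(MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("No '" + MARKER + "' in: " + line);
        }
        int value = Integer.parseInt(line.substring(idx + MARKER.length()).trim());
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(lastValue("sometext::sometext.0 = INTEGER: 254")); // 254
    }
}
```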
I'd like to pass parameters to my C++ program in the following manner:
./myprog --setting=value
Are there any libraries which will help me to do this easily?
See also http://stackoverflow.com/questions/189972/argument-parsing-helpers-for-c-unix/191821
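For C++, Boost.Program_options and GNU `getopt_long` are common library choices for this. If you'd rather not pull in a library, the `--setting=value` form is simple enough to split by hand; the sketch below shows the idea in Java (a C++ loop over `argv` would look the same). It assumes options have the `--name=value` form, treats a bare `--flag` as `"true"`, stops at a lone `--`, and ignores positional arguments. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LongOptions {
    /** Collects "--name=value" arguments; a bare "--flag" maps to "true". */
    public static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.equals("--")) {
                break; // conventional end-of-options marker
            }
            if (!arg.startsWith("--")) {
                continue; // positional argument; ignored in this sketch
            }
            String body = arg.substring(2);
            int eq = body.indexOf('=');
            if (eq < 0) {
                opts.put(body, "true");
            } else {
                // Split on the first '=' only, so values may themselves contain '='.
                opts.put(body.substring(0, eq), body.substring(eq + 1));
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        // e.g. "java LongOptions --setting=value" prints {setting=value}
        System.out.println(parse(args));
    }
}
```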
What's the best way to parse the following RSS feed item into a C# class?
<item>
<fh:FlightHistory FlightHistoryId="189895136" >
<fh:Airline AirlineCode="EI" Name="Aer Lingus" />
</fh:FlightHistory>
</item>
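A minimal sketch with a namespace-aware DOM parser. The snippet above doesn't show the `xmlns:fh` declaration, so the namespace URI below is a placeholder you'd replace with the one the real feed binds to `fh`. It maps each `fh:FlightHistory` to a small class; in C#, LINQ to XML (`XDocument` with an `XNamespace`) follows the same pattern. Class and field names are hypothetical.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlightHistoryReader {
    // Placeholder: replace with the URI bound to "fh" in the real feed.
    public static final String FH_NS = "urn:example:flighthistory";

    public static final class FlightHistory {
        public final String id;
        public final String airlineCode;
        public final String airlineName;

        FlightHistory(String id, String airlineCode, String airlineName) {
            this.id = id;
            this.airlineCode = airlineCode;
            this.airlineName = airlineName;
        }
    }

    public static List<FlightHistory> read(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = dbf.newDocumentBuilder().parse(in);

        List<FlightHistory> result = new ArrayList<>();
        NodeList histories = doc.getElementsByTagNameNS(FH_NS, "FlightHistory");
        for (int i = 0; i < histories.getLength(); i++) {
            Element fh = (Element) histories.item(i);
            Element airline = (Element) fh.getElementsByTagNameNS(FH_NS, "Airline").item(0);
            // The attributes themselves are unprefixed, so plain getAttribute works.
            result.add(new FlightHistory(
                    fh.getAttribute("FlightHistoryId"),
                    airline == null ? null : airline.getAttribute("AirlineCode"),
                    airline == null ? null : airline.getAttribute("Name")));
        }
        return result;
    }
}
```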
I have the following data inside an NSData object:
<00000000 6f2d840e 31504159 2e535953 2e444446 3031a51b 8801015f 2d02656e 9f110101 bf0c0cc5 0affff3f 00000003 ffff03
I'm having issues parsing this data. It contains information delimited by tags:
Tag 1 is from byte value 0x84 to 0xa5
Tag 2 is from byte value 0xa5 to 0x88
Tag 3 is from byte value 0x88 to 0x5f0x2d
Tag 4 is from byte value 0x5f0x2d to 0x9f0x11
How would I go about getting those values from the NSData object?
Regards,
EZFrag
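The tag bytes listed look like BER-TLV encoding as used in EMV smart-card responses (an assumption: `6f` would be the FCI template and `84` the DF name, here `1PAY.SYS.DDF01`). If so, each tag is followed by a length byte, so the data can be walked by reading tag, length, and value in turn instead of searching for the next tag byte. The sketch below does this in Java on a plain byte array (on iOS the same logic applies to the bytes behind the NSData); it skips `0x00` padding, handles multi-byte tags such as `5f2d`, and recurses into constructed tags such as `a5`. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BerTlv {
    /** Walks BER-TLV data and returns hex tag -> value bytes for primitive tags. */
    public static Map<String, byte[]> parse(byte[] data) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        walk(data, 0, data.length, out);
        return out;
    }

    private static void walk(byte[] d, int pos, int end, Map<String, byte[]> out) {
        while (pos < end) {
            if (d[pos] == 0x00) { pos++; continue; } // padding between objects

            // Tag: if the low 5 bits of the first byte are all 1, more tag bytes
            // follow; each subsequent byte with bit 8 set is followed by another.
            int tagStart = pos;
            boolean constructed = (d[pos] & 0x20) != 0;
            if ((d[pos++] & 0x1F) == 0x1F) {
                while ((d[pos++] & 0x80) != 0) { /* more tag bytes follow */ }
            }
            String tag = hex(d, tagStart, pos);

            // Length: short form (< 0x80), or long form 0x81..0x83 + 1..3 bytes.
            int len = d[pos++] & 0xFF;
            if ((len & 0x80) != 0) {
                int n = len & 0x7F;
                if (n == 0 || n > 3) throw new IllegalArgumentException("Unsupported length for " + tag);
                len = 0;
                for (int i = 0; i < n; i++) len = (len << 8) | (d[pos++] & 0xFF);
            }
            if (pos + len > end) throw new IllegalArgumentException("Truncated tag " + tag);

            if (constructed) {
                walk(d, pos, pos + len, out); // value is itself a list of TLVs
            } else {
                byte[] value = new byte[len];
                System.arraycopy(d, pos, value, 0, len);
                out.put(tag, value);
            }
            pos += len;
        }
    }

    private static String hex(byte[] d, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) sb.append(String.format("%02x", d[i] & 0xFF));
        return sb.toString();
    }

    /** Converts a dump like "<00000000 6f2d840e ...>" into bytes (non-hex chars ignored). */
    public static byte[] fromHex(String s) {
        String h = s.replaceAll("[^0-9a-fA-F]", "");
        byte[] b = new byte[h.length() / 2];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) Integer.parseInt(h.substring(2 * i, 2 * i + 2), 16);
        }
        return b;
    }
}
```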
Is there any faster way to parse text than walking through it byte by byte?
I wonder whether there is a special CPU (x86/x64) instruction for string operations, used by string libraries, that could speed up the parsing routine.
For example, an instruction that finds a token in a string in hardware, instead of looping over each byte until the token is found.
I have a big XML file which contains around 300 elements. I need to modify 2 or 3 of these elements using Java. I don't want to go with conventional marshalling and unmarshalling, as it involves parsing the whole XML. How about XPath/XSLT manipulation? I know I can easily read the data, but I need to modify it and write it back to the same XML file. The primary concern here is performance. Kindly advise.
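For a document of about 300 elements, loading it into a DOM, locating the targets with XPath, and writing it back with an identity `Transformer` is usually cheap. Note that this still parses the whole file; it only avoids binding it to Java classes. If profiling shows it's too slow, a streaming StAX copy that rewrites only the matching elements avoids building the tree at all. In the sketch, the XPath expressions and the helper name are hypothetical, and it works on a String for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlPatcher {
    /** Sets the text of every node matched by each XPath key; returns the new XML. */
    public static String patch(String xml, Map<String, String> updates) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> e : updates.entrySet()) {
            NodeList nodes = (NodeList) xpath.evaluate(e.getKey(), doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // Replaces the node's children with a single text node.
                nodes.item(i).setTextContent(e.getValue());
            }
        }

        // Identity transform: serialize the modified DOM back to a string.
        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

To update the original file in place, parse from `new File(path)` instead of a String and pass `new StreamResult(new File(path))` to the transformer.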