How to parse HTML with TouchXML or some other alternative.
Posted by 0SX on Stack Overflow
Published on 2010-12-19T17:22:53Z
Hi, I'm trying to parse the HTML below with TouchXML, but it keeps crashing when I try to extract certain attributes. I'm new to parsers, so apologies if this is a basic question. What I want to do is read each element's attributes and values and copy them into strings. I've been looking for a good HTML parser, and TouchXML seems to be the best I've found because of its Tidy support. Speaking of Tidy, how can I run this HTML through Tidy first and then parse it? I'm not sure how to do that. Below is the code I have so far; it doesn't work because it isn't pulling everything I need from the HTML. Any help or advice would be much appreciated. Thanks.
My current code:
NSMutableArray *res = [[NSMutableArray alloc] init];
// using local resource file
NSString *XMLPath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"example.html"];
NSData *XMLData = [NSData dataWithContentsOfFile:XMLPath];
CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:XMLData options:0 error:nil] autorelease];
NSArray *nodes = NULL;
nodes = [doc nodesForXPath:@"//div" error:nil];
for (CXMLElement *node in nodes) {
    NSMutableDictionary *item = [[NSMutableDictionary alloc] init];
    [item setObject:[[node attributeForName:@"id"] stringValue] forKey:@"id"];
    [res addObject:item];
    [item release];
}
NSLog(@"%@", res);
[res release];
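A possible cause of the crash, offered as an assumption since no crash log is shown: `attributeForName:` returns nil when an element lacks the requested attribute, so `stringValue` also evaluates to nil, and `setObject:forKey:` raises an exception when handed a nil object. A minimal sketch of a guard follows; the helper name `CopyAttribute` is hypothetical and not part of TouchXML.

```objective-c
#import <Foundation/Foundation.h>
#import "TouchXML.h"

// Hypothetical helper: copies the named attribute into `item` only when the
// element actually has it, so a nil value is never passed to the dictionary.
static void CopyAttribute(CXMLElement *node, NSString *name, NSMutableDictionary *item) {
    // Messaging nil returns nil, so this is safe when the attribute is absent.
    NSString *value = [[node attributeForName:name] stringValue];
    if (value != nil) {
        [item setObject:value forKey:name];
    }
}
```

Inside the loop, a call such as `CopyAttribute(node, @"title", item);` would then skip elements that have no `title` attribute instead of raising.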
HTML file that needs to be parsed:
<html>
<head>
<base target="_blank" />
</head>
<body style="margin:2;">
<div id="group">
<div id="groupURL"><a href="http://www.example.com/groups">Group URL</a></div>
<img id="grouplogo" src="http://images.example.com/groups/image.png" />
<div id="groupcomputer"><a href="http://www.example.com/groups/page" title="Group Title">Group title this would be here</a></div>
<div id="groupinfos">
<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>
<div id="groupinfo-l">Years</div><div id="groupinfo-r">4 years</div>
<div id="groupinfo-l">Salary</div><div id="groupinfo-r">100K</div>
<div id="groupinfo-l">Other</div><div id="groupoth" style="width:15px">other info</div>
</body>
</html>
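On the Tidy question: the markup above never closes the `group` and `groupinfos` divs, so a strict XML parse may reject it. TouchXML declares a `CXMLDocumentTidyHTML` option that asks it to run the input through Tidy before parsing. The sketch below assumes TouchXML was built with its libtidy support enabled, and the helper name `TidiedDocumentFromData` is hypothetical.

```objective-c
#import <Foundation/Foundation.h>
#import "TouchXML.h"

// Hypothetical helper: parses HTML data, asking TouchXML to tidy it first.
// Assumes the TouchXML build in use has Tidy support compiled in.
static CXMLDocument *TidiedDocumentFromData(NSData *htmlData) {
    NSError *error = nil;
    CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:htmlData
                                                    options:CXMLDocumentTidyHTML
                                                      error:&error] autorelease];
    if (doc == nil) {
        // Log the error instead of passing nil for it, so failures are visible.
        NSLog(@"Parse failed: %@", error);
    }
    return doc;
}
```

Tidy may emit XHTML, which can place elements in the XHTML namespace; if `//div` then returns nothing, the XPath query probably needs a namespace mapping.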
EDIT: I could use Element Parser instead, but I need to know how to extract the person's name from the following example, which would be Ralph in this case.
<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>
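With TouchXML, one way to approach this is an XPath that finds the label div whose text is "Person" and then takes the div immediately after it. This is a minimal sketch, not a tested solution: the helper name `ValueForLabel` is hypothetical, it assumes the document parsed successfully, and it assumes the elements are not in a namespace (tidied XHTML may need a namespace mapping instead).

```objective-c
#import <Foundation/Foundation.h>
#import "TouchXML.h"

// Hypothetical helper: returns the text of the div that immediately follows
// the "groupinfo-l" div whose text equals `label`, or nil if nothing matches.
static NSString *ValueForLabel(CXMLDocument *doc, NSString *label) {
    // Assumes `label` contains no single quotes, since it is spliced into the XPath.
    NSString *xpath = [NSString stringWithFormat:
        @"//div[@id='groupinfo-l'][normalize-space(.)='%@']/following-sibling::div[1]",
        label];
    NSError *error = nil;
    NSArray *matches = [doc nodesForXPath:xpath error:&error];
    if ([matches count] == 0) {
        NSLog(@"No match for %@ (error: %@)", label, error);
        return nil;
    }
    // The first match is the value div paired with the label.
    return [[matches objectAtIndex:0] stringValue];
}
```

A call such as `ValueForLabel(doc, @"Person")` targets the `groupinfo-r` div next to the "Person" label; passing `@"Years"` or `@"Salary"` would target the other rows the same way.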
© Stack Overflow or respective owner