Improve a regex statement in order to be as efficient as it can be
Posted by user551625 on Stack Overflow
Published on 2010-12-22T19:41:33Z
Indexed on 2010/12/22 21:54 UTC
I have a PHP program that, at some point, needs to analyze a large amount of HTML+JavaScript text and extract information from it. The parsing has to be done in two passes:
- Separate all the "HTML groups" to parse.
- Parse each HTML group to get the needed information.
In the 1st parse it needs to find:
<div id="myHome"
And start capturing after that tag. Then stop capturing before
<span id="nReaders"
And capture the number that comes after this tag and stop.
In the 2nd parse, take capture nº 1 from the previous parse (capture 0 holds the whole match and capture 2 holds the number) and then find .
I already have code to do that and it works. Is there a way to improve this, make it easier for the machine to parse?
preg_match_all('%<div id="myHome"[^>]>(.*?)<span id="nReaders[^>]>([0-9]+)<"%msi', $data, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    preg_match_all('%<div class="myplacement".*?[.]php[?]((?:next|before))=([0-9]+).*?<tbody.*?<td[^>]>.*?[0-9]+"%msi', $result[1], $mydata, PREG_SET_ORDER);
    // takes care of the data and finishes the program
}
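To make the capture numbering concrete, here is a minimal sketch of the first pass run against a made-up input. The sample HTML is hypothetical, and the pattern is an assumption rather than the exact one above: it uses `[^>]*` (any number of attribute characters) where the question shows `[^>]`, closes the quote after `nReaders`, and drops the `m` modifier, which only affects `^` and `$` and so has no effect on a pattern without anchors.

```php
<?php
// Hypothetical sample input; the real $data would be the fetched page.
$data = '<div id="myHome" class="x"><p>first group</p><span id="nReaders">12</span></div>'
      . '<div id="myHome"><p>second group</p><span id="nReaders">7</span></div>';

// Assumed variant of the first-pass pattern (see the note above):
// "s" lets . match newlines, "i" makes the match case-insensitive.
$pattern = '%<div id="myHome"[^>]*>(.*?)<span id="nReaders"[^>]*>([0-9]+)<%si';

// PREG_SET_ORDER yields one array per match, indexed by capture number.
$count = preg_match_all($pattern, $data, $results, PREG_SET_ORDER);

foreach ($results as $result) {
    // $result[0] is the whole match, $result[1] the HTML group,
    // $result[2] the number that follows the nReaders tag.
    printf("readers=%d group=%s\n", $result[2], $result[1]);
}
```

With this input the loop runs twice, once per `myHome` block, and `$result[1]` is what would be handed to the second `preg_match_all`.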
Note: I need this for a freeware program, so it must be as general as possible and, if possible, not use PHP extensions.
ADD: I omitted some parts here because I didn't expect answers like those. I also need to parse text inside one of the tags in the document. It may be the 6th, 7th or 8th tag, but I know it comes after a certain tag. The parser I've checked (thx profitphp) does work for finding the script tag. What now? There is more than one tag with the same class, and I want all of them, but only those that also have one of a list of classes. Where can I find instructions, demos and limitations of DOM parsers (like the one at http://simplehtmldom.sourceforge.net/)? I need something that will work on at least a large number of free servers.
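For the "same class, plus one of a list of classes" requirement, one possible shape is sketched below with PHP's bundled `DOMDocument` and `DOMXPath` classes rather than the simplehtmldom library. This is only an illustration under assumptions: the markup, the class names (`item`, `red`, `blue`) and the helper `hasClass()` are made up, and the DOM classes, although enabled by default in PHP 5, are still technically an extension that a given free host could have disabled.

```php
<?php
// Hypothetical markup and class names, for illustration only.
$html = '<div class="item red">A</div>'
      . '<div class="item green">B</div>'
      . '<div class="item blue">C</div>'
      . '<div class="other red">D</div>';

$required = 'item';               // the class every wanted element shares
$anyOf    = array('red', 'blue'); // it must also carry one of these

// Collect libxml warnings quietly; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Hypothetical helper: the usual XPath 1.0 test for
// "the class attribute contains this whole word".
function hasClass($name)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Build: //*[ has(item) and ( has(red) or has(blue) ) ]
$query = '//*[' . hasClass($required) . ' and ('
       . implode(' or ', array_map('hasClass', $anyOf)) . ')]';

$xpath = new DOMXPath($doc);
$found = array();
foreach ($xpath->query($query) as $node) {
    $found[] = $node->textContent;
}

print_r($found);
```

Here only the elements containing A and C are collected: B has the shared class but none of the listed ones, and D has a listed class but not the shared one.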
© Stack Overflow or respective owner