Improve a regex statement in order to be as efficient as it can be
- by user551625
I have a PHP program that, at some point, needs to analyze a large amount of HTML+JavaScript text to extract information.
Everything I want to parse has to be handled in two steps:
Separate all the "HTML groups" to parse.
Parse each HTML group to get the needed information.
In the 1st parse it needs to find:
<div id="myHome"
And start capturing after that tag. Then stop capturing before
<span id="nReaders"
And capture the number that comes after this tag and stop.
In the 2nd parse, use capture group 1 (group 0 holds the whole match and group 2 holds the number) from the previous parse, and then find
.
I already have code to do that and it works. Is there a way to improve this, make it easier for the machine to parse?
preg_match_all('%<div id="myHome"[^>]*>(.*?)<span id="nReaders[^>]*>([0-9]+)<"%msi', $data, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    preg_match_all('%<div class="myplacement".*?[.]php[?]((?:next|before))=([0-9]+).*?<tbody.*?<td[^>]*>.*?[0-9]+"%msi', $result[1], $mydata, PREG_SET_ORDER);
    // takes care of the data and finishes the program
}
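To make the two passes easier to reason about, here is a minimal, self-contained sketch of the same idea. The sample markup, the `list.php` URLs and the `$rows` layout are invented for illustration, and the patterns are a cleaned-up variant rather than the exact ones above. It drops the `m` flag, which only changes how `^` and `$` behave and so has no effect on patterns without anchors.

```php
<?php
// Invented sample input: two "HTML groups", each ending with a reader count.
$data = '<div id="myHome" class="a"><div class="myplacement"><a href="list.php?next=12">n</a></div>'
      . '<span id="nReaders" title="r">345</span></div>'
      . '<div id="myHome"><div class="myplacement"><a href="list.php?before=7">b</a></div>'
      . '<span id="nReaders">9</span></div>';

// Pass 1: capture each group body (group 1) and its reader count (group 2).
// [^>]* skips any remaining attributes of the opening tag.
preg_match_all(
    '%<div id="myHome"[^>]*>(.*?)<span id="nReaders"[^>]*>([0-9]+)<%si',
    $data,
    $results,
    PREG_SET_ORDER
);

$rows = [];
foreach ($results as $result) {
    // Pass 2: search only inside the text captured by pass 1.
    if (preg_match('%<div class="myplacement".*?[.]php[?](next|before)=([0-9]+)%si', $result[1], $m)) {
        $rows[] = [
            'direction' => $m[1],
            'id'        => (int) $m[2],
            'readers'   => (int) $result[2],
        ];
    }
}

print_r($rows);
```

Whether this is measurably faster than the original on real pages is not something the sketch shows; it only isolates the two passes so they can be timed or adjusted separately.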
Note: I need this for a freeware program, so it must be as general as possible and, if possible, not use PHP extensions.
ADD:
I omitted some parts here because I didn't expect answers like those.
There is also a need to parse text inside one of the tags in the document. It may be the 6th, 7th or 8th tag, but I know it comes after a certain tag. The parser I've checked (thanks, profitphp) does work to find the script tag. What now?
There is more than one tag with the same class, and I want them all, but only those that also have one of a list of classes.
Where can I find instructions, demos and limitations of DOM parsers (like the one at http://simplehtmldom.sourceforge.net/)? I need something that will work on, at least, a large number of free servers.
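To make these two requirements concrete, here is a rough sketch using PHP's `DOMDocument` and `DOMXPath` classes. These come from the DOM extension, which is normally enabled by default but can be disabled by a host, so this is an assumption about the server. The sample markup, the `marker` id, the class names `entry`, `red` and `blue`, and the `hasClass()` helper are all hypothetical.

```php
<?php
// Invented sample markup; the id and class names are placeholders.
$html = '<div id="marker"></div>'
      . '<p class="entry red">one</p>'
      . '<p class="entry green">two</p>'
      . '<p class="entry blue big">three</p>'
      . '<p class="other red">four</p>'
      . '<script>var nReaders = 42;</script>';

libxml_use_internal_errors(true);   // real-world HTML is rarely valid; collect warnings quietly
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath 1.0 test for "the class attribute contains this whole token".
function hasClass($name) {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Every element with class "entry" that also carries at least one of the wanted classes.
$wanted = ['red', 'blue'];
$any = implode(' or ', array_map('hasClass', $wanted));
$texts = [];
foreach ($xpath->query('//*[' . hasClass('entry') . " and ($any)]") as $node) {
    $texts[] = $node->textContent;
}

// The first <script> that appears after the marker element, however many tags lie between.
$script = $xpath->query('//*[@id="marker"]/following::script[1]')->item(0);
$scriptText = $script ? $script->textContent : null;

print_r($texts);
echo $scriptText, "\n";
```

The `following::` axis is what handles the "6th, 7th or 8th tag" case: it does not care how many elements sit between the known tag and the script. The text inside the script would still need a regex or string functions afterwards.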