Why does my Perl regular expression only find the last occurrence?
Posted by scharan on Stack Overflow
Published on 2010-03-18T07:19:06Z
Indexed on 2010/03/20 4:51 UTC
I have the following input to a Perl script, and I want to extract the first NAME="..." string in each of the <TABLE>...</TABLE> blocks.
The file is read into a single string (the code below reads only the first 1024 bytes) and the regex is applied to that string.
However, the regex always returns the last occurrence of NAME="...". Can anyone explain what is going on and how this can be fixed?
Input file:
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
line2
NAME="ORDERSA"
line3
NAME="ORDERSAB"
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSB"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSC"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSD"
line3
line3
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES2"
line3
NAME="QUOTES3"
NAME="QUOTES4"
line3
NAME="QUOTES5"
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES6"
NAME="QUOTES7"
NAME="QUOTES8"
NAME="QUOTES9"
line3
line3
</TABLE>
<TABLE>
NAME="MyName IsKhan"
</TABLE>
Perl Code starts here:
use warnings;
use strict;

# Pattern under test: meant to capture NAME="..." inside each <table> block.
my $nameRegExp = '(<table>((NAME="(.+)")|(.*|\n))*</table>)';

sub extractNames {
    my ($ifh, $ofh) = @_;
    my $fullFile;
    read($ifh, $fullFile, 1024);    # Hardcoded to read just 1024 bytes.
    while ($fullFile =~ m#$nameRegExp#gi) {
        print "found: " . $4 . "\n";
    }
}

sub main {
    if (@ARGV != 1) {
        die("Usage: extractNames infile\n");
    }
    my $infileName  = $ARGV[0];
    my $outfileName = $ARGV[1];    # undef for now: only one argument is accepted
    open my $inFile, '<', $infileName
        or die("Could not open log file $infileName: $!");
    my $outFile;
    #open my $outFile, ">$outfileName" or die("Could not open log file $outfileName");
    extractNames($inFile, $outFile);
    close($inFile);
    #close( $outFile );
}

# call
main();
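
One likely explanation, offered as a sketch rather than a definitive answer: when a capture group sits inside a repeated group, Perl keeps only the value from the final iteration, and the greedy outer * can also run past the first </TABLE>. The example below first shows that behaviour on a toy string, then matches each table lazily and takes the first NAME="..." inside it. The sample text is a shortened, hypothetical version of the input above, and the variable names ($text, @last, @names) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample, shortened from the input in the question.
my $text = <<'END';
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
NAME="ORDERSA"
</TABLE>
<TABLE>
line1
NAME="QUOTES2"
NAME="QUOTES3"
</TABLE>
END

# 1. A capture group inside a repeated group keeps only its last iteration:
#    the (\d) group matches 1, then 2, then 3, and only "3" survives.
my @last = "a1b2c3" =~ /(?:[a-z](\d))+/;

# 2. Match one table at a time with a lazy .*? (the /s flag lets "." cross
#    newlines, /i ignores case), then take the first NAME inside that block.
my @names;
while ($text =~ m{<table>(.*?)</table>}gis) {
    my $block = $1;
    push @names, $1 if $block =~ /NAME="([^"]*)"/;
}

print "found: $_\n" for @names;
```

Splitting the work into two matches keeps each pattern simple: the outer one only finds table boundaries, and the inner one stops at the first NAME because a non-global match returns the leftmost hit.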
© Stack Overflow or respective owner