Why does my Perl regular expression only find the last occurrence?

Posted by scharan on Stack Overflow
Published on 2010-03-18T07:19:06Z Indexed on 2010/03/20 4:51 UTC

I have the following input to a Perl script, and I want to extract the first NAME="..." string in each of the <TABLE>...</TABLE> blocks.

The file is read into a single string (the code below reads only the first 1024 bytes) and the regex is applied to that string.

However, the regex always returns the last occurrence of a NAME="..." string instead of the first. Can anyone explain what is going on and how this can be fixed?

Input file: 
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
line2
NAME="ORDERSA"
line3
NAME="ORDERSAB"
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSB"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSC"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSD"
line3
line3
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES2"
line3
NAME="QUOTES3"
NAME="QUOTES4"
line3
NAME="QUOTES5"
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES6"
NAME="QUOTES7"
NAME="QUOTES8"
NAME="QUOTES9"
line3
line3
</TABLE>
<TABLE>
NAME="MyName IsKhan"
</TABLE>

Perl Code starts here:

use warnings;
use strict;

my $nameRegExp = '(<table>((NAME="(.+)")|(.*|\n))*</table>)';

sub extractNames($$){
 my ($ifh, $ofh) = @_;
 my $fullFile;
 read ($ifh, $fullFile, 1024);#Hardcoded to read just 1024 bytes.
 while( $fullFile =~ m#$nameRegExp#gi){
  print "found: ".$4."\n";
 }
}

sub main(){
 if( ($#ARGV + 1 )!= 1){
  die("Usage: extractNames infile\n");
 }
 my $infileName = $ARGV[0];
 my $outfileName = $ARGV[1];
 open my $inFile, '<', $infileName or die("Could not open input file $infileName: $!");
 my $outFile;
 #open my $outFile, ">$outfileName" or die("Could not open log file $outfileName");
 extractNames( $inFile, $outFile );
 close( $inFile );
 #close( $outFile );
}

#call 
main();
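
A likely explanation, offered as a sketch rather than a definitive answer: a capture group inside a repeated group keeps only the text from its last iteration, so $4 ends up holding the last NAME="..." that the loop passed over. In addition, the .* alternative can consume a </TABLE> line, and because * is greedy, a single match can run from the first <TABLE> to the last </TABLE> in the string. One way around both, assuming tables are never nested, is to match each table body non-greedily and then search that body once for NAME. The helper first_names and the shortened sample below are illustrative and are not part of the original script.

```perl
use strict;
use warnings;

# Demonstration: a capture group inside a quantified group keeps only
# the value from its last iteration.
if ( "a1a2a3" =~ /(?:a(\d))+/ ) {
    print "last capture: $1\n";    # prints "last capture: 3"
}

# Hypothetical helper: return the first NAME="..." value found in each
# <TABLE>...</TABLE> block of $text (assumes tables are not nested).
sub first_names {
    my ($text) = @_;
    my @names;

    # /s lets . match newlines; the non-greedy .*? stops at the nearest
    # </table>, so each iteration sees exactly one table body.
    while ( $text =~ m{<table>(.*?)</table>}gis ) {
        my $body = $1;

        # Without /g, this match stops at the first NAME="..." in the body.
        if ( $body =~ /NAME="([^"]+)"/i ) {
            push @names, $1;
        }
    }
    return @names;
}

# Shortened sample in the same shape as the input file above.
my $sample = <<'END';
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
NAME="ORDERSA"
</TABLE>
<TABLE>
line1
NAME="QUOTES2"
NAME="QUOTES3"
</TABLE>
<TABLE>
NAME="MyName IsKhan"
</TABLE>
END

print "found: $_\n" for first_names($sample);
# found: ORDERSAA
# found: QUOTES2
# found: MyName IsKhan
```

Splitting the work into two matches keeps each pattern simple: the outer one only decides where a table starts and ends, and the inner one only has to find the first NAME. A table with no NAME line is skipped.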

© Stack Overflow or respective owner
