Regular expression does not find the first occurance

Posted by scharan on Stack Overflow See other posts from Stack Overflow or by scharan
Published on 2010-03-18T07:19:06Z Indexed on 2010/03/18 7:21 UTC
Read the original article Hit count: 298

Filed under:
|

I have the following input to a perl script and I wish to get the first occurrence of NAME="..." strings in each of the ... structures.

The entire file is read into a single string and the reg exp acts on that input.

However, the regex always returns the LAST occurrence of NAME="..." strings. Can anyone explain what is going on and how this can be fixed?

Input file: 
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
line2
NAME="ORDERSA"
line3
NAME="ORDERSAB"
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSB"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSC"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSD"
line3
line3
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES2"
line3
NAME="QUOTES3"
NAME="QUOTES4"
line3
NAME="QUOTES5"
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES6"
NAME="QUOTES7"
NAME="QUOTES8"
NAME="QUOTES9"
line3
line3
</TABLE>
<TABLE>
NAME="MyName IsKhan"
</TABLE>

Perl Code starts here: use warnings; use strict;

my $nameRegExp = '(<table>((NAME="(.+)")|(.*|\n))*</table>)';

sub extractNames($$){
    my ($ifh, $ofh) = @_;
    my $fullFile;
    read ($ifh, $fullFile, 1024);#Hardcoded to read just 1024 bytes.
    while( $fullFile =~ m#$nameRegExp#gi){
        print "found: ".$4."\n";
    }
}

sub main(){
    if( ($#ARGV + 1 )!= 1){
        die("Usage: extractNames infile\n");
    }
    my $infileName = $ARGV[0];
    my $outfileName = $ARGV[1];
    open my $inFile, "<$infileName" or die("Could not open log file $infileName");
    my $outFile;
    #open my $outFile, ">$outfileName" or die("Could not open log file $outfileName");
    extractNames( $inFile, $outFile );
    close( $inFile );
    #close( $outFile );
}

#call 
main();

© Stack Overflow or respective owner

Related posts about regex

Related posts about perl