Need help with regex parsing (in perl)
Posted
by Charlie
on Stack Overflow
See other posts from Stack Overflow
or by Charlie
Published on 2010-05-02T03:14:01Z
Indexed on
2010/05/02
3:17 UTC
Read the original article
Hit count: 248
Hi all, need some help parsing an html file in perl.
I used the LWP module to retrieve a webpage into $_ with $/ undefined so there are no newline issues. Then I'm trying to find all strings matching a pattern. How do I do that? I know how to find 1 instance of it, but how do I match all instances? and what data structure would the results go to? a multi dimensional array?
my text (excerpt) looks like the following:
<TR>
<TD BGCOLOR=EEEEEE><A HREF="/program.cgi?pid=1233"><FONT FACE="ARIAL,HELVETICA,SANS-SERIF" SIZE=2>Title 1</A></FONT></TD>
<TD BGCOLOR=EEEEEE nowrap><FONT FACE="ARIAL,HELVETICA" SIZE=2>Jun 27 2010 3:00PM</FONT></TD>
<TD BGCOLOR=EEEEEE> </TD>
</TR>
<TR><TD BGCOLOR=EEEEEE COLSPAN=3><IMG SRC="http://images.domain.com/images/spacer.gif" WIDTH=1 HEIGHT=2><BR></TD></TR>
<TR><TD COLSPAN=3 BGCOLOR=999999><IMG SRC="http://images.domain.com/images/spacer.gif" HEIGHT=1 WIDTH=1></TD></TR>
<TR><TD COLSPAN=3 ><IMG SRC="http://images.domain.com/images/spacer.gif" WIDTH=1 HEIGHT=2><BR></TD></TR>
<TR>
<TD><A HREF="/program.cgi?pid=1234"><FONT FACE="ARIAL,HELVETICA,SANS-SERIF" SIZE=2>Title 2</A></FONT></TD>
<TD nowrap><FONT FACE="ARIAL,HELVETICA" SIZE=2>Jun 29 2010 7:00PM</FONT></TD>
<TD> </TD>
</TR>
<TR><TD COLSPAN=3><IMG SRC="http://images.domain.com/images/spacer.gif" WIDTH=1 HEIGHT=2><BR></TD></TR>
<TR><TD COLSPAN=3 BGCOLOR=999999><IMG SRC="http://images.domain.com/images/spacer.gif" HEIGHT=1 WIDTH=1></TD></TR>
<TR><TD COLSPAN=3 BGCOLOR=EEEEEE><IMG SRC="http://images.domain.com/images/spacer.gif" WIDTH=1 HEIGHT=2><BR></TD></TR>
<TR>
<TD BGCOLOR=EEEEEE><A HREF="/program.cgi?pid=1235"><FONT FACE="ARIAL,HELVETICA,SANS-SERIF" SIZE=2>Title 3</A></FONT></TD>
<TD BGCOLOR=EEEEEE nowrap><FONT FACE="ARIAL,HELVETICA" SIZE=2>Jul 3 2010 7:00PM</FONT></TD>
<TD BGCOLOR=EEEEEE> </TD>
</TR>
I want to get the following into an array (or any structure):
{ ["/program.cgi?pdi=1233", "Title 1"], ["/program.cgi?pdi=1234", "Title 2"], ["/program.cgi?pdi=1235", "Title 3"] }
Thanks
© Stack Overflow or respective owner