I have been given
the task at work of
screen scraping one of our legacy web apps
to extract certain data from
the code.
The data is formatted and "should" be displayed exactly
the same every time. I am just not sure
how to go about doing this. It's a full html file with header and footer navigations but in
the middle of all this is
the data I need.
I need
to extract
the Company Name value, Contact Name, Telephone, email address, etc.
Here is an example of what
the code looks like:
...html above here
<br /><br />
<table cellpadding="0" cellspacing="12" border="0">
<tr>
<td valign="top" align="center">
<!-- Company Info -->
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td class="black">
<table cellspacing="1" cellpadding="0" border="0" width="370">
<tr>
<th>ABC INDUSTRIES</th>
</tr>
<tr>
<td class="search">
<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td>
<table cellpadding="1" cellspacing="0" border="0" width="100%">
<tr>
<td align="center" colspan="2"><hr></td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">Contact Person <img src="/images/icon_contact.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> Joe Smith</td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">Phone Number <img src="/images/icon_phone.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> 555-555-5555</td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">E-mail Address <img src="/images/icon_email.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> <a HREF="mailto:
[email protected]">
[email protected]</a></td>
</tr>
more...
There is more code on
the screen in a different table structure that I also need
to pull.