Screen Scraping HTML with C#
Posted
by
WildBill
on Stack Overflow
See other posts from Stack Overflow
or by WildBill
Published on 2011-01-03T19:46:04Z
Indexed on
2011/01/03
19:54 UTC
Read the original article
Hit count: 346
c#
|screen-scraping
I have been given the task at work of screen scraping one of our legacy web apps to extract certain data from the code. The data is formatted and "should" be displayed exactly the same every time. I am just not sure how to go about doing this. It's a full html file with header and footer navigations but in the middle of all this is the data I need.
I need to extract the Company Name value, Contact Name, Telephone, email address, etc.
Here is an example of what the code looks like:
...html above here
<br /><br />
<table cellpadding="0" cellspacing="12" border="0">
<tr>
<td valign="top" align="center">
<!-- Company Info -->
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td class="black">
<table cellspacing="1" cellpadding="0" border="0" width="370">
<tr>
<th>ABC INDUSTRIES</th>
</tr>
<tr>
<td class="search">
<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td>
<table cellpadding="1" cellspacing="0" border="0" width="100%">
<tr>
<td align="center" colspan="2"><hr></td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">Contact Person <img src="/images/icon_contact.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> Joe Smith</td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">Phone Number <img src="/images/icon_phone.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> 555-555-5555</td>
</tr>
<tr>
<td align="right" nowrap><b><font color="FF0000">E-mail Address <img src="/images/icon_email.gif" align="absmiddle"> :</font></b></td>
<td align="left" width="100%"> <a HREF="mailto:[email protected]">[email protected]</a></td>
</tr>
more...
There is more code on the screen in a different table structure that I also need to pull.
© Stack Overflow or respective owner