How to build a web crawler to find a specific advert, which is in an iframe loaded by Javascript
Posted by ZoFreX on Stack Overflow
Published on 2010-02-26T20:37:10Z
Indexed on 2010/03/14 2:05 UTC
I'm trying to find all instances of an advert on a website. The advert is in an iframe that is loaded by JavaScript (it doesn't appear at all if JavaScript is turned off). Detecting the advert itself is extremely simple: both the name of the Flash file and the target of the href always contain a certain string.
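Because detection only needs a substring match on the Flash file name and the link target, it can be a small check run over the iframe's HTML once a browser has rendered it (a plain HTTP fetch of the parent page won't contain the iframe, since JavaScript builds it). The sketch below uses Python's standard `html.parser`; the marker string `example-ad-campaign` is a hypothetical stand-in for "a certain string", and it assumes the Flash file is referenced through `<embed src>`, `<object data>`, or `<param name="movie" value>`.

```python
from html.parser import HTMLParser

# Hypothetical marker; replace with the string the ad's URLs actually contain.
AD_MARKER = "example-ad-campaign"


class AdFinder(HTMLParser):
    """Collects link targets and Flash file URLs that contain the marker."""

    def __init__(self, marker: str) -> None:
        super().__init__()
        self.marker = marker
        self.hits: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        # The advert's click-through link.
        href = a.get("href") or ""
        if tag == "a" and self.marker in href:
            self.hits.append(("href", href))
        # Common ways a Flash movie is referenced in markup.
        flash = None
        if tag == "embed":
            flash = a.get("src")
        elif tag == "object":
            flash = a.get("data")
        elif tag == "param" and (a.get("name") or "").lower() == "movie":
            flash = a.get("value")
        if flash and self.marker in flash:
            self.hits.append(("flash", flash))


def find_ad(html: str, marker: str = AD_MARKER) -> list[tuple[str, str]]:
    """Return (kind, url) pairs for every ad reference found in the HTML."""
    finder = AdFinder(marker)
    finder.feed(html)
    finder.close()
    return finder.hits
```

Self-closing tags such as `<embed .../>` are still seen here, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.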
What would be the best "starting point" for this? At the moment I'm considering an Adobe AIR app, which could crawl the site, run the JavaScript, load the content of the iframe, and examine the DOM to find the ad. The other option I can think of is using Firefox as the platform (perhaps with Greasemonkey or Selenium? I don't really know how to leverage Firefox like this).
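Whatever platform does the rendering, the crawl itself is a same-site breadth-first walk. The sketch below keeps that logic separate from the browser by taking a hypothetical `render(url)` callable that returns the rendered HTML (including iframe contents) and the page's links. In practice `render` would be backed by something that executes JavaScript. For example, Selenium WebDriver driving Firefox can load a page with `driver.get(url)`, enter each iframe with `driver.switch_to.frame(...)`, read `driver.page_source`, and return with `driver.switch_to.default_content()`. The `max_pages` cap is an assumption added to keep a runaway crawl bounded.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical interface: url -> (rendered HTML incl. iframe contents, links on the page).
Renderer = Callable[[str], tuple[str, Iterable[str]]]


def crawl(start_url: str, render: Renderer, is_ad: Callable[[str], bool],
          max_pages: int = 500) -> list[str]:
    """Breadth-first crawl of one site; returns the URLs whose rendered HTML shows the ad."""
    start, _ = urldefrag(start_url)
    site = urlparse(start).netloc
    queue = deque([start])
    seen = {start}
    pages_with_ad: list[str] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html, links = render(url)
        if is_ad(html):
            pages_with_ad.append(url)
        for link in links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, link))
            # Stay on the same host and skip anything already queued.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages_with_ad
```

Keeping the browser behind a single callable means the same crawl loop works whether the rendering is done by Selenium, an AIR app, or a test stub.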
Does anyone know of anything suitable to build this on, or have any suggestions for using Firefox to do it?
Extra details:
Being CPU-intensive isn't really an issue, and neither is depending on a browser being open. This doesn't need to run on a headless server; it will run on a powerful desktop machine. The OS is also not an issue. Because the advert is in rotation, it would be an advantage if the crawler loaded each page multiple times. While the crawler does need to execute the JavaScript and load the content of the iframe, it does not need to be able to display Flash files.
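How many reloads are "multiple times" depends on how often the advert comes up in the rotation. Assuming each load independently shows the ad with probability `p_shown` (an assumption, since real ad servers may not rotate uniformly), the chance of missing it in `n` loads is `(1 - p_shown) ** n`, so the smallest safe `n` follows from a logarithm. The sketch below computes that and counts hits over repeated loads; `render` and `is_ad` are hypothetical callables like the ones a browser-driven crawler would supply.

```python
import math
from typing import Callable


def loads_needed(p_shown: float, max_miss: float) -> int:
    """Smallest n with (1 - p_shown) ** n <= max_miss, assuming independent loads."""
    if not 0 < p_shown <= 1:
        raise ValueError("p_shown must be in (0, 1]")
    if not 0 < max_miss < 1:
        raise ValueError("max_miss must be in (0, 1)")
    if p_shown == 1:
        return 1
    return math.ceil(math.log(max_miss) / math.log(1 - p_shown))


def sample_page(url: str, render: Callable[[str], str],
                is_ad: Callable[[str], bool], loads: int) -> int:
    """Load the same page `loads` times and count how many loads showed the ad."""
    return sum(1 for _ in range(loads) if is_ad(render(url)))
```

For instance, if the ad appears on roughly one load in four and a 1% chance of missing it is acceptable, `loads_needed(0.25, 0.01)` gives 17, since 0.75^16 is still just above 0.01.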
© Stack Overflow or respective owner