Web scraping: how to get scraper implementation from text link?

Posted by isme on Stack Overflow
Published on 2010-03-25T15:35:33Z Indexed on 2010/03/25 16:23 UTC

I'm building a Java web media-scraping application for extracting content from a variety of popular websites: YouTube, Facebook, RapidShare, and so on.

The application will include a search capability to find content URLs, but should also allow the user to paste a URL into the application if they already know where the media is. YouTube Downloader already does this for a variety of video sites.

When the program is supplied with a URL, it decides which kind of scraper to use to get the content. For example, a YouTube watch link yields a YoutubeScraper, a Facebook fan page link yields a FacebookScraper, and so on.

Should I use the factory pattern to do this?

My idea is that the factory has one public method. It takes a String argument representing a link and returns a suitable implementation of the Scraper interface. I guess the factory would hold a list of Scraper implementations and would match the link against each one until it finds a suitable one. If none is suitable, it throws an exception instead.
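
To make the idea concrete, here is a minimal sketch of what I have in mind. Only the Scraper interface and the YoutubeScraper and FacebookScraper names come from the description above. The canHandle and forLink methods, the hostMatches helper, the host and path checks, and the choice of IllegalArgumentException are all assumptions for illustration, and the actual scraping logic is left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: each scraper reports whether it recognises a link.
interface Scraper {
    boolean canHandle(URI link);

    // Assumed helper: true if the link's host is the domain or a subdomain of it.
    static boolean hostMatches(URI link, String domain) {
        String host = link.getHost();
        if (host == null) {
            return false; // e.g. the link had no scheme, so no host was parsed
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.equals(domain) || host.endsWith("." + domain);
    }
}

class YoutubeScraper implements Scraper {
    @Override
    public boolean canHandle(URI link) {
        // Assumption: a "watch link" is a youtube.com URL whose path is /watch.
        return Scraper.hostMatches(link, "youtube.com") && "/watch".equals(link.getPath());
    }
}

class FacebookScraper implements Scraper {
    @Override
    public boolean canHandle(URI link) {
        // Assumption: any facebook.com URL counts; a real check would be stricter.
        return Scraper.hostMatches(link, "facebook.com");
    }
}

class ScraperFactory {
    private final List<Scraper> scrapers;

    ScraperFactory(List<Scraper> scrapers) {
        this.scrapers = new ArrayList<>(scrapers); // defensive copy
    }

    // The single public entry point: returns the first scraper that accepts the link.
    public Scraper forLink(String link) {
        URI uri = URI.create(link.trim()); // throws IllegalArgumentException if malformed
        for (Scraper scraper : scrapers) {
            if (scraper.canHandle(uri)) {
                return scraper;
            }
        }
        throw new IllegalArgumentException("No scraper registered for link: " + link);
    }
}
```

With this layout, supporting a new site would mean writing one more Scraper implementation and adding it to the list passed to the factory. The order of that list matters if two scrapers could accept the same link, since the first match wins.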

© Stack Overflow or respective owner
