Web spidering/crawling: can I do it myself, or is it just for search engines?
- by bboyreason
I already had a question answered about web scraping with wget, but after reading a little more I realize I may be looking for a web-crawling program, particularly since crawlers can pull out specific data such as links or, in my case, products.
All of the products on my site follow the same naming convention: website.com/uniqueAlphaNumericID.html
As far as I know, no dynamic content generation is used, and each item has exactly one page in the above format.
Should I just be thinking about something like:
wget -qO- website.com | grep -o '[A-Za-z0-9]*\.html'
Or should I be looking into spiders/crawlers?
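
To make the question concrete: a single fetch only sees the links on one page, while a crawler repeats the same fetch-and-extract step on every page it discovers. Below is a rough sketch of just the extract step, not a full crawler. The function name extract_product_pages is made up for illustration, and it assumes links are written as href="..." with double quotes and that product pages are the only ones named <alphanumeric>.html (a page like index.html would match too).

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the unique page names
# that look like uniqueAlphaNumericID.html, one per line.
# Assumption: links use double-quoted href attributes.
extract_product_pages() {
    # Pull out every href="..." attribute, one per line.
    grep -oE 'href="[^"]*"' |
        # Strip the href=" prefix and the closing quote.
        sed -e 's/^href="//' -e 's/"$//' |
        # Keep only targets whose last path component is <alphanumeric>.html.
        grep -E '(^|/)[A-Za-z0-9]+\.html$' |
        # Drop any leading path or domain, leaving just the file name.
        sed -e 's|.*/||' |
        # Remove duplicates.
        sort -u
}

# Example usage (website.com is the placeholder domain from above):
#   wget -qO- http://website.com/ | extract_product_pages
```

This only lists product pages linked from the one page that was fetched. If the products are spread across many category pages, something has to follow those links as well, which is the part a spider/crawler (or wget's own recursive mode, -r) would handle.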