Which web crawler to use to save news articles from a website into .txt files?
Posted by brokencoding on Stack Overflow
Published on 2010-02-19T15:46:09Z
Hi, I am currently in dire need of news articles to test an LSI implementation (it's in a foreign language, so the usual ready-to-use file packs don't exist).
So I need a crawler that, given a starting URL, say http://news.bbc.co.uk/, follows all the contained links and saves their content into .txt files; if I could specify the encoding to be UTF-8, I would be in heaven.
I have zero expertise in this area, so I'd appreciate suggestions on which crawler to use for this task.
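For illustration, here is a minimal, hypothetical sketch of the kind of crawler the question describes, using only the Python standard library (no third-party crawler). The names `PageParser` and `crawl` are made up for this example; it does a breadth-first crawl restricted to the starting host and writes each page's visible text to a UTF-8 `.txt` file. A real project would more likely use an established tool (e.g. wget's recursive mode, HTTrack, or a framework like Scrapy), which also handle robots.txt, rate limiting, and encoding detection properly.

```python
import os
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PageParser(HTMLParser):
    """Collects hyperlinks and visible text from one HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []        # absolute URLs found in <a href="...">
        self.text_parts = []   # visible text fragments
        self._skip_depth = 0   # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def crawl(start_url, out_dir, max_pages=50):
    """Breadth-first crawl staying on the start host; saves pages as UTF-8 .txt."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    while queue and count < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                # naive: assumes UTF-8; real pages need charset sniffing
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip unreachable or non-HTML pages
        parser = PageParser(url)
        parser.feed(html)
        path = os.path.join(out_dir, f"page{count:04d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(parser.text_parts))
        count += 1
        for link in parser.links:
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

The parsing part can be exercised offline: feed `PageParser` an HTML string and inspect the extracted links and text before letting `crawl` loose on a live site.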
© Stack Overflow or respective owner