Hello,
I was wondering if anyone ever tried to extract/follow RSS links using
SgmlLinkExtractor/CrawlSpider. I can't get it to work...
I am using the following rule:
rules = (
Rule(SgmlLinkExtractor(tags=('link',), attrs=False),
follow=True,
callback='parse_article'),
)
(having in mind that rss links are located in the link tag).
I am not sure how to tell SgmlLinkExtractor to extract the text() of
the link and not to search the attributes ...
Any help is welcome,
Thanks in advance