Wget site mirror, links with rel="<content>" not followed
- by Pacifika
Whilst creating a site mirror using wget 1.12 on Ubuntu, I noticed that links with a rel attribute set are not downloaded:
<a href="link" rel="tag">text</a>
rel="tag" is a microformat: by adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (a keyword or subject) for the current page.
My WordPress theme uses this for links to tags, so 99% of the site is ignored.
Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well.
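To check which links on a single page carry a rel attribute (and, going by the behaviour described above, are the ones being skipped), a rough grep/sed pipeline is enough. This is only a sketch: the function name list_rel_links is hypothetical, it assumes attributes are double-quoted and that each <a ...> opening tag sits on one line, and it is not a real HTML parser.

```shell
#!/bin/sh
# list_rel_links FILE
# Hypothetical helper: prints the href of every <a> tag in FILE that has a
# rel attribute (e.g. rel="tag" or rel="bookmark"), one per line.
# Assumes double-quoted attributes and one-line <a ...> tags; grep-based, not a parser.
list_rel_links() {
  # Grab each <a ...> opening tag that contains rel="..." (no crossing a '>').
  grep -oE '<a [^>]*rel="[^"]*"[^>]*>' "$1" |
    # Keep only the href value from each matched tag.
    sed -nE 's/.*href="([^"]*)".*/\1/p'
}
```

For example, `wget -q -O page.html http://site` followed by `list_rel_links page.html` lists the tag and permalink URLs on the front page, which can then be compared against what actually ended up in the mirror directory.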
I'm using the following wget command (this ignores robots.txt and also follows nofollow links):
wget -mkp -e robots=off http://site
How do I make wget follow links with rel set?