Wget site mirror, links with rel="<content>" not followed

Posted by Pacifika on Super User
Published on 2012-03-23T10:11:12Z Indexed on 2012/03/23 11:32 UTC

When I create a site mirror using wget 1.12 on Ubuntu, links that have a rel attribute set are not downloaded:

 <a href="link" rel="tag">text</a>

rel="tag" is a microformat: by adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) for the current page.

My WordPress theme uses this for links to tags, so 99% of the site is ignored.

Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well.
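
To see which rel values a theme emits, and so which links are likely affected, the anchors in a saved copy of a page can be tallied. This is only a minimal sketch: it assumes the page has been saved locally and that rel values are double-quoted as in the example above. The function name count_rel_values and the file name page.html are hypothetical.

```shell
# Tally the rel="..." values found on <a> tags in a saved HTML file.
# Assumes double-quoted attribute values, as in the example above, and
# does not handle tags that are split across several lines.
count_rel_values() {
    grep -o '<a [^>]*rel="[^"]*"' "$1" |
        sed 's/.*rel="\([^"]*\)".*/\1/' |
        sort | uniq -c | sort -rn
}

# Usage (page.html is a placeholder for a saved page):
# count_rel_values page.html
```

Each output line is a count followed by a rel value, most frequent first, which makes it easy to check whether values such as tag or bookmark cover most of the links on a page.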

I'm using the following wget command (this ignores robots.txt and also follows nofollow links):

wget -mkp -e robots=off http://site
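
For readers unfamiliar with the short flags, the same command can be spelled out with wget's long options: -m is --mirror, -k is --convert-links, and -p is --page-requisites. The sketch below wraps it in a hypothetical function, mirror_site, so that the URL becomes a parameter; it does not change how rel links are treated.

```shell
# Long-option spelling of: wget -mkp -e robots=off <url>
# mirror_site is a hypothetical wrapper name, not part of wget.
mirror_site() {
    wget --mirror --convert-links --page-requisites -e robots=off "$1"
}

# Usage:
# mirror_site http://site
```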

How do I make wget follow links with rel set?

