Wget site mirror, links with rel="<content>" not followed
Posted by Pacifika on Super User
Published on 2012-03-23T10:11:12Z
Indexed on 2012/03/23 11:32 UTC
wget
Whilst creating a site mirror using wget 1.12 on Ubuntu, I found that links with a rel attribute set are not downloaded:
<a href="link" rel="tag">text</a>
rel="tag" is a microformat: by adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) for the current page.
My WordPress theme uses this for links to tags, so 99% of the site is ignored.
Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well.
I'm using the following wget command (this ignores robots.txt and also follows nofollow links):
wget -mkp -e robots=off http://site
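For reference, the short flags above can be spelled out with wget's long options, which makes it easier to see what the command asks for: -m is --mirror (recursion with timestamping and infinite depth), -k is --convert-links, -p is --page-requisites, and -e is --execute. The sketch below only prints the expanded command rather than running it. The helper name mirror_cmd is hypothetical, and http://site is the placeholder URL from the command above.

```shell
# Print (rather than run) the long-option form of: wget -mkp -e robots=off http://site
# mirror_cmd is a hypothetical helper name used only for illustration.
mirror_cmd() {
  # $1 is the start URL; it defaults to the placeholder from the question.
  echo wget --mirror --convert-links --page-requisites \
    --execute robots=off "${1:-http://site}"
}

mirror_cmd               # show the expanded command for inspection
# eval "$(mirror_cmd)"   # uncomment to actually start the mirror
```

None of these options mentions rel attributes, so the expansion does not by itself explain why the rel links are skipped. It only confirms what the command requests.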
How do I make wget follow links with rel set?
© Super User or respective owner