Copy a website and preserve the file & folder structure
Posted by DrStalker on Server Fault, 2010-06-11
I have an old web site running on an ancient version of Oracle Portal that we need to convert to a flat HTML structure. Due to damage to the server, we are not able to access the administrative interface, and even if we could, there is no export functionality that works with modern software versions.
It would be enough to crawl the website and have all the pages & images saved to a folder, but the file structure needs to be preserved; that is, if a page is located at http://www.oldserver.com/foo/bar/baz/mypage.html then it needs to be saved to /foo/bar/baz/mypage.html so that the various JavaScript bits will continue to function.
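To make the requirement concrete, here is a minimal sketch in Python of the URL-to-file mapping I mean (the function name, output folder, and example URL are just placeholders; a real crawler would also need to follow links):

    # Sketch: save one URL to a local path that mirrors the URL's folder structure.
    import os
    from urllib.parse import urlparse
    from urllib.request import urlopen

    def save_preserving_path(url, output_root="mirror"):
        path = urlparse(url).path              # e.g. /foo/bar/baz/mypage.html
        if path == "" or path.endswith("/"):
            path += "index.html"               # give directory URLs a filename
        local_path = os.path.join(output_root, path.lstrip("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with urlopen(url) as response, open(local_path, "wb") as out:
            out.write(response.read())         # write the fetched bytes unchanged
        return local_path

    # Would save to mirror/foo/bar/baz/mypage.html
    save_preserving_path("http://www.oldserver.com/foo/bar/baz/mypage.html")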
None of the web crawlers I've found have been able to do this; they all want to rename the pages (page01.html, page02.html, etc.) and break the folder structure.
Is there any crawler out there that will recreate the site structure as it appears to a user accessing the site? It doesn't need to rewrite any of the content of the pages; once rehosted, the pages will all have the same names they did originally, so links will continue to work.