WGet or cURL: Mirror Site from http://site.com And No Internal Access

Posted by alharaka on Server Fault
Published on 2011-02-10T22:12:30Z Indexed on 2012/04/12 5:33 UTC

Filed under: wget | curl

I have tried wget -m, wget -r, and a whole bunch of variations. I am getting some of the images on http://site.com, one of the scripts, and none of the CSS, even with the fscking -p parameter. The only HTML page downloaded is index.html, and several more are referenced, so I am at a loss. curlmirror.pl on the cURL developers' website does not seem to get the job done either. I have tried different levels of recursion with only this URL. Is there something I am missing? Long story short, a school allows its students to submit web projects, and they want to know how to collect everything for the instructor who will grade it, instead of him going to all the externally hosted sites.
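For reference, here is a rough sketch of the kind of invocation described above, written as a small Python helper that only assembles the wget argument list. The helper name `build_wget_command` and the `allowed_domains` parameter are made up for illustration. The host-spanning part is an assumption about one possible cause of missing CSS (requisites served from another host), not something confirmed for this site.

```python
import shlex


def build_wget_command(url, allowed_domains=None):
    """Return the argument list for a mirroring wget run (hypothetical helper)."""
    cmd = [
        "wget",
        "--mirror",            # -m: recursion with timestamping, infinite depth
        "--page-requisites",   # -p: images, CSS, and scripts each page needs
        "--convert-links",     # -k: rewrite links so the copy works offline
        "--adjust-extension",  # -E: save HTML/CSS with matching file extensions
        "--no-parent",         # -np: never ascend above the starting directory
    ]
    if allowed_domains:
        # Assumption: some requisites live on other hosts, which wget
        # skips unless host spanning is enabled and restricted to a list.
        cmd += ["--span-hosts", "--domains=" + ",".join(allowed_domains)]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_wget_command("http://site.com/")))
```

Passing the returned list to `subprocess.run` would execute it. Printing it first makes it easy to check the flags against the installed wget version, since older releases spell some of them differently.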

UPDATE: I think I figured out the issue. I thought the links to the other pages were in the index.html page that downloaded. I was way off. It turns out the footer of the page, which has all the navigation links, is handled by a JavaScript file, Include.js, which reads JLSSiteMap.js and some other JS files to do page navigation and the like. As a result, wget does not pick up any other dependencies, because a lot of this crap is handled outside the web pages themselves. How can I handle such a website? This is one of several problem cases. I assume little can be done if wget cannot parse JavaScript.
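One possible workaround, sketched under assumptions: since wget does download the JS files, the page names could be pulled out of them with a regular expression and handed back to wget as an explicit URL list. The function name `extract_page_urls`, the extension list, and the sample string are all invented for illustration. The actual layout of JLSSiteMap.js is unknown, so the pattern would likely need adjusting, and it will miss links that are built by string concatenation.

```python
import re
from urllib.parse import urljoin

# Quoted string literals ending in a page-like extension.
# The extension list is a guess, not taken from the site.
PAGE_LINK = re.compile(
    r"""["']([^"'\s]+\.(?:html?|php|aspx?))["']""", re.IGNORECASE
)


def extract_page_urls(js_source, base_url):
    """Return absolute URLs for page-like string literals in JavaScript source."""
    urls = []
    for match in PAGE_LINK.finditer(js_source):
        absolute = urljoin(base_url, match.group(1))
        if absolute not in urls:  # keep first-seen order, drop duplicates
            urls.append(absolute)
    return urls


if __name__ == "__main__":
    # Made-up sample; the real sitemap script may be structured differently.
    sample = 'var pages = ["about.html", "projects/gallery.html"];'
    for url in extract_page_urls(sample, "http://site.com/"):
        print(url)
```

Saving the printed URLs to a file such as urls.txt and running `wget -p -k -i urls.txt` should then fetch each page along with its requisites, since `-i` reads the URLs to download from a file.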

© Server Fault or respective owner
