Search Results

Search found 593 results on 24 pages for 'wget'.

Page 1/24 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • wget has a 4 second delay

    - by guisius
    Hello. I have tried to wget a page with windows/mac, and the response is instant while the linux vesion needs to wait for 4 seconds before it shows the response. I just hope this can be solved. More information added: in Ubuntu : wget xxx://192.168.0.135/test.cgi?cmd= -O test.txt --2011-03-04 14:21:17-- xxx://192.168.0.135/test.cgi?cmd= Connecting to 192.168.0.135:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `test.txt' [ <=> ] 17 --.-K/s in 0s 2011-03-04 14:21:22 (1.88 MB/s) - `test.txt' saved [17] while in Mac OS : wget xxx://192.168.0.135/test.cgi?cmd= -O test.txt --2011-03-04 14:22:33-- xxx://192.168.0.135/test.cgi?cmd= Connecting to 192.168.0.135:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `test.txt' [ <=> ] 17 --.-K/s in 0s 2011-03-04 14:22:33 (755 KB/s) - `test.txt' saved [17] in ubuntu it delays 4 seconds while windows and mac will not i believe it may related to some setting in the network config such as packet size , window frame , but i have no idea to set this PS: because the limit of the post not allow to post the url so i mark this as xxx

    Read the article

  • Exclude list of specific files in wget

    - by nanker
    I am trying to download a lot of pages from a website on dial-up and it can be brutally slow. I have almost got the perfect wget command, but because I'm downloading pages from the same site wget wastes times downloading the same standard images for each page. If I know the name of the default page images, is there any way to have wget ignore and thus avoid downloading those for each and every page? Here is an example of one of the wget commands that my shell script generates into another shell script to download all of the pages: mkdir candy-canes-on-the-flannel-board-in-preschool cd candy-canes-on-the-flannel-board-in-preschool wget -p -nd -A jpg,html -k http://www.teachpreschool.org/2011/12/candy-canes-on-the-flannel-board-in-preschool/ wget -c --random-wait --timeout=30 --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" http://www.teachpreschool.org/2011/12/candy-canes-on-the-flannel-board-in-preschool/ -O "candy-canes-on-the-flannel-board-in-preschool" rm Baby-and-Toddler.jpg Childrens-Books.jpg Creative-Art.jpg Felt-Fun.jpg Happy_Rainbow-e1338766526528.jpg index.html Language-and-Literacy.jpg Light-table-Button.jpg Math.jpg Outdoor-Play.jpg outer-jacket1-300x153.jpg preschoolspot-button-small.jpg robots.txt Science-and-Nature.jpg Signature-2.jpg Story-Telling.jpg Tags-on-Preschool.jpg Teaching-Two-and-Three-Year-olds.jpg cd ../ Now I realize the script is not likely as savvy as it could be but it is doing what I need at the moment except that you can see from the rm command that I would just like to prevent wget from downloading the files in the first place if possible. I almost forgot to mention, there are two wget commands and that is because the first one downloads the page as index.html and for some reason it does not open in my browser, however, when I open it and look at it in vim all of the page's content is there, so I am not sure why it does not open. But if I just issue the second wget command as it is then that page, same file really with an alternate name, opens up fine. Something that if I could fix would also help to streamline the process.

    Read the article

  • How to download all file content from a folder using wget and http

    - by user1526912
    I am trying to use wget and http to download all contents from folderAA below to directory /root/sstest wget -r --directory-prefix="/root/sstest" -o /root/sstest2.log http://site.com/folder1/folder2/folderAA/ When I submit the above command nothing is downloaded. If I submit a wget request for a specific file from folderAA the file is actually downloaded to /root/sstest: wget -r --directory-prefix="/root/sstest" -o /root/sstest2.log http://site.com/folder1/folder2/folderAA/file.txt Can someone tell me why I cannot download all file content from folderAA at once using the first wget request?

    Read the article

  • Using wget to download pdf files from a site that requires cookies to be set

    - by matt74tm
    I want to access a newspaper site and then download their epaper copies (in PDF). The site requires me to login using my email address and password and then it permits me to access those PDF URLs. I'm having trouble 'setting my session' in wget. When I login into the site from my browser, it sets two cookie values: [email protected] Password=12345 I tried: wget --post-data "[email protected]&Password=12345" http://epaper.abc.com/login.aspx However, that just downloaded the login page and saved it locally The FORM on the login page has two fields: txtUserID txtPassword and radiobuttons like this: <input id="rbtnManchester" type="radio" checked="checked" name="txtpub" value="44"> Another button: <input id="rbtnLondon" type="radio" name="txtpub" value="64"> If I post this to the login.aspx page, I get the same output wget --post-data "[email protected]&txtPassword=12345&txtpub=44" http://epaper.abc.com/login.aspx If I do: --save-cookies abc_cookies.txt it doesnt seem to have anything other than the default content. For the last if I do --debug as well it says: ... Set-Cookie: ASP.NET_SessionId=05kphcn4hjmblq45qgnjoe41; path=/; HttpOnly ... Stored cookie epaper.abc.com -1 (ANY) / <session> <insecure> [expiry none] ASP.NET_SessionId 05kphcn4hjmblq45qgnjoe41 Length: 107253 (105K) [text/html] Saving to: `login.aspx' ... Saving cookies to abc_cookies.txt. However, abc_cookies.txt shows ONLY the following: # HTTP cookie file. # Generated by Wget on 2011-08-16 08:03:05. # Edit at your own risk. (Not sure why I'm not getting any responses on SO - perhaps SU is a better forum - http://stackoverflow.com/questions/7064171/using-wget-to-download-pdf-files-from-a-site-that-requires-cookies-to-be-set) EDIT 1 C:\Temp>wget --cookies=on --keep-session-cookies --save-cookies abc_cookies.txt --post-data "txtUserID=abc%40gmail.com&txtPassword=password&txtpub=44&chkbox=checkbox&submit.x=48&submit.y=7" http://epaper.abc.com/login.aspx --debug SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc DEBUG output created by Wget 1.11.4 on Windows-MinGW. --2011-08-18 08:15:59-- http://epaper.abc.com/login.aspx Resolving epaper.abc.com... seconds 0.00, 999.999.99.99 Caching epaper.abc.com => 999.999.99.99 Connecting to epaper.abc.com|999.999.99.99|:80... seconds 0.00, connected. Created socket 300. Releasing 0x00a2ae80 (new refcount 1). ---request begin--- POST /login.aspx HTTP/1.0 User-Agent: Wget/1.11.4 Accept: */* Host: epaper.abc.com Connection: Keep-Alive Content-Type: application/x-www-form-urlencoded Content-Length: 100 ---request end--- [POST data: txtUserID=abc%40gmail.com&txtPassword=password&txtpub=44&chkbox=checkbox&submit.x=48&submit.y=7] HTTP request sent, awaiting response... ---response begin--- HTTP/1.1 200 OK Connection: keep-alive Date: Thu, 18 Aug 2011 02:46:17 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 2.0.50727 Set-Cookie: ASP.NET_SessionId=owcrje55yl45kgmhn43gq145; path=/; HttpOnly Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Length: 107253 ---response end--- 200 OK Registered socket 300 for persistent reuse. Stored cookie epaper.abc.com -1 (ANY) / <session> <insecure> [expiry none] ASP.NET_SessionId owcrje55yl45kgmhn43gq145 Length: 107253 (105K) [text/html] Saving to: `login.aspx.1' 100%[======================================================================================================================>] 107,253 24.9K/s in 4.2s 2011-08-18 08:16:05 (24.9 KB/s) - `login.aspx.1' saved [107253/107253] Saving cookies to abc_cookies.txt. Done saving cookies. C:\Temp>wget --referer=http://epaper.abc.com/login.aspx --cookies=on --load-cookies abc_cookies.txt --keep-session-cookies --save-cookies abc_cookies.txt http://epaper.abc.com/PagePrint/16_08_2011_001.pdf --debug SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc DEBUG output created by Wget 1.11.4 on Windows-MinGW. Stored cookie epaper.abc.com -1 (ANY) / <session> <insecure> [expiry none] ASP.NET_SessionId owcrje55yl45kgmhn43gq145 --2011-08-18 08:16:12-- http://epaper.abc.com/PagePrint/16_08_2011_001.pdf Resolving epaper.abc.com... seconds 0.00, 999.999.99.99 Caching epaper.abc.com => 999.999.99.99 Connecting to epaper.abc.com|999.999.99.99|:80... seconds 0.00, connected. Created socket 300. Releasing 0x00598290 (new refcount 1). ---request begin--- GET /PagePrint/16_08_2011_001.pdf HTTP/1.0 Referer: http://epaper.abc.com/login.aspx User-Agent: Wget/1.11.4 Accept: */* Host: epaper.abc.com Connection: Keep-Alive Cookie: ASP.NET_SessionId=owcrje55yl45kgmhn43gq145 ---request end--- HTTP request sent, awaiting response... ---response begin--- HTTP/1.1 200 OK Connection: keep-alive Date: Thu, 18 Aug 2011 02:46:30 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 2.0.50727 content-disposition: attachement; filename=Default_logo.gif Cache-Control: private Content-Type: image/GIF Content-Length: 4568 ---response end--- 200 OK Registered socket 300 for persistent reuse. Length: 4568 (4.5K) [image/GIF] Saving to: `16_08_2011_001.pdf' 100%[======================================================================================================================>] 4,568 7.74K/s in 0.6s 2011-08-18 08:16:14 (7.74 KB/s) - `16_08_2011_001.pdf' saved [4568/4568] Saving cookies to abc_cookies.txt. Done saving cookies. Contents of abc_cookies.txt epaper.abc.com FALSE / FALSE 0 ASP.NET_SessionId owcrje55yl45kgmhn43gq145

    Read the article

  • Variable parsing with Bash and wget

    - by Bill Westrup
    I'm attempting to use wget in a simple bash script to grab a jpeg image from an Axis camera. This script outputs a file named JPEGOUT, instead of the desired output, which should be a timestamp jpeg (ex: 201209292040.jpg) . Changing the variable in the wget statement from JPEGOUT to $JPEGOUT makes wget fail with "wget: missing URL" error. The weird thing is wget parses the $IP vairable correctly. No luck on the output file name. I've tried single quotes, double quotes, parenthesis: all to no luck. Here's the script !/bin/bash IP=$1 JPEGOUT= date +%Y%m%d%H%M.jpg wget -O JPEGOUT http://$IP/axis-cgi/jpg/image.cgi?resolution=640x480&compression=25 Any ideas on how to get the output file name to parse correctly?

    Read the article

  • Excluding certain file types in wget

    - by Alan Spark
    I have been using wget for a while now to mirror files from an ftp server to a local folder. My wget command is as follows: wget -mirror -w 1 -p -nH -P /var/www/ ftp://my-ftp-server However, I just noticed that it is copying over a .listing file for every folder that it visits. So, even if nothing has been changed on the ftp server, a .listing file will always be copied. My understanding is that the .listing file is created when wget opens the ftp session. Is there a way to avoid this? I've tried the -R option (e.g. -R .listing) but this didn't help. See: http://www.gnu.org/software/wget/manual/wget.html#Recursive-Accept_002fReject-Options Thanks, Alan

    Read the article

  • wget is working only when used with sudo

    - by Yusuf
    I'm having quite a strange behavior with wget since yesterday. I can download files by using sudo wget, but when I try the same file with only wget, I can get this error: yusufh@ubuntu-yuh:~$ wget http://www.kegel.com/wine/winetricks --2010-12-17 09:34:11-- http://www.kegel.com/wine/winetricks Resolving www.kegel.com... failed: Name or service not known. wget: unable to resolve host address `www.kegel.com' and with sudo wget: yusufh@ubuntu-yuh:~$ sudo wget http://www.kegel.com/wine/winetricks --2010-12-17 09:35:37-- http://www.kegel.com/wine/winetricks Connecting to 127.0.0.1:5865... connected. Proxy request sent, awaiting response... 200 OK Length: 190672 (186K) [text/plain] Saving to: `winetricks' 100%[==================================================================================================>] 190,672 --.-K/s in 0.03s 2010-12-17 09:35:37 (6.92 MB/s) - `winetricks' saved [190672/190672] After the comments below, here is an update: I can use Google Chrome or Firefox perfectly without running it as root. I use ntlmaps to connect to the office proxy. So I need to use 127.0.0.1:5865 as the proxy for clients. Result for env | grep -i proxy: NO_PROXY=localhost,127.0.0.0/8,*.local, http_proxy=127.0.0.1:5865 ftp_proxy=127.0.0.1:5865 all_proxy=socks://127.0.0.1:5865/ ALL_PROXY=socks://127.0.0.1:5865/ https_proxy=127.0.0.1:5865 no_proxy=localhost,127.0.0.0/8,*.local while sudo env | grep -i proxy is empty! HELP!

    Read the article

  • How to specify the wget file location (output location)

    - by Matt
    i have this batch file that reads an input file to grab images from a URL. How can I specify a different output location for the wget files. @echo off setlocal enabledelayedexpansion for /f %%l in (Input.txt) do ( set line=%%l wget -O !line:~0,10!.jpg !line:~10! ) I want the output to be in a different location. edit If wget is run from C:\wget\bin. How can the output be saved to C:\folder\folder\x

    Read the article

  • How to install wget on this?

    - by Winluser
    I did download RubyStack 2.0.3 for VMWare from http://bitnami.org/files/stacks/rubystack/2.0-3… but I cannot download anything on it! It appears that all basic utilities are missing/screwed: bitnami@linux:/var/tmp$ wget -bash: wget: command not found bitnami@linux:/var/tmp$ curl curl: error while loading shared libraries: libcurl.so.4: cannot open shared obj ect file: No such file or directory bitnami@linux:/var/tmp$ man wget -bash: man: command not found bitnami@linux:/var/tmp$ sudo apt-get install wget [sudo] password for bitnami: Reading package lists… Done Building dependency tree Reading state information… Done E: Couldn’t find package wget Any ideas how can I download anything on this machine? (I don't have physical access to it)

    Read the article

  • Downloading multiple files with wget and handling parameters

    - by coure2011
    How can I download multiple files using wget? I also want to rename the files. Here are the commands I'm running one by one (copy/paste on terminal): wget -c --load-cookies cookies.txt http://www.filesonic.com/file/812720774/PS11.rar -O part11.rar wget -c --load-cookies cookies.txt http://www.filesonic.com/file/812721094/PS12.rar -O part12.rar wget -c --load-cookies cookies.txt http://www.filesonic.com/file/812720804/PS13.rar -O part13.rar wget -c --load-cookies cookies.txt http://www.filesonic.com/file/812720854/PS14.rar -O part14.rar ........ and so on.. What can I do to download all these files one by one?

    Read the article

  • Downloading a file from the internet with '&' in URL using wget

    - by matt_tm
    Hi, I'm trying to download a file from a URL that looks like this: http://pdf.example.com/filehandle.ashx?p1=ABC&p2=DEF.pdf Within the browser, this link prompts me to download a file called x.pdf irrespective of what DEF is (but 'x.pdf' is the right content). However using wget, I get the following: >wget.exe http://pdf.example.com/filehandle.ashx?p1=ABC&p2=DEF.pdf SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc --2011-01-06 07:52:05-- http://pdf.example.com/filehandle.ashx?p1=ABC Resolving pdf.example.com... 99.99.99.99 Connecting to pdf.example.com|99.99.99.99|:80... connected. HTTP request sent, awaiting response... 500 Internal Server Error 2011-01-06 07:52:08 ERROR 500: Internal Server Error. 'p2' is not recognized as an internal or external command, operable program or batch file. This is on a Windows Vista system Edit1 >wget.exe "http://pdf.example.com/filehandle.ashx?p1=ABC&p2=DEF.pdf" SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc --2011-02-06 10:18:31-- http://pdf.example.com/filehandle.ashx?p1=ABC&p2=DEF.pdf Resolving pdf.example.com... 99.99.99.99 Connecting to pdf.example.com|99.99.99.99|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 4568 (4.5K) [image/JPEG] Saving to: `filehandle.ashx@p1=ABC&p2=DEF.pdf' 100%[======================================>] 4,568 --.-K/s in 0.1s 2011-02-06 10:18:33 (30.0 KB/s) - `filehandle.ashx@p1=ABC&p2=DEF.pdf' saved [4568/4568]

    Read the article

  • wget: Turn Off Forced .html Retreival

    - by Mike B
    When performing a recursive download, I specify a pattern via the -R parameter for wget to reject, but if this file is a HTML file, wget downloads the file regardless of whether or not it matches the pattern. e.g. wget -r -R "*dynamicfile*" example.com still retrieves files such as example.com/dynamicfile1.html Is there a way to prevent this?

    Read the article

  • wget not converting links

    - by acrosman
    I am trying to mirror a fairly large site (20,000+ pages) prior to a major overhaul. Basically, I need a backup before cutting over to the new one in case we forgot something we need (we'll have about 1,000 pages at launch). The site is run on a CMS that I cannot easily extract usable data from, so I'm trying to make the copy with wget. My problem is that wget does not appear to be actually converting links, despite the presence of --convert-links or -k in the command. I've tried a couple of different combinations of flags, but I haven't been able to get the output I need. Most recent failed attempt was: nohup wget --mirror -k -l10 -PafscSnapshot --html-extension -R *calendar* -o wget.log http://www.example.org & I've also included the --backup-converted, and --convert-links instead of -k (not that it have mattered). I've done it with and without -P and -l, again no that they should matter. Results in files that still have links like: http://www.example.org//ht/d/sp/i/17770

    Read the article

  • Wget save cookies not working

    - by TrymBeast
    I've been trying to login in the pyload through the web api, but wget is not saving the cookies and I don't understand why. I'm using the following command: wget --delete-after --keep-session-cookies --save-cookies=my_cookies.txt --post-data="username=USERNAME&password=PASSWORD" http://localhost:8000/api/login But the content of my_cookies.txt is: # HTTP cookie file. # Generated by Wget on 2012-06-23 22:31:33. # Edit at your own risk. When I run the same command but in debug mode I get the following output that includes the set cookie in the header response: DEBUG output created by Wget 1.10.2 (Red Hat modified) on linux-gnueabi. --22:31:11-- http://localhost:8000/api/login Resolving localhost... 127.0.0.1 Caching localhost => 127.0.0.1 Connecting to localhost|127.0.0.1|:8000... connected. Created socket 3. Releasing 0x000504d0 (new refcount 1). ---request begin--- POST /api/login HTTP/1.0 User-Agent: Wget/1.10.2 (Red Hat modified) Accept: */* Host: localhost:8000 Connection: Keep-Alive Content-Type: application/x-www-form-urlencoded Content-Length: 32 ---request end--- [POST data: username=USERNAME&password=PASSWORD] HTTP request sent, awaiting response... ---response begin--- HTTP/1.1 200 OK Content-Length: 34 Content-Type: application/json Cache-Control: no-cache, must-revalidate Set-cookie: beaker.session.id=405390ddc809efed54820638c95d7997; expires=Tue, 19-Jan-2038 04:14:07 GMT; Path=/ Connection: Keep-Alive Date: Sat, 23 Jun 2012 21:31:11 GMT Server: CherryPy/3.1.2 WSGI Server ---response end--- 200 OK hs->local_file is: login (not existing) Registered socket 3 for persistent reuse. TEXTHTML is on. Length: 34 [application/json] Saving to: `login' 100%[=======================================>] 34 --.-K/s in 0s 22:31:11 (1.28 MB/s) - `login' saved [34/34] Removing file due to --delete-after in main(): Removing login. Saving cookies to my_cookies.txt. Done saving cookies. Can anyone tell me what am I doing wrong? Thanks in advance!

    Read the article

  • wget not converting links

    - by acrosman
    I am trying to mirror a fairly large site (20,000+ pages) prior to a major overhaul. Basically, I need a backup before cutting over to the new one in case we forgot something we need (we'll have about 1,000 pages at launch). The site is run on a CMS that I cannot easily extract usable data from, so I'm trying to make the copy with wget. My problem is that wget does not appear to be actually converting links, despite the presence of --convert-links or -k in the command. I've tried a couple of different combinations of flags, but I haven't been able to get the output I need. Most recent failed attempt was: nohup wget --mirror -k -l10 -PafscSnapshot --html-extension -R *calendar* -o wget.log http://www.example.org & I've also included the --backup-converted, and --convert-links instead of -k (not that it have mattered). I've done it with and without -P and -l, again no that they should matter. Results in files that still have links like: http://www.example.org/ht/d/sp/i/17770

    Read the article

  • Wget site mirror, links with rel="<content>" not followed

    - by Pacifika
    Whilst creating a site mirror using wget 1.12 on Ubuntu links with a rel attribute set are not downloaded: <a href="link" rel="tag">text</a> Rel="tag" is a microformat (By adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) for the current page). My WordPress theme uses this for link to tags, so 99% of the site is ignored. Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well. I'm using the following wget command (this ignores robots.txt and also follows nofollow links): wget -mkp -e robots=off http://site How do I make wget follow links with rel set?

    Read the article

  • How can I install things in Linux with *no yum* and *no wget*?

    - by e9t
    I'm a newbie to Linux (that mainly uses Windows and Mac OS X) needing some advice. I was trying to install git on a Linux machine today, and encountered some problems: Not knowing the version of the installed OS, I've opened the /proc/version file which said: Linux version 2.6.9-42.0.2.ELsmp ([email protected]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 17:57:31 EDT 2006 Then, as written in the git documents (http://git-scm.com/download/linux), I assumed I could use the yum install git command for Fedora, but got the following result. [root@myserver ~]# yum install git -bash: yum: command not found So I tried installing yum using wget, but wasn't so lucky. [root@myserver ~]# wget http://linux.duke.edu/projects/yum/download/2.0/yum-2.0.7.tar.gz -bash: wget: command not found I googled and found this page and this page, so tried installing yum with rpm, but only got a result full of question marks. (Possibly an encoding problem, hmm...) [root@myserver ~]# rpm -Uvh http://www.eomy.net/linux/install-yum-x86_64/wget-1.10.2-0.40E.x86_64.rpm http://www.eomy.net/linux/install-yum-x86_64/wget-1.10.2-0.40E.x86_64.rpm(??)?? ?????? ?: /var/tmp/rpm-xfer.TbuAOu: V3 DSA signature: NOKEY, key ID 443e1821 ???.. ########################################### [100%] wget-1.10.2-0.40E U???????g??????? wget-1.10.2-0.40E???? ??g??/usr/bin/wget ?? wget-1.10.2-0.40E U?????? ???? wget-1.10.2-0.40E???? ??g??/usr/share/man/man1/wget.1.gz ?? wget-1.10.2-0.40E U?????? ???? [root@myserver ~]# Finally, when I typed rpm --version in the terminal, I got the below results. [root@myserver ~]# rpm --version RPM ???? - 4.3.3 I would like to know what I can do or possibly try now. Is it not possible to wget or yum anything in my situation? Or is there any magical tool like homebrew (http://mxcl.github.com/homebrew/) that I can use? Any comments or advice would be appreciated. Thanks in advance!

    Read the article

  • WGet or cURL: Mirror Site from http://site.com And No Internal Access

    - by alharaka
    I have tried wget -m wget -r and a whole bunch of variations. I am getting some of the images on http://site.com, one of the scripts, and none of the CSS, even with the fscking -p parameter. The only HTML page is index.html and there are several more referenced, so I am at a loss. curlmirror.pl on the cURL developers website does not seem to get the job done either. Is there something I am missing? I have tried different levels of recursion with only this URL, but I get the feeling I am missing something. Long story short, some school allows its students to submit web projects, but they want to know how they can collect everything for the instructor who will grade it, instead of him going to all the externally hsoted sites. UPDATE: I think I figured out the issue. I though the links to the other pages were in the index.html page that downloaded. I was way off. Turns out the footer of the page, which has all the navigation links, is handled by a JavaScript file Include.js, which reads JLSSiteMap.js and some other JS files to do page navigation and the like. As a result, wget does not pick up an other dependencies because a lot of this crap is handled not on web pages. How can I handle such a website? This is one of several problem cases. I assume little can be done if wget cannot parse JavaScript.

    Read the article

  • wget hangs in http request sent awaiting response in some sites

    - by gkr
    Using Ubuntu 12.04. wget hangs in http request sent, awaiting response... in some sites. Browser's are not opening sites that are failed in wget. But in WinXP everything works. This works gkr@gkr-desktop:~/Documents/curl$ wget google.com --2012-06-12 21:29:37-- http://google.com/ Resolving google.com (google.com)... 74.125.236.174, 74.125.236.160, 74.125.236.161, ... Connecting to google.com (google.com)|74.125.236.174|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: http://www.google.com/ [following] --2012-06-12 21:29:38-- http://www.google.com/ Resolving www.google.com (www.google.com)... 74.125.236.179, 74.125.236.180, 74.125.236.176, ... Connecting to www.google.com (www.google.com)|74.125.236.179|:80... connected. HTTP request sent, awaiting response... 302 Found Location: http://www.google.co.in/ [following] --2012-06-12 21:29:38-- http://www.google.co.in/ Resolving www.google.co.in (www.google.co.in)... 74.125.236.184, 74.125.236.191, 74.125.236.183, ... Connecting to www.google.co.in (www.google.co.in)|74.125.236.184|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `index.html.3' [ ] 13,383 --.-K/s in 0.04s 2012-06-12 21:29:39 (308 KB/s) - `index.html.3' saved [13383] gkr@gkr-desktop:~/Documents/curl$ This site just stops/hangs in awaiting response. gkr@gkr-desktop:~/Documents/curl$ wget grooveshark.com --2012-06-12 21:27:29-- http://grooveshark.com/ Resolving grooveshark.com (grooveshark.com)... 8.20.213.76 Connecting to grooveshark.com (grooveshark.com)|8.20.213.76|:80... connected. HTTP request sent, awaiting response... ^C gkr@gkr-desktop:~/Documents/curl$ Thanks

    Read the article

  • wget recursively download from pages with lots of links

    - by Shadow
    When using wget with the recursive option turned on I am getting an error message when it is trying to download a file. It thinks the link is a downloadable file when in reality it should just be following it to get to the page that actually contains the files(or more links to follow) that I want. wget -r -l 16 --accept=jpg website.com The error message is: .... since it should be rejected. This usually occurs when the website link it is trying to fetch ends with a sql statement. The problem however doesn't occur when using the very same wget command on that link. I want to know how exactly it is trying to fetch the pages. I guess I could always take a poke around the source although I don't know how messy the project is. I might also be missing exactly what "recursive" means in the context of wget. I thought it would run through and travel in each link getting the files with the extension I have requested. I posted this up over at stackOverFlow but they turned me over here:) Hoping you guys can help.

    Read the article

  • wget crawling search results of news website

    - by kiltek
    I am trying to crawl the search results of a news website using wget. The name of the website is www.voanews.com. After typing in my search keyword and clicking search, it proceeds to the results. Then i can specify a "to" and a "from"-date and hit search again. After this the URL becomes: http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article and the actual content of the results is what i want to download. To achieve this I created the following wget-command: wget --reject=js,txt,gif,jpeg,jpg \ --accept=html \ --user-agent=My-Browser \ --recursive --level=2 \ www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article Unfortunately, the crawler doesn't download the search results. It only gets into the upper link bar, which contains the "Home,USA,Africa,Asia,..." links and saves the articles they link to. It seems like he crawler doesn't check the search result links at all. What am I doing wrong and how can I modify the wget command to download the results search list links (and of course the sites they link to) only ?

    Read the article

  • How to start using Wget?

    - by brilliant
    Please, forgive me for asking this question. Usually I would try to learn thisngs myself first before bothering others, but my situation is urgent - if I don't act now and don't download all my family pictures from this website, it will be closed in about two weeks from now and I will loose all of them. So, please help me. I have already asked a question about how to download from my website all my pictures automatically here and I was advised to use Wget. But I don't know the very basics as to how to use it. The basics are not explained on that page. In that questioned that I asked I was even given this line: wget -r -A .jpg,.gif,.png but I don't know what I am supposed to do with it in order to get the Wget going and download all the pictures from my website. Please, help me somebody!

    Read the article

  • How to allow wget to overwrite files

    - by Gnanam
    Using wget command, how do I allow/instruct to overwrite my local file everytime, irrespective of how many times I invoke. Let's say, I want to download a file from the location: http://server/folder/file1.html Here, whenever I say wget http://server/folder/file1.html, I want this file1.html to be overwritten in my local system irrespective of the time it is changed, already downloaded, etc. My intention/use case here is that when I call wget, I'm very sure that I want to replace/overwrite the existing file. I've tried out the following options, but each option is intended/meant for some other purpose. -nc = --no-clobber -N = Turn on time-stamping -r = Turn on recursive retrieving

    Read the article

  • Wget - if / else download condition?

    - by Kai
    I want wget to prefer a certain filetype over another, if the files have the same basename. For example: if foo.ogg available, don't download foo.mp3 the way i use wget so far to crawl/automatically download (if anyone is interested): wget -Dfoo.com -I /folder/ -r -l 1 -nc -A.ogg,.mp3 -i http://www.foo.com/folder/ but this, of course, gets me .mp3 AND .ogg files. It often also gets me image files like .png which i didn't want in the first place, and discards them afterwards. Any Ideas? (Syntax-Explanation: -D: download only from this Domain -I: download only from this subfolder of Domain -r: recursive (follow links and directory structure) -l 1: follow only 1 link deep -nc: no clobber = download only if file doesn't exist -A: accept/download only all *.ogg and *.mp3 (discard necessary html-files) -i: download-url/starting point)

    Read the article

  • How to use wget with an input file and filenames

    - by Matt
    i have a text file that contains 10,000 url's with a unique number i want to save the file as. Each line has a 10 character code, then the URL of the image to retrieve. How can I make the input file use the first 10 characters as the wget filename? this is an example of the input file: input.txt x100083590http://image.allmusic.com/13/adg/cov200/drt200/t291/t29123q8m19.jpg b200149548http://ecx.images-amazon.com/images/I/41DoH%2BAWKEL.jpg z100151855http://image.allmusic.com/13/amg/cov200/dri400/i450/i45035hxdrb.jpg p400171646http://ecx.images-amazon.com/images/I/61cH4n34IhL.jpg wget -i input.txt would get the file but not with the preceding unique number. I want t29123q8m19.jpg (the first line) to be saved as x100083590.jpg If there is a better way to write out the input file, say with the URL first, then I can do that too, but I will never know the length of the first field. Right now the first 10 characters will always be what I want to save the wget image as. Edit This is being done in a windows environment.

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >