Search Results

Search found 19 results on 1 pages for 'httrack'.

Page 1/1 | 1 

  • How to continue an HTTrack mirroring session from the command line?

    - by isme
    I want to drive my mirroring project using the Command Prompt instead of the WinHTTrack interface so that I can script and schedule the mirroring session more easily. The output of httrack --help gives a simple command for continuing an interrupted mirroring session: example: httrack --continue continues a mirror in the current folder When I try httrack --continue in my HTTrack project folder, all I get is output like this: Example: -%F "<!-- Mirrored from %s by HTTrack Website Copier/3.x [XR&CO'2010], %s -->" * Option %F needs to be followed by a blank space, and a footer string With each parameter on a new line for readability, the first line of my doit.log file looks like this: -qiC1%P0s0b0u1j0%s%u0N0%I0p1DaK0c1T30H0%kf2E1800A25000%c0.1%f#f -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -%F "" -%l "en, en, *" http://saa.gov.uk/search.php?SEARCHED=1&SEARCH_TABLE=council_tax&SEARCH_TERM=City+of+Edinburgh&DISPLAY_COUNT=100 -O1 "C:\\Users\\Iain\\Projects\\Council Tax Analysis\\Code\\HTTrack\\Council Tax Valuation List" -* \ +*search.php?SEARCHED=1* -*DISPLAY_MODE=FULL* The parameter %F "" should tell HTTrack to use an empty footer. I used the WinHTTrack interface to create the project and start the mirroring session. I can interrupt and continue the mirroring session using the interface. The HTML files saved by WinHTTrack have no footer.

    Read the article

  • Mirror a website with httrack while executing javascript

    - by Martin
    I want do save a mirror of www.youtube.com/tv. I obviously do not want to save the videos. I want the code running the website in a local copy, everything else can stay remote. The code I want is mainly contained in 2 files: live.js and app-prod.js. I tried using httrack. I have issue parsing the javascript to load anything past the first file: live.js. The %P parameter does not help. httrack www.youtube.com/tv +* -r6 --mirror -%P -j It doesn't go further than live.js because some javascript needs to be executed to load the next file. I know I can do this manually with any browser. I want to automate the process. Is httrack able to do this by itself? If yes, how?

    Read the article

  • Download a website that requires log-in with HTTtrack Copier

    - by H.Moss
    Hi guys! I have been researching of how to download content of a site that requires username and password. This is actually harder than I thought it would be. I tried to use both HTTtrack Copier and followed the instruction below, but it's not working! Q: I can not access several pages (access forbidden, or redirect to another location), but I can with my browser, what's going on? A: You may need cookies! Cookies are specific data (for example, your username or password) that are sent to your browser once you have logged in certain sites so that you only have to log-in once. For example, after having entered your username in a website, you can view pages and articles, and the next time you will go to this site, you will not have to re-enter your username/password. To "merge" your personnal cookies to an HTTrack project, just copy the cookies.txt file from your Netscape folder (or the cookies located into the Temporary Internet Files folder for IE) into your project folder (or even the HTTrack folder)

    Read the article

  • How to get httrack to work with SSL on mac os x? (libssl.so not found)

    - by cwd
    I'm trying to use httrack website copier but the program is running and reporting "no-ssl" (ie: it does not have the capability to copy secure sites). From looking over this thread, it seems that the problem is either when I make & configure the program, or when I run the program, it is not finding the lib-ssl / open-ssl that I have installed. I think it is looking for /var/root/lib/libssl.so.1.0 The user on that forum states that he created a symlink which allowed httrack to find the ssl library in the non-default location. Perhaps that's what I need to do - but where do I create the link from and to? I'm not seeing that I have any libssl.so files installed on my system. Do I need the development package? If so, how do I install that? I used macports to install the current version of openssl that I have. I'm running OS X 10.6. Reserch I have run this command to try and debug: dtruss httrack 2&1 | grep ssl and that outputs this: stat64("libssl.so.1.0\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.1.0\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.1.0\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.1.0\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.1\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.1\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.1\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.1\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.1.0.0\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.1.0.0\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.1.0.0\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.1.0.0\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8p\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8p\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8p\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8p\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8o\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8o\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8o\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8o\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8n\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8n\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8n\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8n\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8m\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8m\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8m\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8m\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8l\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8l\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8l\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8l\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8k\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8k\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8k\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8k\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8j\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8j\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8j\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8j\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8g\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8g\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8g\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8g\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8b\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8b\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8b\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8b\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.8\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.8\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.8\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.8\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.7\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.7\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.7\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.7\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so.0.9.6\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so.0.9.6\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so.0.9.6\0", 0x7FFF5FBFF210, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so.0.9.6\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("libssl.so\0", 0x7FFF5FBFEE30, 0x7FFF5FBFF470) = -1 Err#2 stat64("/var/root/lib/libssl.so\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/local/lib/libssl.so\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 stat64("/usr/lib/libssl.so\0", 0x7FFF5FBFF220, 0x7FFF5FBFF470) = -1 Err#2 I have already used mac ports to install open-ssl: port installed The following ports are currently installed: beecrypt @4.2.1_2 (active) cpio @2.10_0 (active) expat @2.0.1_1 (active) flex @2.5.35_0 (active) gettext @0.18.1.1_2 (active) gperf @3.0.4_0 (active) icu @4.6_0 (active) libiconv @1.13.1_0 (active) mysql5 @5.1.53_0 (active) ncurses @5.9_0 (active) ncursesw @5.8_0 (active) neon @0.29.5_0 (active) openssl @1.0.0c_0 (active) perl5.8 @5.8.9_3 (active) popt @1.16_0 (active) python24 @2.4.6_7 (active) readline @6.1.002_0 (active) rpm @4.4.9_10 (active) sqlite3 @3.7.3_0 (active) zlib @1.2.5_0 (active) Here are the install locations: locate libssl /opt/local/lib/libssl.1.0.0.dylib /opt/local/lib/libssl.a /opt/local/lib/libssl.dylib /opt/local/lib/pkgconfig/libssl.pc /opt/local/var/macports/software/openssl/1.0.0c_0/opt/local/lib/libssl.1.0.0.dylib /opt/local/var/macports/software/openssl/1.0.0c_0/opt/local/lib/libssl.a /opt/local/var/macports/software/openssl/1.0.0c_0/opt/local/lib/libssl.dylib /opt/local/var/macports/software/openssl/1.0.0c_0/opt/local/lib/pkgconfig/libssl.pc /usr/lib/libssl.0.9.7.dylib /usr/lib/libssl.0.9.8.dylib /usr/lib/libssl.0.9.dylib /usr/lib/libssl.dylib /usr/lib/pkgconfig/libssl.pc What should I do next? More Info I tried the solution below: $ DYLD_INSERT_LIBRARIES="/opt/local/lib/libssl.1.0.0.dylib" httrack Welcome to HTTrack Website Copier (Offline Browser) 3.44-1-nossl Copyright (C) Xavier Roche and other contributors To see the option list, enter a blank line or try httrack --help It is still not able to load the ssl lib: 3.44-1-nossl

    Read the article

  • Taking web sites offline for demonstration

    While working in software development in general, and in web development for a couple of customers it is quite common that it is necessary to provide a test bed where the client is able to get an image, or better said, a feeling for the visions and ideas you are talking about. Usually here at IOS Indian Ocean Software Ltd. we set up a demo web site on one of our staging servers, and provide credentials to the customer to access and review our progress and work ad hoc. This gives us the highest flexibility on both sides, as the test bed is simply online and available 24/7. We can update the structure, the UI and data at any time, and the client is able to view it as it suits best for her/him. Limited or lack of online connectivity But what is going to happen when your client is not capable to be online - no matter for what reasons; here are some more obvious ones: No internet connection (permanently or temporarily) Expensive connection, ie. mobile data package, stay at a hotel, etc. Presentation devices at an exhibition, ie. using tablets or iPads Being abroad for a certain time, and only occasionally online No network coverage, especially on mobile Bad infrastructure, like ie. in Third World countries Providing a catalogue on CD or USB pen drive Anyway, it doesn't matter really. We should be able to provide a solution for the circumstances of our customers. Presentation during an exhibition Recently, we had the following request from a customer: Is it possible to let us have a desktop version of ResortWork.co.uk that we can use for demo purposes at the forthcoming Ski Shows? It would allow us to let stand visitors browse the sites on an iPad to view jobs and training directory course listings. Yes, sure we can do that. Eventually, you might think why don't they simply use 3G enabled iPads for that purpose? As stated above, there might be several reasons for that - low coverage, expensive data packages, etc. Anyway, it is not a question on how to circumvent the request but to deliver a solution to that. Possible solutions... or not? We already did offline websites earlier, and even established complete mirrors of one or two web sites on our systems. There are actually several possibilities to handle this kind of request, and it mainly depends on the system or device where the offline site should be available on. Here, it is clearly expressed that we have to address this on an Apple iPad, well actually, I think that they'd like to use multiple devices during their exhibitions. Following is an overview of possible solutions depending on the technology or device in use, and how it can be done: Replication of source files and database The above mentioned web site is running on ASP.NET, IIS and SQL Server. In case that a laptop or slate runs a Windows OS, the easiest way would be to take a snapshot of the source files and database, and transfer them as local installation to those Windows machines. This approach would be fully operational on the local machine. Saving pages for offline usage This is actually a quite tedious job but still practicable for small web sites Tool based approach to 'harvest' the web site There quite some tools in the wild that could handle this job, namely wget, httrack, web copier, etc. Screenshots bundled as PDF document Not really... ;-) Creating screencast or video Simply navigate through your website and record your desktop session. Actually, we are using this kind of approach to track down difficult problems in order to see and understand exactly what the user was doing to cause an error. Of course, this list isn't complete and I'd love to get more of your ideas in the comments section below the article. Preparations for offline browsing The original website is dynamically and data-driven by ASP.NET, and looks like this: As we have to put the result onto iPads we are going to choose the tool-based approach to 'download' the whole web site for offline usage. Again, depending on the complexity of your web site you might have to check which of the applications produces the best results for you. My usual choice is to use wget but in this case, we run into problems related to the rewriting of hyperlinks. As a consequence of that we opted for using HTTrack. HTTrack comes in different flavours, like console application but also as either GUI (WinHTTrack on Windows) or Web client (WebHTTrack on Linux/Unix/BSD). Here's a brief description taken from the original website about HTTrack: HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. And there is an extensive documentation for all options and switches online. General recommendation is to go through the HTTrack Users Guide By Fred Cohen. It covers all the initial steps you need to get up and running. Be aware that it will take quite some time to get all the necessary resources down to your machine. Actually, for our customer we run the tool directly on their web server to avoid unnecessary traffic and bandwidth. After a couple of runs and some additional fine-tuning - explicit inclusion or exclusion of various external linked web sites - we finally had a more or less complete offline version available. A very handsome feature of HTTrack is the error/warning log after completing the download. It contains some detailed information about errors that appeared on the pages and the links within the pages that have been processed. Error: "Bad Request" (400) at link www.resortwork.co.uk/job-details_Ski_hire:tech_or_mgr_or_driver_37854.aspx (from www.resortwork.co.uk/Jobs_A_to_Z.aspx)Error: "Not Found" (404) at link www.247recruit.net/images/applynow.png (from www.247recruit.net/css/global.css)Error: "Not Found" (404) at link www.247recruit.net/activate.html (from www.247recruit.net/247recruit_tefl_jobs_network.html) In our situation, we took the records of HTTP 400/404 errors and passed them to the web development department. Improvements are to be expected soon. ;-) Quality assurance on the full-featured desktop Unfortunately, the generated output of HTTrack was still incomplete but luckily there were only images missing. Being directly on the web server we simply copied the missing images from the original source folder into our offline version. After that, we created an archive and transferred the file securely to our local workspace for further review and checks. From that point on, it wasn't necessary to get any more files from the original web server, and we could focus ourselves completely on the process of browsing and navigating through the offline version to isolate visual differences and functional problems. As said, the original web site runs on ASP.NET Web Forms and uses Postback calls for interaction like search, pagination and partly for navigation. This is the main field of improving the offline experience. Of course, same as for standard web development it is advised to test with various browsers, and strangely we discovered that the offline version looked pretty good on Firefox, Chrome and Safari, but not in Internet Explorer. A quick look at the HTML source shed some light on this, and there are conditional CSS inclusions based on the user agent. HTTrack is not acting as Internet Explorer and so we didn't have the necessary overrides for this browser. Not problematic after all in our case, but you might have to pay attention to this and get the IE-specific files explicitly. And while having a view at the source code, we also found out that HTTrack actually modifies the generated HTML output. In several occasions we discovered that <div> elements were converted into <table> constructs for no obvious reason; even nested structures. Search 'e'nd destroy - sed (or Notepad++) to the rescue During our intensive root cause analysis for a couple of HTML/CSS problems that needed some extra attention it is very helpful to be familiar with any editor that allows search and replace over multiple files like, ie. sed - stream editor for filtering and transforming text on Linux or my personal favourite Notepad++ on Windows. This allowed us to quickly fix a lot of anchors with onclick attributes and Javascript code that was addressed to ASP.NET files instead of their generated HTML counterparts, like so: grep -lr -e '.aspx' * | xargs sed -e 's/.aspx/.html?/g' The additional question mark after the HTML extension helps to separate the query string from the actual target and solved all our missing hyperlinks very fast. The same can be done in Notepad++ on Windows, too. Just use the 'Replace in files' feature and you are settled. Especially, in combination with Regular Expressions (regex). Landscape of browsers Okay, after several runs of HTML/CSS code analysis, searching and replacing some strings in a pool of more than 4.000 files, we finally had a very good match of an offline browsing experience in Firefox and Chrome on Linux. Next, we transferred that modified set of files to a Windows 8 machine for review on Firefox, Chrome and Internet Explorer 7 to 10, and a Mac mini running Mac OS X 10.7 to check the output on Safari and again on Chrome. Besides IE, for reasons already mentioned above, the results were identical. And last but not least it was about to check web site on tablets. Please continue to read on the following articles: Taking web sites offline for demonstration on Galaxy Tablet Taking web sites offline for demonstration on iPad

    Read the article

  • Best Web Site Copying Software

    - by GregH
    I just wanted to get some opinions on the best "web site copying" software out there (free or commercial is fine). I have a site that I've recently become responsible for managing, and the previous consultant has not provided operating system access. As such, the plan is to re-host the web site. I realize there are a lot of different issues to consider in doing this. However, I don't have much choice in the matter now. The plan is to use web site copying software (ala HTTrack) to "rip" the web site, and then modify what is downloaded back in to a maintainable site. This, of course, involves HTML, css, javascript, etc on the front-end. I'd like to recover as much of the site as possible to make re-creating it as easy as possible. Your input is appreciated. Input on my approach is also appreciated. Thanks!

    Read the article

  • Best Web Site Copying Software

    - by GregH
    I just wanted to get some opinions on the best "web site copying" software out there (free or commercial is fine). I have a site that I've recently become responsible for managing, and the previous consultant has not provided operating system access. As such, the plan is to re-host the web site. I realize there are a lot of different issues to consider in doing this. However, I don't have much choice in the matter now. The plan is to use web site copying software (ala HTTrack) to "rip" the web site, and then modify what is downloaded back in to a maintainable site. This, of course, involves HTML, css, javascript, etc on the front-end. I'd like to recover as much of the site as possible to make re-creating it as easy as possible.

    Read the article

  • Eclipse JDT API documentation, preferrably CHM format

    - by MasterPeter
    I am going to need to do some development with Eclipse JDT while on holiday. I will be off-line for the most of the time and therefore I will not be able to access the Eclipse JDT API documentation on the Web. I can't find a link to download the documentation in ZIP or PDF or, best of all, CHM format. Does anyone have a copy? Could you please provide a download link to it? Do you think I should use HTTRack to simply mirror the above-mentioned website? I would really rather not. CHM is rather convenient for exploring any kind of API documentation. Anyone differs in opinion?

    Read the article

  • recursive wget with hotlinked requisites

    - by dongle
    I often use wget to mirror very large websites. Sites that contain hotlinked content (be it images, video, css, js) pose a problem, as I seem unable to specify that I would like wget to grab page requisites that are on other hosts, without having the crawl also follow hyperlinks to other hosts. For example, let's look at this page https://dl.dropbox.com/u/11471672/wget-all-the-things.html Let's pretend that this is a large site that I would like to completely mirror, including all page requisites – including those that are hotlinked. wget -e robots=off -r -l inf -pk ^^ gets everything but the hotlinked image wget -e robots=off -r -l inf -pk -H ^^ gets everything, including hotlinked image, but goes wildly out of control, proceeding to download the entire web wget -e robots=off -r -l inf -pk -H --ignore-tags=a ^^ gets the first page, including both hotlinked and local image, does not follow the hyperlink to the site outside of scope, but obviously also does not follow the hyperlink to the next page of the site. I know that there are various other tools and methods of accomplishing this (HTTrack and Heritrix allow for the user to make a distinction between hotlinked content on other hosts vs hyperlinks to other hosts) but I'd like to see if this is possible with wget. Ideally this would not be done in post-processing, as I would like the external content, requests, and headers to be included in the WARC file I'm outputting.

    Read the article

  • .htaccess template, suggestions needed

    - by purpler
    # Defaults AddDefaultCharset UTF-8 DefaultLanguage en-US FileETag None Header unset ETag ServerSignature Off SetEnv TZ Europe/Belgrade # Rewrites Options +FollowSymLinks RewriteEngine On RewriteBase / # Redirect to WWW RewriteCond %{HTTP_HOST} ^serpentineseo.com RewriteRule (.*) http://www.serpentineseo.com/$1 [R=301,L] # Redirect index to root RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/ RewriteRule ^(.*)index\.html$ /$1 [R=301,L] # Cache media files: ExpiresActive On ExpiresDefault A0 # Month <filesMatch "\.(gif|jpg|jpeg|png|ico|swf|js)$"> Header set Cache-Control "max-age=2592000, public" </filesMatch> # Week <FilesMatch "\.(css|pdf)$"> Header set Cache-Control "max-age=604800" </FilesMatch> # 10 Min <FilesMatch "\.(html|htm|txt)$"> Header set Cache-Control "max-age=600" </FilesMatch> # Do not cache <FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$"> Header unset Cache-Control </FilesMatch> # Compress output <IfModule mod_deflate.c> <FilesMatch "\.(html|js|css)$"> SetOutputFilter DEFLATE </FilesMatch> </IfModule> # Error Documents ErrorDocument 206 /error/206.html ErrorDocument 401 /error/401.html ErrorDocument 403 /error/403.html ErrorDocument 404 /error/404.html ErrorDocument 500 /error/500.html # Prevent hotlinking RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www\.)?serpentineseo.com/.*$ [NC] RewriteRule \.(gif|jpg|png)$ http://www.serpentineseo.com/images/angryman.png [R,L] # Prevent offline browsers RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus RewriteRule ^.*$ http://www.google.com [R,L] # Protect against DOS attacks by limiting file upload size LimitRequestBody 10240000 # Deny access to sensitive files <FilesMatch "\.(htaccess|psd|log)$"> Order Allow,Deny Deny from all </FilesMatch>

    Read the article

  • .htaccess template, suggestions needed

    - by purpler
    DefaultLanguage en-US FileETag None Header unset ETag ServerSignature Off SetEnv TZ Europe/Belgrade # Rewrites Options +FollowSymLinks RewriteEngine On RewriteBase / # Redirect to WWW RewriteCond %{HTTP_HOST} ^serpentineseo.com RewriteRule (.*) http://www.serpentineseo.com/$1 [R=301,L] # Redirect index to root RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/ RewriteRule ^(.*)index\.html$ /$1 [R=301,L] # Cache media files: ExpiresActive On ExpiresDefault A0 # Month <filesMatch "\.(gif|jpg|jpeg|png|ico|swf|js)$"> Header set Cache-Control "max-age=2592000, public" </filesMatch> # Week <FilesMatch "\.(css|pdf)$"> Header set Cache-Control "max-age=604800" </FilesMatch> # 10 Min <FilesMatch "\.(html|htm|txt)$"> Header set Cache-Control "max-age=600" </FilesMatch> # Do not cache <FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$"> Header unset Cache-Control </FilesMatch> # Compress output <IfModule mod_deflate.c> <FilesMatch "\.(html|js|css)$"> SetOutputFilter DEFLATE </FilesMatch> </IfModule> # Error Documents ErrorDocument 206 /error/206.html ErrorDocument 401 /error/401.html ErrorDocument 403 /error/403.html ErrorDocument 404 /error/404.html ErrorDocument 500 /error/500.html # Prevent hotlinking RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www\.)?serpentineseo.com/.*$ [NC] RewriteRule \.(gif|jpg|png)$ http://www.serpentineseo.com/images/angryman.png [R,L] # Prevent offline browsers RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus RewriteRule ^.*$ http://www.google.com [R,L] # Protect against DOS attacks by limiting file upload size LimitRequestBody 10240000 # Deny access to sensitive files <FilesMatch "\.(htaccess|psd|log)$"> Order Allow,Deny Deny from all </FilesMatch>

    Read the article

  • .htaccess template, suggestions needed.

    - by purpler
    I compiled myself a .htaccess template and would like to know whether the caching and compressions is set up right, constructive suggestions and critics needed. # Defaults AddDefaultCharset UTF-8 DefaultLanguage en-US FileETag None Header unset ETag ServerSignature Off SetEnv TZ Europe/Belgrade # Rewrites Options +FollowSymLinks RewriteEngine On RewriteBase / # Redirect to WWW RewriteCond %{HTTP_HOST} ^serpentineseo.com RewriteRule (.*) http://www.serpentineseo.com/$1 [R=301,L] # Redirect index to root RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/ RewriteRule ^(.*)index\.html$ /$1 [R=301,L] # Cache media files: ExpiresActive On ExpiresDefault A0 # Month <filesMatch "\.(gif|jpg|jpeg|png|ico|swf|js)$"> Header set Cache-Control "max-age=2592000, public" </filesMatch> # Week <FilesMatch "\.(css|pdf)$"> Header set Cache-Control "max-age=604800" </FilesMatch> # 10 Min <FilesMatch "\.(html|htm|txt)$"> Header set Cache-Control "max-age=600" </FilesMatch> # Do not cache <FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$"> Header unset Cache-Control </FilesMatch> # Compress output <IfModule mod_deflate.c> <FilesMatch "\.(html|js|css)$"> SetOutputFilter DEFLATE </FilesMatch> </IfModule> # Error Documents ErrorDocument 206 /error/206.html ErrorDocument 401 /error/401.html ErrorDocument 403 /error/403.html ErrorDocument 404 /error/404.html ErrorDocument 500 /error/500.html # Prevent hotlinking RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www\.)?serpentineseo.com/.*$ [NC] RewriteRule \.(gif|jpg|png)$ http://www.serpentineseo.com/images/angryman.png [R,L] # Prevent offline browsers RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus RewriteRule ^.*$ http://www.google.com [R,L] # Protect against DOS attacks by limiting file upload size LimitRequestBody 10240000 # Deny access to sensitive files <FilesMatch "\.(htaccess|psd|log)$"> Order Allow,Deny Deny from all </FilesMatch>

    Read the article

  • website connection reset on first load

    - by Tar
    i'm using nginx with php-cgi. lately a problem has arose where if you don't view my site for a while, like 3-4 minutes, and then open it again, the first request you send will return connection reset by peer in the browser. if you refresh, operation is normal for all subsequent requests. this happens every time and it isn't just an isolated incident, it happens to everyone using my site. i've tried to restart nginx and php-cgi but to no avail. does anyone know what the problem could be? i can provide whatever information necessary. it's worth noting that there's nothing in error log besides that message about client closing the connection early. nginx.conf user nobody; worker_processes 4; error_log /var/log/nginx/error.log; pid /var/run/nginx.pid; events { worker_connections 2048; } http { include /etc/nginx/mime.types; error_page 404 /404.html; error_page 403 /403.html; error_page 444 /444.html; error_page 502 /502.html; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log /var/log/nginx/access.log main; large_client_header_buffers 8 8k; sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 30; server_tokens off; gzip on; gzip_proxied any; gzip_comp_level 6; gzip_buffers 64 8k; gzip_min_length 1024; gzip_http_version 1.1; gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript; include /etc/nginx/conf.d/*.conf; } default.conf server { listen 80; server_name domain.com; error_log /var/log/nginx/error.log debug; access_log /var/log/nginx/access.log; location / { if ($request_method !~ ^(GET|HEAD|POST)$ ) { return 444; } if ($http_user_agent ~* Havij|hvj|acunetix|wget|HTtrack) { return 403; } root /home/admin06/public_html; autoindex off; index index.php; # Images and static content is treated different location ~* ^.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /home/admin06/public_html; } location /nginx_status { stub_status on; access_log off;] deny all; } location ~ \.php$ { fastcgi_split_path_info ^(.+\.php)(/.*)$; #try_files $uri =404; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /home/site/public_html$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 60; fastcgi_read_timeout 60; fastcgi_buffer_size 128k; fastcgi_buffers 4 256k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; } ## Disable viewing .htaccess & .htpassword location ~ /\.ht { deny all; } location ~ error_log { deny all; } location ~ access_log { deny all; } location ~ \.cgi { deny all; } location ~ \.db { deny all; } }

    Read the article

  • Using a mounted NTFS share with nginx

    - by Hoff
    I have set up a local testing VM with Ubuntu Server 12.04 LTS and the LEMP stack. It's kind of an unconventional setup because instead of having all my PHP scripts on the local machine, I've mounted an NTFS share as the document root because I do my development on Windows. I had everything working perfectly up until this morning, now I keep getting a dreaded 'File not found.' error. I am almost certain this must be somehow permission related, because if I copy my site over to /var/www, nginx and php-fpm have no problems serving my PHP scripts. What I can't figure out is why all of a sudden (after a reboot of the server), no PHP files will be served but instead just the 'File not found.' error. Static files work fine, so I think it's PHP that is causing the headache. Both nginx and php-fpm are configured to run as the user www-data: root@ubuntu-server:~# ps aux | grep 'nginx\|php-fpm' root 1095 0.0 0.0 5816 792 ? Ss 11:11 0:00 nginx: master process /opt/nginx/sbin/nginx -c /etc/nginx/nginx.conf www-data 1096 0.0 0.1 6016 1172 ? S 11:11 0:00 nginx: worker process www-data 1098 0.0 0.1 6016 1172 ? S 11:11 0:00 nginx: worker process root 1130 0.0 0.4 175560 4212 ? Ss 11:11 0:00 php-fpm: master process (/etc/php5/php-fpm.conf) www-data 1131 0.0 0.3 175560 3216 ? S 11:11 0:00 php-fpm: pool www www-data 1132 0.0 0.3 175560 3216 ? S 11:11 0:00 php-fpm: pool www www-data 1133 0.0 0.3 175560 3216 ? S 11:11 0:00 php-fpm: pool www root 1686 0.0 0.0 4368 816 pts/1 S+ 11:11 0:00 grep --color=auto nginx\|php-fpm I have mounted the NTFS share at /mnt/webfiles by editing /etc/fstab and adding the following line: //192.168.0.199/c$/Websites/ /mnt/webfiles cifs username=Jordan,password=mypasswordhere,gid=33,uid=33 0 0 Where gid 33 is the www-data group and uid 33 is the user www-data. If I list the contents of one of my sites you can in fact see that they belong to the user www-data: root@ubuntu-server:~# ls -l /mnt/webfiles/nTv5-2.0 total 8 drwxr-xr-x 0 www-data www-data 0 Jun 6 19:12 app drwxr-xr-x 0 www-data www-data 0 Aug 22 19:00 assets -rwxr-xr-x 0 www-data www-data 1150 Jan 4 2012 favicon.ico -rwxr-xr-x 0 www-data www-data 1412 Dec 28 2011 index.php drwxr-xr-x 0 www-data www-data 0 Jun 3 16:44 lib drwxr-xr-x 0 www-data www-data 0 Jan 3 2012 plugins drwxr-xr-x 0 www-data www-data 0 Jun 3 16:45 vendors If I switch to the www-data user, I have no problem creating a new file on the share: root@ubuntu-server:~# su www-data $ > /mnt/webfiles/test.txt $ ls -l /mnt/webfiles | grep test\.txt -rwxr-xr-x 0 www-data www-data 0 Sep 8 11:19 test.txt There should be no problem reading or writing to the share with php-fpm running as the user www-data. When I examine the error log of nginx, it's filled with a bunch of lines that look like the following: 2012/09/08 11:22:36 [error] 1096#0: *1 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 192.168.0.199, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.0.123" 2012/09/08 11:22:39 [error] 1096#0: *1 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 192.168.0.199, server: , request: "GET /apc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.0.123" It's bizarre that this was working previously and now all of sudden PHP is complaining that it can't "find" the scripts on the share. Does anybody know why this is happening? EDIT I tried editing php-fpm.conf and changing chdir to the following: chdir = /mnt/webfiles When I try and restart the php-fpm service, I get the error: Starting php-fpm [08-Sep-2012 14:20:55] ERROR: [pool www] the chdir path '/mnt/webfiles' does not exist or is not a directory This is a total load of bullshit because this directory DOES exist and is mounted! Any ls commands to list that directory work perfectly. Why the hell can't PHP-FPM see this directory?! Here are my configuration files for reference: nginx.conf user www-data; worker_processes 2; error_log /var/log/nginx/nginx.log info; pid /var/run/nginx.pid; events { worker_connections 1024; multi_accept on; } http { include fastcgi.conf; include mime.types; default_type application/octet-stream; set_real_ip_from 127.0.0.1; real_ip_header X-Forwarded-For; ## Proxy proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; client_max_body_size 32m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 90; proxy_read_timeout 90; proxy_buffers 32 4k; ## Compression gzip on; gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript; gzip_disable "MSIE [1-6]\.(?!.*SV1)"; ### TCP options tcp_nodelay on; tcp_nopush on; keepalive_timeout 65; sendfile on; include /etc/nginx/sites-enabled/*; } my site config server { listen 80; access_log /var/log/nginx/$host.access.log; error_log /var/log/nginx/error.log; root /mnt/webfiles/nTv5-2.0/app/webroot; index index.php; ## Block bad bots if ($http_user_agent ~* (HTTrack|HTMLParser|libcurl|discobot|Exabot|Casper|kmccrew|plaNETWORK|RPT-HTTPClient)) { return 444; } ## Block certain Referers (case insensitive) if ($http_referer ~* (sex|vigra|viagra) ) { return 444; } ## Deny dot files: location ~ /\. { deny all; } ## Favicon Not Found location = /favicon.ico { access_log off; log_not_found off; } ## Robots.txt Not Found location = /robots.txt { access_log off; log_not_found off; } if (-f $document_root/maintenance.html) { rewrite ^(.*)$ /maintenance.html last; } location ~* \.(?:ico|css|js|gif|jpe?g|png)$ { # Some basic cache-control for static files to be sent to the browser expires max; add_header Pragma public; add_header Cache-Control "max-age=2678400, public, must-revalidate"; } location / { try_files $uri $uri/ index.php; if (-f $request_filename) { break; } rewrite ^(.+)$ /index.php?url=$1 last; } location ~ \.php$ { include /etc/nginx/fastcgi.conf; fastcgi_pass unix:/var/run/php5-fpm.sock; } } php-fpm.conf ;;;;;;;;;;;;;;;;;;;;; ; FPM Configuration ; ;;;;;;;;;;;;;;;;;;;;; ; All relative paths in this configuration file are relative to PHP's install ; prefix (/opt/php5). This prefix can be dynamicaly changed by using the ; '-p' argument from the command line. ; Include one or more files. If glob(3) exists, it is used to include a bunch of ; files from a glob(3) pattern. This directive can be used everywhere in the ; file. ; Relative path can also be used. They will be prefixed by: ; - the global prefix if it's been set (-p arguement) ; - /opt/php5 otherwise ;include=etc/fpm.d/*.conf ;;;;;;;;;;;;;;;;;; ; Global Options ; ;;;;;;;;;;;;;;;;;; [global] ; Pid file ; Note: the default prefix is /opt/php5/var ; Default Value: none pid = /var/run/php-fpm.pid ; Error log file ; Note: the default prefix is /opt/php5/var ; Default Value: log/php-fpm.log error_log = /var/log/php5-fpm/php-fpm.log ; Log level ; Possible Values: alert, error, warning, notice, debug ; Default Value: notice ;log_level = notice ; If this number of child processes exit with SIGSEGV or SIGBUS within the time ; interval set by emergency_restart_interval then FPM will restart. A value ; of '0' means 'Off'. ; Default Value: 0 ;emergency_restart_threshold = 0 ; Interval of time used by emergency_restart_interval to determine when ; a graceful restart will be initiated. This can be useful to work around ; accidental corruptions in an accelerator's shared memory. ; Available Units: s(econds), m(inutes), h(ours), or d(ays) ; Default Unit: seconds ; Default Value: 0 ;emergency_restart_interval = 0 ; Time limit for child processes to wait for a reaction on signals from master. ; Available units: s(econds), m(inutes), h(ours), or d(ays) ; Default Unit: seconds ; Default Value: 0 ;process_control_timeout = 0 ; Send FPM to background. Set to 'no' to keep FPM in foreground for debugging. ; Default Value: yes ;daemonize = yes ;;;;;;;;;;;;;;;;;;;; ; Pool Definitions ; ;;;;;;;;;;;;;;;;;;;; ; Multiple pools of child processes may be started with different listening ; ports and different management options. The name of the pool will be ; used in logs and stats. There is no limitation on the number of pools which ; FPM can handle. Your system will tell you anyway :) ; Start a new pool named 'www'. ; the variable $pool can we used in any directive and will be replaced by the ; pool name ('www' here) [www] ; Per pool prefix ; It only applies on the following directives: ; - 'slowlog' ; - 'listen' (unixsocket) ; - 'chroot' ; - 'chdir' ; - 'php_values' ; - 'php_admin_values' ; When not set, the global prefix (or /opt/php5) applies instead. ; Note: This directive can also be relative to the global prefix. ; Default Value: none ;prefix = /path/to/pools/$pool ; The address on which to accept FastCGI requests. ; Valid syntaxes are: ; 'ip.add.re.ss:port' - to listen on a TCP socket to a specific address on ; a specific port; ; 'port' - to listen on a TCP socket to all addresses on a ; specific port; ; '/path/to/unix/socket' - to listen on a unix socket. ; Note: This value is mandatory. ;listen = 127.0.0.1:9000 listen = /var/run/php5-fpm.sock ; Set listen(2) backlog. A value of '-1' means unlimited. ; Default Value: 128 (-1 on FreeBSD and OpenBSD) ;listen.backlog = -1 ; List of ipv4 addresses of FastCGI clients which are allowed to connect. ; Equivalent to the FCGI_WEB_SERVER_ADDRS environment variable in the original ; PHP FCGI (5.2.2+). Makes sense only with a tcp listening socket. Each address ; must be separated by a comma. If this value is left blank, connections will be ; accepted from any ip address. ; Default Value: any ;listen.allowed_clients = 127.0.0.1 ; Set permissions for unix socket, if one is used. In Linux, read/write ; permissions must be set in order to allow connections from a web server. Many ; BSD-derived systems allow connections regardless of permissions. ; Default Values: user and group are set as the running user ; mode is set to 0666 ;listen.owner = www-data ;listen.group = www-data ;listen.mode = 0666 ; Unix user/group of processes ; Note: The user is mandatory. If the group is not set, the default user's group ; will be used. user = www-data group = www-data ; Choose how the process manager will control the number of child processes. ; Possible Values: ; static - a fixed number (pm.max_children) of child processes; ; dynamic - the number of child processes are set dynamically based on the ; following directives: ; pm.max_children - the maximum number of children that can ; be alive at the same time. ; pm.start_servers - the number of children created on startup. ; pm.min_spare_servers - the minimum number of children in 'idle' ; state (waiting to process). If the number ; of 'idle' processes is less than this ; number then some children will be created. ; pm.max_spare_servers - the maximum number of children in 'idle' ; state (waiting to process). If the number ; of 'idle' processes is greater than this ; number then some children will be killed. ; Note: This value is mandatory. pm = dynamic ; The number of child processes to be created when pm is set to 'static' and the ; maximum number of child processes to be created when pm is set to 'dynamic'. ; This value sets the limit on the number of simultaneous requests that will be ; served. Equivalent to the ApacheMaxClients directive with mpm_prefork. ; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP ; CGI. ; Note: Used when pm is set to either 'static' or 'dynamic' ; Note: This value is mandatory. pm.max_children = 50 ; The number of child processes created on startup. ; Note: Used only when pm is set to 'dynamic' ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2 pm.start_servers = 20 ; The desired minimum number of idle server processes. ; Note: Used only when pm is set to 'dynamic' ; Note: Mandatory when pm is set to 'dynamic' pm.min_spare_servers = 5 ; The desired maximum number of idle server processes. ; Note: Used only when pm is set to 'dynamic' ; Note: Mandatory when pm is set to 'dynamic' pm.max_spare_servers = 35 ; The number of requests each child process should execute before respawning. ; This can be useful to work around memory leaks in 3rd party libraries. For ; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS. ; Default Value: 0 pm.max_requests = 500 ; The URI to view the FPM status page. If this value is not set, no URI will be ; recognized as a status page. By default, the status page shows the following ; information: ; accepted conn - the number of request accepted by the pool; ; pool - the name of the pool; ; process manager - static or dynamic; ; idle processes - the number of idle processes; ; active processes - the number of active processes; ; total processes - the number of idle + active processes. ; max children reached - number of times, the process limit has been reached, ; when pm tries to start more children (works only for ; pm 'dynamic') ; The values of 'idle processes', 'active processes' and 'total processes' are ; updated each second. The value of 'accepted conn' is updated in real time. ; Example output: ; accepted conn: 12073 ; pool: www ; process manager: static ; idle processes: 35 ; active processes: 65 ; total processes: 100 ; max children reached: 1 ; By default the status page output is formatted as text/plain. Passing either ; 'html' or 'json' as a query string will return the corresponding output ; syntax. Example: ; http://www.foo.bar/status ; http://www.foo.bar/status?json ; http://www.foo.bar/status?html ; Note: The value must start with a leading slash (/). The value can be ; anything, but it may not be a good idea to use the .php extension or it ; may conflict with a real PHP file. ; Default Value: not set pm.status_path = /status ; The ping URI to call the monitoring page of FPM. If this value is not set, no ; URI will be recognized as a ping page. This could be used to test from outside ; that FPM is alive and responding, or to ; - create a graph of FPM availability (rrd or such); ; - remove a server from a group if it is not responding (load balancing); ; - trigger alerts for the operating team (24/7). ; Note: The value must start with a leading slash (/). The value can be ; anything, but it may not be a good idea to use the .php extension or it ; may conflict with a real PHP file. ; Default Value: not set ping.path = /ping ; This directive may be used to customize the response of a ping request. The ; response is formatted as text/plain with a 200 response code. ; Default Value: pong ping.response = pong ; The timeout for serving a single request after which the worker process will ; be killed. This option should be used when the 'max_execution_time' ini option ; does not stop script execution for some reason. A value of '0' means 'off'. ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays) ; Default Value: 0 ;request_terminate_timeout = 0 ; The timeout for serving a single request after which a PHP backtrace will be ; dumped to the 'slowlog' file. A value of '0s' means 'off'. ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays) ; Default Value: 0 ;request_slowlog_timeout = 0 ; The log file for slow requests ; Default Value: not set ; Note: slowlog is mandatory if request_slowlog_timeout is set ;slowlog = log/$pool.log.slow ; Set open file descriptor rlimit. ; Default Value: system defined value ;rlimit_files = 1024 ; Set max core size rlimit. ; Possible Values: 'unlimited' or an integer greater or equal to 0 ; Default Value: system defined value ;rlimit_core = 0 ; Chroot to this directory at the start. This value must be defined as an ; absolute path. When this value is not set, chroot is not used. ; Note: you can prefix with '$prefix' to chroot to the pool prefix or one ; of its subdirectories. If the pool prefix is not set, the global prefix ; will be used instead. ; Note: chrooting is a great security feature and should be used whenever ; possible. However, all PHP paths will be relative to the chroot ; (error_log, sessions.save_path, ...). ; Default Value: not set ;chroot = ; Chdir to this directory at the start. ; Note: relative path can be used. ; Default Value: current directory or / when chroot ;chdir = /var/www ; Redirect worker stdout and stderr into main error log. If not set, stdout and ; stderr will be redirected to /dev/null according to FastCGI specs. ; Note: on highloaded environement, this can cause some delay in the page ; process time (several ms). ; Default Value: no ;catch_workers_output = yes ; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from ; the current environment. ; Default Value: clean env ;env[HOSTNAME] = $HOSTNAME ;env[PATH] = /usr/local/bin:/usr/bin:/bin ;env[TMP] = /tmp ;env[TMPDIR] = /tmp ;env[TEMP] = /tmp ; Additional php.ini defines, specific to this pool of workers. These settings ; overwrite the values previously defined in the php.ini. The directives are the ; same as the PHP SAPI: ; php_value/php_flag - you can set classic ini defines which can ; be overwritten from PHP call 'ini_set'. ; php_admin_value/php_admin_flag - these directives won't be overwritten by ; PHP call 'ini_set' ; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no. ; Defining 'extension' will load the corresponding shared extension from ; extension_dir. Defining 'disable_functions' or 'disable_classes' will not ; overwrite previously defined php.ini values, but will append the new value ; instead. ; Note: path INI options can be relative and will be expanded with the prefix ; (pool, global or /opt/php5) ; Default Value: nothing is defined by default except the values in php.ini and ; specified at startup with the -d argument ;php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i -f [email protected] ;php_flag[display_errors] = off ;php_admin_value[error_log] = /var/log/fpm-php.www.log ;php_admin_flag[log_errors] = on ;php_admin_value[memory_limit] = 32M php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i

    Read the article

1