Search Results

Search found 33668 results on 1347 pages for 'html to txt'.

Page 1/1347 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • .NET HTML Sanitation for rich HTML Input

    - by Rick Strahl
    Recently I was working on updating a legacy application to MVC 4 that included free form text input. When I set up the new site my initial approach was to not allow any rich HTML input, only simple text formatting that would respect a few simple HTML commands for bold, lists etc. and automatically handles line break processing for new lines and paragraphs. This is typical for what I do with most multi-line text input in my apps and it works very well with very little development effort involved. Then the client sprung another note: Oh by the way we have a bunch of customers (real estate agents) who need to post complete HTML documents. Oh uh! There goes the simple theory. After some discussion and pleading on my part (<snicker>) to try and avoid this type of raw HTML input because of potential XSS issues, the client decided to go ahead and allow raw HTML input anyway. There has been lots of discussions on this subject on StackOverFlow (and here and here) but to after reading through some of the solutions I didn't really find anything that would work even closely for what I needed. Specifically we need to be able to allow just about any HTML markup, with the exception of script code. Remote CSS and Images need to be loaded, links need to work and so. While the 'legit' HTML posted by these agents is basic in nature it does span most of the full gamut of HTML (4). Most of the solutions XSS prevention/sanitizer solutions I found were way to aggressive and rendered the posted output unusable mostly because they tend to strip any externally loaded content. In short I needed a custom solution. I thought the best solution to this would be to use an HTML parser - in this case the Html Agility Pack - and then to run through all the HTML markup provided and remove any of the blacklisted tags and a number of attributes that are prone to JavaScript injection. There's much discussion on whether to use blacklists vs. whitelists in the discussions mentioned above, but I found that whitelists can make sense in simple scenarios where you might allow manual HTML input, but when you need to allow a larger array of HTML functionality a blacklist is probably easier to manage as the vast majority of elements and attributes could be allowed. Also white listing gets a bit more complex with HTML5 and the new proliferation of new HTML tags and most new tags generally don't affect XSS issues directly. Pure whitelisting based on elements and attributes also doesn't capture many edge cases (see some of the XSS cheat sheets listed below) so even with a white list, custom logic is still required to handle many of those edge cases. The Microsoft Web Protection Library (AntiXSS) My first thought was to check out the Microsoft AntiXSS library. Microsoft has an HTML Encoding and Sanitation library in the Microsoft Web Protection Library (formerly AntiXSS Library) on CodePlex, which provides stricter functions for whitelist encoding and sanitation. Initially I thought the Sanitation class and its static members would do the trick for me,but I found that this library is way too restrictive for my needs. Specifically the Sanitation class strips out images and links which rendered the full HTML from our real estate clients completely useless. I didn't spend much time with it, but apparently I'm not alone if feeling this library is not really useful without some way to configure operation. To give you an example of what didn't work for me with the library here's a small and simple HTML fragment that includes script, img and anchor tags. I would expect the script to be stripped and everything else to be left intact. Here's the original HTML:var value = "<b>Here</b> <script>alert('hello')</script> we go. Visit the " + "<a href='http://west-wind.com'>West Wind</a> site. " + "<img src='http://west-wind.com/images/new.gif' /> " ; and the code to sanitize it with the AntiXSS Sanitize class:@Html.Raw(Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(value)) This produced a not so useful sanitized string: Here we go. Visit the <a>West Wind</a> site. While it removed the <script> tag (good) it also removed the href from the link and the image tag altogether (bad). In some situations this might be useful, but for most tasks I doubt this is the desired behavior. While links can contain javascript: references and images can 'broadcast' information to a server, without configuration to tell the library what to restrict this becomes useless to me. I couldn't find any way to customize the white list, nor is there code available in this 'open source' library on CodePlex. Using Html Agility Pack for HTML Parsing The WPL library wasn't going to cut it. After doing a bit of research I decided the best approach for a custom solution would be to use an HTML parser and inspect the HTML fragment/document I'm trying to import. I've used the HTML Agility Pack before for a number of apps where I needed an HTML parser without requiring an instance of a full browser like the Internet Explorer Application object which is inadequate in Web apps. In case you haven't checked out the Html Agility Pack before, it's a powerful HTML parser library that you can use from your .NET code. It provides a simple, parsable HTML DOM model to full HTML documents or HTML fragments that let you walk through each of the elements in your document. If you've used the HTML or XML DOM in a browser before you'll feel right at home with the Agility Pack. Blacklist based HTML Parsing to strip XSS Code For my purposes of HTML sanitation, the process involved is to walk the HTML document one element at a time and then check each element and attribute against a blacklist. There's quite a bit of argument of what's better: A whitelist of allowed items or a blacklist of denied items. While whitelists tend to be more secure, they also require a lot more configuration. In the case of HTML5 a whitelist could be very extensive. For what I need, I only want to ensure that no JavaScript is executed, so a blacklist includes the obvious <script> tag plus any tag that allows loading of external content including <iframe>, <object>, <embed> and <link> etc. <form>  is also excluded to avoid posting content to a different location. I also disallow <head> and <meta> tags in particular for my case, since I'm only allowing posting of HTML fragments. There is also some internal logic to exclude some attributes or attributes that include references to JavaScript or CSS expressions. The default tag blacklist reflects my use case, but is customizable and can be added to. Here's my HtmlSanitizer implementation:using System.Collections.Generic; using System.IO; using System.Xml; using HtmlAgilityPack; namespace Westwind.Web.Utilities { public class HtmlSanitizer { public HashSet<string> BlackList = new HashSet<string>() { { "script" }, { "iframe" }, { "form" }, { "object" }, { "embed" }, { "link" }, { "head" }, { "meta" } }; /// <summary> /// Cleans up an HTML string and removes HTML tags in blacklist /// </summary> /// <param name="html"></param> /// <returns></returns> public static string SanitizeHtml(string html, params string[] blackList) { var sanitizer = new HtmlSanitizer(); if (blackList != null && blackList.Length > 0) { sanitizer.BlackList.Clear(); foreach (string item in blackList) sanitizer.BlackList.Add(item); } return sanitizer.Sanitize(html); } /// <summary> /// Cleans up an HTML string by removing elements /// on the blacklist and all elements that start /// with onXXX . /// </summary> /// <param name="html"></param> /// <returns></returns> public string Sanitize(string html) { var doc = new HtmlDocument(); doc.LoadHtml(html); SanitizeHtmlNode(doc.DocumentNode); //return doc.DocumentNode.WriteTo(); string output = null; // Use an XmlTextWriter to create self-closing tags using (StringWriter sw = new StringWriter()) { XmlWriter writer = new XmlTextWriter(sw); doc.DocumentNode.WriteTo(writer); output = sw.ToString(); // strip off XML doc header if (!string.IsNullOrEmpty(output)) { int at = output.IndexOf("?>"); output = output.Substring(at + 2); } writer.Close(); } doc = null; return output; } private void SanitizeHtmlNode(HtmlNode node) { if (node.NodeType == HtmlNodeType.Element) { // check for blacklist items and remove if (BlackList.Contains(node.Name)) { node.Remove(); return; } // remove CSS Expressions and embedded script links if (node.Name == "style") { if (string.IsNullOrEmpty(node.InnerText)) { if (node.InnerHtml.Contains("expression") || node.InnerHtml.Contains("javascript:")) node.ParentNode.RemoveChild(node); } } // remove script attributes if (node.HasAttributes) { for (int i = node.Attributes.Count - 1; i >= 0; i--) { HtmlAttribute currentAttribute = node.Attributes[i]; var attr = currentAttribute.Name.ToLower(); var val = currentAttribute.Value.ToLower(); span style="background: white; color: green">// remove event handlers if (attr.StartsWith("on")) node.Attributes.Remove(currentAttribute); // remove script links else if ( //(attr == "href" || attr== "src" || attr == "dynsrc" || attr == "lowsrc") && val != null && val.Contains("javascript:")) node.Attributes.Remove(currentAttribute); // Remove CSS Expressions else if (attr == "style" && val != null && val.Contains("expression") || val.Contains("javascript:") || val.Contains("vbscript:")) node.Attributes.Remove(currentAttribute); } } } // Look through child nodes recursively if (node.HasChildNodes) { for (int i = node.ChildNodes.Count - 1; i >= 0; i--) { SanitizeHtmlNode(node.ChildNodes[i]); } } } } } Please note: Use this as a starting point only for your own parsing and review the code for your specific use case! If your needs are less lenient than mine were you can you can make this much stricter by not allowing src and href attributes or CSS links if your HTML doesn't allow it. You can also check links for external URLs and disallow those - lots of options.  The code is simple enough to make it easy to extend to fit your use cases more specifically. It's also quite easy to make this code work using a WhiteList approach if you want to go that route. The code above is semi-generic for allowing full featured HTML fragments that only disallow script related content. The Sanitize method walks through each node of the document and then recursively drills into all of its children until the entire document has been traversed. Note that the code here uses an XmlTextWriter to write output - this is done to preserve XHTML style self-closing tags which are otherwise left as non-self-closing tags. The sanitizer code scans for blacklist elements and removes those elements not allowed. Note that the blacklist is configurable either in the instance class as a property or in the static method via the string parameter list. Additionally the code goes through each element's attributes and looks for a host of rules gleaned from some of the XSS cheat sheets listed at the end of the post. Clearly there are a lot more XSS vulnerabilities, but a lot of them apply to ancient browsers (IE6 and versions of Netscape) - many of these glaring holes (like CSS expressions - WTF IE?) have been removed in modern browsers. What a Pain To be honest this is NOT a piece of code that I wanted to write. I think building anything related to XSS is better left to people who have far more knowledge of the topic than I do. Unfortunately, I was unable to find a tool that worked even closely for me, or even provided a working base. For the project I was working on I had no choice and I'm sharing the code here merely as a base line to start with and potentially expand on for specific needs. It's sad that Microsoft Web Protection Library is currently such a train wreck - this is really something that should come from Microsoft as the systems vendor or possibly a third party that provides security tools. Luckily for my application we are dealing with a authenticated and validated users so the user base is fairly well known, and relatively small - this is not a wide open Internet application that's directly public facing. As I mentioned earlier in the post, if I had my way I would simply not allow this type of raw HTML input in the first place, and instead rely on a more controlled HTML input mechanism like MarkDown or even a good HTML Edit control that can provide some limits on what types of input are allowed. Alas in this case I was overridden and we had to go forward and allow *any* raw HTML posted. Sometimes I really feel sad that it's come this far - how many good applications and tools have been thwarted by fear of XSS (or worse) attacks? So many things that could be done *if* we had a more secure browser experience and didn't have to deal with every little script twerp trying to hack into Web pages and obscure browser bugs. So much time wasted building secure apps, so much time wasted by others trying to hack apps… We're a funny species - no other species manages to waste as much time, effort and resources as we humans do :-) Resources Code on GitHub Html Agility Pack XSS Cheat Sheet XSS Prevention Cheat Sheet Microsoft Web Protection Library (AntiXss) StackOverflow Links: http://stackoverflow.com/questions/341872/html-sanitizer-for-net http://blog.stackoverflow.com/2008/06/safe-html-and-xss/ http://code.google.com/p/subsonicforums/source/browse/trunk/SubSonic.Forums.Data/HtmlScrubber.cs?r=61© Rick Strahl, West Wind Technologies, 2005-2012Posted in Security  HTML  ASP.NET  JavaScript   Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

    Read the article

  • Install usblib package - Ubuntu

    - by Tom celic
    I need the package libusb for another package I am installing. I tried the following which seemed to install the package, sudo apt-get install libusb-dev but when I try to install the other package I get, configure: error: Package requirements (libusb-1.0 >= 0.9.1) were not met: No package 'libusb-1.0' found Consider adjusting the PKG_CONFIG_PATH environment variable if you installed software in a non-standard prefix. Alternatively, you may set the environment variables LIBUSB_CFLAGS and LIBUSB_LIBS to avoid the need to call pkg-config. See the pkg-config man page for more details. When I run the command dpkg -L libusb-dev, I get: /. /usr /usr/bin /usr/bin/libusb-config /usr/include /usr/include/usb.h /usr/lib /usr/lib/libusb.a /usr/lib/libusb.la /usr/lib/pkgconfig /usr/lib/pkgconfig/libusb.pc /usr/share /usr/share/doc /usr/share/doc/libusb-dev /usr/share/doc/libusb-dev/html /usr/share/doc/libusb-dev/html/index.html /usr/share/doc/libusb-dev/html/preface.html /usr/share/doc/libusb-dev/html/intro.html /usr/share/doc/libusb-dev/html/intro-overview.html /usr/share/doc/libusb-dev/html/intro-support.html /usr/share/doc/libusb-dev/html/api.html /usr/share/doc/libusb-dev/html/api-device-interfaces.html /usr/share/doc/libusb-dev/html/api-timeouts.html /usr/share/doc/libusb-dev/html/api-types.html /usr/share/doc/libusb-dev/html/api-synchronous.html /usr/share/doc/libusb-dev/html/api-return-values.html /usr/share/doc/libusb-dev/html/functions.html /usr/share/doc/libusb-dev/html/ref.core.html /usr/share/doc/libusb-dev/html/function.usbinit.html /usr/share/doc/libusb-dev/html/function.usbfindbusses.html /usr/share/doc/libusb-dev/html/function.usbfinddevices.html /usr/share/doc/libusb-dev/html/function.usbgetbusses.html /usr/share/doc/libusb-dev/html/ref.deviceops.html /usr/share/doc/libusb-dev/html/function.usbopen.html /usr/share/doc/libusb-dev/html/function.usbclose.html /usr/share/doc/libusb-dev/html/function.usbsetconfiguration.html /usr/share/doc/libusb-dev/html/function.usbsetaltinterface.html /usr/share/doc/libusb-dev/html/function.usbresetep.html /usr/share/doc/libusb-dev/html/function.usbclearhalt.html /usr/share/doc/libusb-dev/html/function.usbreset.html /usr/share/doc/libusb-dev/html/function.usbclaiminterface.html /usr/share/doc/libusb-dev/html/function.usbreleaseinterface.html /usr/share/doc/libusb-dev/html/ref.control.html /usr/share/doc/libusb-dev/html/function.usbcontrolmsg.html /usr/share/doc/libusb-dev/html/function.usbgetstring.html /usr/share/doc/libusb-dev/html/function.usbgetstringsimple.html /usr/share/doc/libusb-dev/html/function.usbgetdescriptor.html /usr/share/doc/libusb-dev/html/function.usbgetdescriptorbyendpoint.html /usr/share/doc/libusb-dev/html/ref.bulk.html /usr/share/doc/libusb-dev/html/function.usbbulkwrite.html /usr/share/doc/libusb-dev/html/function.usbbulkread.html /usr/share/doc/libusb-dev/html/ref.interrupt.html /usr/share/doc/libusb-dev/html/function.usbinterruptwrite.html /usr/share/doc/libusb-dev/html/function.usbinterruptread.html /usr/share/doc/libusb-dev/html/ref.nonportable.html /usr/share/doc/libusb-dev/html/function.usbgetdrivernp.html /usr/share/doc/libusb-dev/html/function.usbdetachkerneldrivernp.html /usr/share/doc/libusb-dev/html/examples.html /usr/share/doc/libusb-dev/html/examples-code.html /usr/share/doc/libusb-dev/html/examples-tests.html /usr/share/doc/libusb-dev/html/examples-other.html /usr/share/doc/libusb-dev/copyright /usr/share/doc-base /usr/share/doc-base/libusb-dev /usr/share/man /usr/share/man/man1 /usr/share/man/man1/libusb-config.1.gz /usr/lib/libusb.so /usr/share/doc/libusb-dev/changelog.Debian.gz Any ideas??

    Read the article

  • how to get this working if exist tasklist=="notepad %DATE%.txt" [closed]

    - by blade19899
    @echo off title Log file creator if exist "%CD%\%DATE%.txt" (msg * "the log of today(%DATE%.txt) Has already been made" goto :notepad) else (goto :start) goto :eof :start echo %date%>>"%date%".txt echo %username% >>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo ----------------------------------------Reports---------------------------------------- >>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt echo ----------------------------------------TO-DO---------------------- >>"%date%".txt echo.>>"%date%".txt echo.>>"%date%".txt :notepad if exist tasklist=="notepad %DATE%.txt" (msg * "the log of today(%DATE%.txt) Has already been made, en is opened") else (goto :start) goto :eof :start start notepad %DATE%.txt This code right now pops up the log of today(%DATE%.txt) Has already been made and after clicking OK it doesn't do anything it should open the msg the log of today(%DATE%.txt) Has already been made, en is opened i have notepad opened. with process explorer it shows notepad and the date.txt my question is how to make it show the the log of today(%DATE%.txt) Has already been made, en is opened box... and perhaps bring notepad to the foreground? ps not sure if this question belongs here. Apologize if it doesn't!

    Read the article

  • Robots.txt practices with .htaccess redirections (inherits)

    - by Jayhal
    I have a question regarding how to write robots.txt files for many domains and subdomains with redirects in place. We have a hosting account that enacts primary and add-on domains. All of our domains and subdomains, including the primary domain, is redirected via htaccess 301s to their own subdirectories in the primary domain's root directory. I'm confused about how I would write the robots.txt for certain directories. First, I wanted to confirm I am right in understanding that for domains and subdomains, crawlers will look to the directory that acts as that urls root directory for the crawling rules(robots.txt). Also, that a directory will not be affected by a robots.txt present in their parent directory if the directory has its own domain/subdomain, and that url is the one being accessed by crawlers. (Am pretty sure, but I wanted to confirm I didnt have a fundamentally flawed understanding of robots.txt) In the original root directory on the account(where the primary domain was directed before htaccess was put in place) what should the robots.txt contain? When crawlers look to crawl our primary domain, will they look to the original root directory for the robots.txt or will they reference the file contained in the new subdirectory where all the primary domain's site files are located? If so, what should the root's robot.txt include if anything at all. Would I be right to include a simple 'disallow: /' for all agents, and then include more specific robots.txt files in each subdirectory with more specific instructions. Would that affect the crawling of the directory where the primary domain is now redirected? Any help is greatly appreciated, Thanks!

    Read the article

  • Redirect Google crawler to different robots.txt via .htaccess

    - by user3474818
    I have googled for the answer all day and still couldn't find an answer. I have a virtual subdomain www.static.example.com which is a mirror site of www.example.com. It means I have just one root folder for subdomain and domain aswell. I want to redirect crawlers to different robots.txt file - robots_static.txt when they see .static in url in which I will forbid indexing via /disallow command. I want to do this because I have duplicated content in Google search results. Subdomain is showing the exact same content as the main domain. Does anyone know how could I achieve that crawlers sees robots_static.txt instead of robots.txt? What I have managed to find so far is this: RewriteCond %{HTTP_HOST} ^www.static.*$ [NC] RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC] RewriteRule ^robots\.txt /robots_static.txt [NC,L] but when I check in webmaster tools, it still sees robots.txt as my robots file instead of robots_static.txt, so it crawls and index everything twice. What did I do wrong? Thanks EDIT: This is my .htaccess file ## # @package Joomla # @copyright Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved. # @license GNU General Public License version 2 or later; see LICENSE.txt ## ## # READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE! # # The line just below this section: 'Options +FollowSymLinks' may cause problems # with some server configurations. It is required for use of mod_rewrite, but may already # be set by your server administrator in a way that dissallows changing it in # your .htaccess file. If using it causes your server to error out, comment it out (add # to # beginning of line), reload your site in your browser and test your sef url's. If they work, # it has been set by your server administrator and you do not need it set here. ## ## Can be commented out if causes errors, see notes above. Options +FollowSymLinks ## Mod_rewrite in use. RewriteEngine On RewriteEngine On RewriteCond %{HTTP_HOST} !^www\. RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L] RewriteCond %{HTTP_HOST} ^www.static.*$ [NC] RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC] RewriteRule ^robots\.txt /robots_static.txt [NC,L] ## Begin - Rewrite rules to block out some common exploits. # If you experience problems on your site block out the operations listed below # This attempts to block the most common type of exploit `attempts` to Joomla! # # Block out any script trying to base64_encode data within the URL. RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR] # Block out any script that includes a <script> tag in URL. RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR] # Block out any script trying to set a PHP GLOBALS variable via URL. RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR] # Block out any script trying to modify a _REQUEST variable via URL. RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2}) # Return 403 Forbidden header and show the content of the root homepage RewriteRule .* index.php [F] # ## End - Rewrite rules to block out some common exploits. ## Begin - Custom redirects # # If you need to redirect some pages, or set a canonical non-www to # www redirect (or vice versa), place that code here. Ensure those # redirects use the correct RewriteRule syntax and the [R=301,L] flags. # ## End - Custom redirects ## # Uncomment following line if your webserver's URL # is not directly related to physical file paths. # Update Your Joomla! Directory (just / for root). ## # RewriteBase / RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC] RewriteCond %{THE_REQUEST} !/system/.* RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L] RewriteCond %{THE_REQUEST} ^GET ## Begin - Joomla! core SEF Section. # RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}] # # If the requested path and file is not /index.php and the request # has not already been internally rewritten to the index.php script RewriteCond %{REQUEST_URI} !^/index\.php # and the request is for something within the component folder, # or for the site root, or for an extensionless URL, or the # requested URL ends with one of the listed extensions RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC] # and the requested path and file doesn't directly match a physical file RewriteCond %{REQUEST_FILENAME} !-f # and the requested path and file doesn't directly match a physical folder RewriteCond %{REQUEST_FILENAME} !-d # internally rewrite the request to the index.php script RewriteRule .* index.php [L] # ## End - Joomla! core SEF Section. <FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$"> Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT" Header set Cache-Control "public" </FilesMatch> <ifModule mod_headers.c> Header set Connection keep-alive </ifModule> ########## Begin - Remove Etags # FileETag none # ########## End - Remove Etags

    Read the article

  • Robots.txt and "Bad" Robots

    - by Lynda
    I understand robots.txt and its purpose. I have read some people saying that using a Robots.txt gives "bad" robots or robots who do not obey a robots.txt a way to access pages on your site that you do not want accessed. While I am not looking to get into a debate about that I do have a question: If I have a structure like this: /Folder/ /Sub-Folder 1/ /Sub-Folder 2/ (Note: There are no pages within /Folder/ only other folders.) If I Disallow: /Folder/ it will prevent "good" robots from accessing the directory and any contents within the sub-folders. While we know that bad robots will see the /Folder/ will they be able to see and acess the sub-folders and the pages within the subfolders if they are not listed in the robots.txt? (Note: I do not fully understand how robots good or bad crawl a site beyond using a robots.txt and links within the site.)

    Read the article

  • Convert HTML template (HTML Code) into an image using php library [on hold]

    - by user2727841
    I'm taking input from user through tiny mce editor which is actually html template (HTML Code) and i want to convert that html template (code) into an image using php libaray, How to do it? Is there any API (SDK) OR library for it? well I prefered API (SDK) OR library which actually convert html template (code) into an image... I've searched every where but didn't succeed, now can any one tell me any php library which convert html code into an image... Thanks in advance

    Read the article

  • Prevent malicious vulnerability scan increasing load on a server

    - by Simon
    Hi all, this week we have been suffering some malicious vulnerability scans to our servers, increasing the load on them, making them nearly unusable. The attack is easy to defend, just blocking the offending ip, but only after discovering it. Is there any form of prevent it? Is it normal that one server becomes nearly unusable due to one of these scans? These are the requests done in just one second to our server: [Fri Mar 12 19:15:27 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/zope trunk 2 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/8872fcacd7663c040f0149ed49f572e9 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/188201 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/74e118780caa0f5232d6ec393b47ae01 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/87d4b821b2b6b9706ba6c2950c0eaefd [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/138917 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/180377 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/182712 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/compl2s [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/e7ba351f0ab1f32b532ec679ac7d589d [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/184530 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/compl_s [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/55542 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/7b9d5a65aab84640c6414a85cae2c6ff [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/77257 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/157611 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/textwrapping [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/51713 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/elina [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/fd4800093500f7a9cc21bea232658706 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/59719 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/administrationexamples [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/29587 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/bdebc9c4aa95b3651e9b8fd90c015327 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/defaultchangenotetext [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/figments [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/69744 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/fastpixelperfect [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/conchmusicsoundtoolkit [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/settingwindowposition [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/windowresizing [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/84784 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/186114 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/99858 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/131677 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/167783 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/99933 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/3en17ljttc [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/gradientcode [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/pythondevelopmentandnavigationwithspe [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/10546 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/167932 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/smallerrectforspritecollision [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/176292 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/3sumvid-19yroldfuckedby2bigcocks [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/67909 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/175185 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/131319 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/99900 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/act5 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/contributors-agreement [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/128447 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/71052 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/114242 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/69768 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/debuggingwithwinpdbfromwithinspe [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/39360 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/176267 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/143468 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/140202 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/25268 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/82241 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/142920 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/downloadingipythonformswindows [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/34367 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/for_collaborators [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/pydeveclipseextensionsfabio [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/usingpdbinipython [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/142264 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/49003 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/gamelets [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/texturecoordinatearithmetic [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/project_interface [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/143177 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/pydeveclipsefabio [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/91525 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/40426 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/134819 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/usingipythonwithtextpad [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/developingpythoninipython [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/35569 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/objfileloader [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/simpleopengl2dclasses [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/191495 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/3dvilla [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/145368 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/140118 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/87799 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/142320 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/glslexample [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/39826 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cairopygame [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/191338 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/91819 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/152003 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/gllight [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/40567 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/137877 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/188209 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/84577 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/131017 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/fightnight [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/79781 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/4731669 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/161942 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/160289 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/81594 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/12127 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/164452 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/96823 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/163598 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/159190 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-test fsfs+ra_local [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/davros [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-publish logs [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-cleanup [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-test fsfs+ra_svn [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cdrwin_v3 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/brianpensive [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/x86-openbsd shared gcc [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/roundup-0 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/svcastle [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/56584 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/45934 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-build [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/97194 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cdrwin_3 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/72243 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/117043 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/147084 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/52713 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/101489 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/134867 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/win32-dependencies [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/36548 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/43827 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/100791 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/elita_posing [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/167848 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/36314 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/49951 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/142740 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cdromkiteletronicaptg [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/138060 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/68483 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/184474 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/137447 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/sndarray [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/127870 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/167312 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/75411 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/167969 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/surfarray [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/174941 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/59129 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/147554 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/105577 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/91734 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/96679 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/06au [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/124495 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/aah [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/164439 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/12638190 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/eliel [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/171164 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/linearinterpolator [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-test [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/heading_news [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/87778 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/portlet_64568222 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/graphic_ep [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/132230 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/12251 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/greencheese [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/188966 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cdsonic [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/171522 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/elitewrap [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/184313 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/188079 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/147511 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/160952 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/132581 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/84885 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/graphic_desktop [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/win32-xp vs2005 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/128548 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/92057 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/65235 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/pyscgi [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/56926 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/svcastle-big [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/138553 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/138232 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/153367 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/42315 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/150012 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/160079 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/win32-xp vc60 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/163482 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/42642 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/174458 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/163109 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/spacer_greys [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/pdf_icon16 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/26346 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/190998 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/fforigins [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/aliens-0 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/step-update faad [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/13376 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/52647 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/155036 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/compl2 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/174323 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/42317 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/tsugumo [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/171850 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/184127 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/48321 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/162545 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/84180 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/135901 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/57817 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/6360574 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/124989 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/113314 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/sprite-tutorial [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/14294 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/191387 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/187294 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/178666 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/179653 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/wingide-users [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/16309095 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/169465 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/189399 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/172392 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/35627 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/2670901 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/177847 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/chimplinebyline [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/87518 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/154595 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/12811780 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/cdmenupro42 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/110131 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/95615 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/18464 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/lwedchoice-1999 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/5099582 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/100968 [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/j-emacs [Fri Mar 12 19:15:28 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/0206mathew [Fri Mar 12 19:15:29 2010] [error] [client 213.37.49.231] File does not exist: /var/www/html/10844356 Thanks in advance!

    Read the article

  • How come the ls command prints in multiple columns on tty but only one column everywhere else?

    - by David Lou
    Even after using Unix-like OSes for a couple years, this behaviour still baffles me. When I use the ls command in a directory that has lots of files, the output is usually nicely formatted into multiple columns. Here's an example: $ ls a.txt C.txt f.txt H.txt k.txt M.txt p.txt R.txt u.txt W.txt z.txt A.txt d.txt F.txt i.txt K.txt n.txt P.txt s.txt U.txt x.txt Z.txt b.txt D.txt g.txt I.txt l.txt N.txt q.txt S.txt v.txt X.txt B.txt e.txt G.txt j.txt L.txt o.txt Q.txt t.txt V.txt y.txt c.txt E.txt h.txt J.txt m.txt O.txt r.txt T.txt w.txt Y.txt However, if I try to redirect the output to a file, or pipe it to another command, only a single column appears in the output. Using the same example directory as above, here's what I get when I pipe ls to wc: $ ls | wc 52 52 312 In other words, wc thinks there are 52 lines, even though the output to the terminal has only 5. I haven't observed this behaviour in any other command. Would you like to explain this to me?

    Read the article

  • Need assistance making a batch file for renaming files in separate folders

    - by Carnaxus
    Ok, here's one for you. I'm trying to use a batch file to rename a bunch of files, but none of them are in the same folder as the batch file itself. The command prompt keeps telling me that the directory can't be found. I suppose I could just rename all the files in all the folders that match the filename, but I don't want to do that either; I only want to change certain ones. My batch file as it stands is: @echo off ren "engine/info.txt" "disabled.txt" ren "gravplating/info.txt" "disabled.txt" ren "HAWX content/info.txt" "disabled.txt" ren "laserz/info.txt" "disabled.txt" ren "NeuroNaval/info.txt" "disabled.txt" ren "NeuroPlanes/info.txt" "disabled.txt" ren "NeuroTanks/info.txt" "disabled.txt" ren "NeuroWeapons/info.txt" "disabled.txt" ren "WAC Base/info.txt" "disabled.txt" ren "WAC DamageSystem/info.txt" "disabled.txt" ren "WAC GravityController/info.txt" "disabled.txt" ren "WAC Helicopters/info.txt" "disabled.txt" ren "WAC Sweps/info.txt" "disabled.txt" ren "weapons/info.txt" "disabled.txt" ren "AFF_ships/info.txt" "disabled.txt" ren "AntiTakeRifle/info.txt" "disabled.txt" ren "Catmull-Rom Cameras/info.txt" "disabled.txt" ren "Displacer Cannon/info.txt" "disabled.txt" ren "Drumdevil's Trains/info.txt" "disabled.txt" ren "EVEOnline/info.txt" "disabled.txt" ren "gm_botmap_v3/info.txt" "disabled.txt" ren "gm_construct_flatgrass_v5-2/info.txt" "disabled.txt" ren "gm_mobenix_v3_final/info.txt" "disabled.txt" ren "gm_mobenix_v3_highquality_Water/info.txt" "disabled.txt" ren "gm_snabbansairfield_b1/info.txt" "disabled.txt" ren "gm_XhS_construct/info.txt" "disabled.txt" ren "linedraw/info.txt" "disabled.txt" ren "ModelManipulator/info.txt" "disabled.txt" ren "NeuroCars/info.txt" "disabled.txt" ren "Propeller Engine/info.txt" "disabled.txt" ren "VanDookie and Predaaator's pack/info.txt" "disabled.txt" ren "WAC ECM/info.txt" "disabled.txt" ren "WAC Extra Helicopters/info.txt" "disabled.txt" echo Done! pause

    Read the article

  • Different robots.txt for two different domains point to same folder

    - by Ali
    Hi, I have the following two domains: domain.com test.domain.com Both point to same folder which is "public_html". What I want is a different robots.txt file for each domain. So when someone browse domain.com/robots.txt then a different file is shown. And when someone go to test.domain.com/robots.txt then a different file is shown. How can I do this using URL rewriting in .htacces? Thanks

    Read the article

  • How to remove old robots.txt from google as old file block the whole site

    - by KnowledgeSeeker
    I have a website which still shows old robots.txt in the google webmaster tools. User-agent: * Disallow: / Which is blocking Googlebot. I have removed old file updated new robots.txt file with almost full access & uploaded it yesterday but it is still showing me the old version of robots.txt Latest updated copy contents are below User-agent: * Disallow: /flipbook/ Disallow: /SliderImage/ Disallow: /UserControls/ Disallow: /Scripts/ Disallow: /PDF/ Disallow: /dropdown/ I submitted request to remove this file using Google webmaster tools but my request was denied I would appreciate if someone can tell me how i can clear it from the google cache and make google read the latest version of robots.txt file.

    Read the article

  • robots.txt not updated

    - by Haridharan
    I have updated some url's and files in robots.txt file to block url's and files from google search results but, still files displaying in search results. As per a suggestion from a site I tried to update the robots.txt by below steps. In Google Webmaster tools, Health - Fetch as Google - type the url and click the fetch button. but, still files displaying in search results. Note: in Google Webmaster tools, Health - Blocked URL's - robots.txt file - downloaded date looks two dates back.

    Read the article

  • Recovering from an incorrectly deployed robots.txt?

    - by Doug T.
    We accidentally deployed a robots.txt from our development site that disallowed all crawling. This has caused traffic to dip dramatically, and google results to report: A description for this result is not available because of this site's robots.txt – learn more. We've since corrected the robots.txt about a 1.5 weeks ago, and you can see our robots.txt here. However, search results still report the same robots.txt message. The same appears to be true for Bing. We've taken the following action: Submitted site to be recrawled through google webmaster tools Submitted a site map to google (basically doing everything possible to say "Hey we're here! and we're crawlable!") Indeed a lot of crawl activity seems to be happening lately, but still no description is crawled. I noticed this question where the problem was specific to a 303 redirect back to a disallowed path. We are 301 redirecting to /blog, but crawling is allowed here. This redirect is due to a site redesign, wordpress paths for posts such as /2012/02/12/yadda yadda have been moved to /blog/2012/02/12. We 301 redirect to wordpress for /blog to keep our google juice. However, the sitemap we submitted might have /blog URLs. I'm not sure how much this matters. We clearly want to preserve google juice for URLs linked to us from before our redesign with the /2012/02/... URLs. So perhaps this has prevented some content from getting recrawled? How can we get all of our content, with links pointed to our site from pre-and-post redesign reporting descriptions? How can we resolve this problem and get our search traffic back to where it used to be?

    Read the article

  • URL blocked in robots.txt but still showing up on Google search [closed]

    - by Ahmad Alfy
    Possible Duplicate: Why do Google search results include pages disallowed in robots.txt? In my robots.txt I am disallowing a lot of URLs. Google webmaster tools says there're +750 URL blocked. The problem is the URLs are still showing on Google search. For example I have the following rule: Disallow: /entity/child-health/ But when I search some-keyword + child health the following URL shows up : http://www.sitename.com/entity/child-health/ Am I doing anything wrong? Is is possible for a URL to be blocked using robots.txt and still show up on search results?

    Read the article

  • Is my robots.txt working as it should?

    - by TigerBlood
    I want crawlers to have access to http://www.example.com but not http://www.example.com/ My robots.txt is as follows: User-agent: * Allow: /$ Disallow: / My site is in google search results, but I am not coming up in Bing, Yahoo, etc. I have had the same robots.txt since last year, and I initially requested inclusion ~1 year ago, having also resubmitted the URL to those latter search engines several times since as well. Is my robots.txt blocking those other crawlers? And if so, why not google as well? Thanks in advance!

    Read the article

  • Restricting crawler activity to certain directories with robots.txt

    - by neimad
    I would like to use robots.txt to prevent indexing of some parts of my website. I want search engines to index only the / directory and not search inside my controllers. In my robots.txt, I have this: User-Agent: * Disallow: /compagnies/ Disallow: /floors/ Disallow: /spaces/ Disallow: /buildings/ Disallow: /users/ Disallow: / I put this file in /mysite/public. I tested the file with a robots.txt validator and got no errors. However, Google always returns the result of my site. For testing, I added Disallow: /, but again, Google indexed all pages. floors, spaces, buildings, etc. are not physical directories. Is this a bug? How can I work around it?

    Read the article

  • Robots.txt Disallow command [on hold]

    - by Saahil Sinha
    How to disallow folders through Robots.txt, which are been crawled due to wrong url structure, which thus cause duplicate page error The URL been crawled as incorrectly by Google leading to duplicate page error: www.abc.com/forum/index.php?option=com_forum However, The actual correct pages however are: www.abc.com/index.php?option=com_forum Is this a correct way by excluding them through robots.txt: To exclude www.abc.com/forum/index.php?option=com_forum Below is command Disallow: /forum/ Will it not block in legitimate component folder 'Forum' of site?

    Read the article

  • Google indexing page with parameters but page is Disallowed in robots.txt

    - by Jakobud
    I have the following in robots.txt: User-agent: * Disallow: /refer.php User-agent: NinjaBot Allow: / Sitemap: http://www.mysite.com/sitemap.xml The refer.php file does various things depending on what GET parameters are passed to it. When I do a Google search, I see tons of results for pages like this: http://www.mysite.com/refer.php?o=23945 http://www.mysite.com/refer.php?o=39858 http://www.mysite.com/refer.php?o=9683 http://www.mysite.com/refer.php?o=10569 http://www.mysite.com/refer.php?o=58304 http://www.mysite.com/refer.php?o=69604 Is the reason that Google is indexing these because I don't have an asterisk * after refer.php in the robots.txt ? Should changing it to Disallow: /refer.php* fix the problem?

    Read the article

  • Robots.txt never downloaded but some blocked URLs in GWT

    - by Zistoloen
    There is something I don't understand in Google Webmaster Tools (GWT) for my Wordpress site. In menu "Blocked URLs", it mention that my robots.txt has never been downloaded but there are some blocked URLs. It's kind of weird and not logical. Am i missing something? User-agent : * Disallow: /*? Disallow: /wp-login.php Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content Allow: /wp-content/uploads Disallow: */trackback Disallow: /*/feed Disallow: /*/comments Disallow: /cgi-bin Disallow: /*.php$ Disallow: /*.inc$ Disallow: /*.gz$ Disallow: /*.cgi$ Disallow: /author/* I'm afraid my robots.txt doesn't block several URLs I want to block.

    Read the article

  • HTTP 303 redirection and robots.txt

    - by Ian Dickinson
    On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URL's under /id get redirected to dynamically-created pages under /doc. These dynamic pages are built from a database, and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains: Disallow: /doc However, we do want the non-redirected pages under /id to get indexed by Google et al: Allow: /id So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt? If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the db would be worse than losing search indexing for the /id pages.

    Read the article

  • How to secure robots.txt file?

    - by CompilingCyborg
    I would like for User-agents to index my relative pages only without accessing any directory on my server. As initial thought, i had this version in mind: User-agent: * Disallow: */* Sitemap: http://www.mydomain.com/sitemap.xml My Questions: Is it correct to block all directories like that - Disallow: */*? Would still search engines be able to see and index my sitemap if i disallowed all directories? What are the best practices for securing the robots.txt file? For Reference: Here is a good tutorial for robots.txt #Add this if you want to stop Alexa from indexing your site. User-agent: ia_archiver Disallow: / #Add this to stop duggmirror User-agent: duggmirror Disallow: / #Add this to allow specific agents User-agent: Googlebot Disallow: #Add this to allow all agents while blocking specific directories User-agent: * Disallow: /cgi-bin/ Disallow: /*?*

    Read the article

  • What is a good robots.txt for WP ?

    - by Steven
    What is the "best" setup for robots.txt? I'm using the following permalink structure in Wordpress: /%category%/%postname%/. My robots.txt currently looks like this (copied from somewhere a long time ago): User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /wp-content/themes Disallow: /trackback Disallow: /comments Disallow: /category/*/* Disallow: */trackback Disallow: */comments I want my comments to be indext. So I can remove this, right? Do I want to disallow indexing categories because of my permalinkstructure? An article can have several tags and be in multiple categories. This may cause duplicates in google. How should I work around this? Would you change anything else here?

    Read the article

  • My First robots.txt

    - by Whitechapel
    I'm creating my first robots.txt and wanted to get a second opinion on it. Basically I have a FTP setup on my board for some special users to transfer files between each other and I do NOT want that included in the search by the bots. I also want to point to my sitemap which gets auto generated by a PHP page. So here is what I have, what else should I include, and if I need to fix anything with it? Also, it's linking to xmlsitemap.php because that generates the sitemap when called. My goal is to allow any search bot crawl the forums to grab meta data. User-agent: * Disallow: /admin/ Disallow: /ali/ Disallow: /benny/ Disallow: /cgi-bin/ Disallow: /ders/ Disallow: /empire/ Disallow: /komodo_117/ Disallow: /xanxan/ Disallow: /zeroordie/ Disallow: /tmp/ Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php Edit, I'm not sure how to handle all the user's folders under /public_html/ since the robots.txt will be going in /public_html.

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >