Search Results

Search found 247 results on 10 pages for 'bots'.

Page 6/10 | < Previous Page | 2 3 4 5 6 7 8 9 10  | Next Page >

  • Firewall - Preventing Content Theft & Rogue Crawlers

    - by drodecker
    Our websites are being crawled by content thieves on a regular basis. We obviously want to let through the nice bots and legitimate user activity, but block questionable activity. We have tried IP blocking at our firewall, but this becomes to manage the block lists. Also, we have used IIS-handlers, however that complicates our web applications. Is anyone familiar with network appliances, firewalls or application services (say for IIS) that can reduce or eliminate the content scrapers?

    Read the article

  • PHP form auto response

    - by Mark
    Hi, I am using the following php code which has been given to me, it works fine, apart from the auto response bit. I know its not a lot of code I just dont know how to do it or why it snot working. Any help would be appreciated. thanks in advance. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title> - Contact Us</title> <!-- css --> <link rel="stylesheet" type="text/css" href="css/reset.css" /> <link rel="stylesheet" type="text/css" href="css/styles.css" /> <link rel="stylesheet" type="text/css" href="css/colorbox.css" /> <!-- javascript libraries --> <?php require_once('includes/js.php'); ?> </head> <body> <?php //FIll out the settings below before using this script $your_email = "(email address)"; $website = "(website name)"; //BOTS TO BLOCK $bots = "/(Indy|Blaiz|Java|libwww-perl|Python|OutfoxBot|User-Agent|PycURL|AlphaServer|T8Abot|Syntryx|WinHttp|WebBandit|nicebot)/i"; //Check if known bot is visiting if (preg_match($bots, $_SERVER["HTTP_USER_AGENT"])) { exit ("Sorry bots are not allowed here!"); } //Known Exploits $exploits = "/(content-type|bcc:|cc:|from:|reply-to:|javascript|onclick|onload)/i"; //Spam words $spam_words = "/(viagra|poker|blackjack|porn|sex)/i"; // BAD WORDS $words = "/( bitch|dick|pussy|pussies|ass|fuck|cum|cumshot|cum shot| gangbang|gang bang|god dammit|goddammit|viagra|anus|analsex )/i"; //BAD WORD/SPAM WORD/EXPLOIT BLOCKER function wordBlock($word) { //Make variables global global $words; global $exploits; global $spam_words; if (preg_match($words, $word)) { $word = preg_replace($words, "#####", $word); } if(preg_match($exploits,$word)){ $word = preg_replace($exploits,"",$word); } if(preg_match($spam_words,$word)){ $word = preg_replace($spam_words,"$$$$",$word); } return $word; } //CLean data function function dataClean($data) { $data = stripslashes(trim(rawurldecode(strip_tags($data)))); return $data; } //CREATE MAIN VARIABLES $name = (isset ($_POST['name'])) ? dataClean($_POST['name']) : FALSE; $company = (isset ($_POST['company'])) ? dataClean($_POST['company']) : FALSE; $address = (isset ($_POST['address'])) ? dataClean($_POST['address']) : FALSE; $postcode = (isset ($_POST['postcode'])) ? dataClean($_POST['postcode']) : FALSE; $phone = (isset ($_POST['phone'])) ? dataClean($_POST['phone']) : FALSE; $email = (isset ($_POST['email'])) ? dataClean($_POST['email']) : FALSE; $comment = (isset ($_POST['message'])) ? wordBlock(dataClean($_POST['message'])) : FALSE; $submit = (isset ($_POST['send'])) ? TRUE : FALSE; $email_check = "/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i"; //$ip = $_SERVER["REMOTE_ADDR"]; $errors = array(); //Check if send button was clicked if ($submit) { if (!$name) { $errors[] = "Please enter a name!"; } if ($name) { if (!ereg("^[A-Za-z' -]*$", $name)) { $errors[] = "You may not use special characters in the name field!"; } } if (!$email) { $errors[] = "Please enter an email address!"; } if ($email) { if (!preg_match($email_check, $email)) { $errors[] = "The E-mail you entered is invalid!"; } } /* if (!$subject) { $errors[] = "Please enter an email subject!"; } */ if (!$comment) { $errors[] = "Please don't leave the message field blank!"; } //Check if any errors are present if (count($errors) > 0) { foreach ($errors AS $error) { print "&bull; $error <br />"; } } else { //MESSAGE TO SEND TO ADMIN //Create main headers $headers = "From: " . $website . " <$your_email> \n"; $headers .= "Reply-to:" . $email . " \n"; $headers .= "MIME-Version: 1.0\n"; $headers .= "Content-Transfer-Encoding: 8bit\n"; $headers .= "Content-Type: text/html; charset=UTF-8\n"; $message = ""; $message .= "<h1>New E-Mail From " . $website . "</h1><br /><br />"; $message .= "<b>Name:</b> " . $name . "<br />"; $message .= "<b>Company:</b> " . $company . "<br />"; $message .= "<b>Address:</b> " . $address . "<br />"; $message .= "<b>Postcode:</b > " . $postcode . "<br />"; $message .= "<b>Phone No:</b> " . $phone . "<br />"; $message .= "<b>E-mail:</b> " . $email . "<br />"; $message .= "<b>Message:</b> " . $comment . "<br />"; //E-mails subject $mail_subject = "Message from " . $website . ""; /* CHECK TO BE SURE FIRST E-MAIL TO ADMIN IS A SUCCESS AND SEND EMAIL TO ADMIN OTHERWISE DON'T SEND AUTO RESPONCE */ if (mail($your_email, $mail_subject, $message, $headers)) { //UNSET ALL VARIABLES unset ($name, $email, $company, $address, $postcode, $phone, $comment, $_REQUEST); //JAVASCRIPT SUCCESS MESSAGE echo " <script type='text/javascript' language='JavaScript'> alert('Your message has been sent'); </script> "; //SUCCESS MESSAGE TO SHOW IF JAVASCRIPT IS DISABLED echo "<noscript><p>THANK YOU YOUR MESSAGE HAS BEEN SENT</p></noscript>"; /* -----------------END MAIL BLOCK FOR SENDING TO ADMIN AND START AUTO RESPONCE SEND----------------- */ //AUTO RESPONCE MESSAGE //Create main headers $headers = "From: " . $website . " <$your_email> \n"; $headers .= "Reply-to:" . $your_email . " \n"; $headers .= "MIME-Version: 1.0\n"; $headers .= "Content-Transfer-Encoding: 8bit\n"; $headers .= "Content-Type: text/html; charset=UTF-8\n"; $message = ""; $message .= "<h1>Thank You For Contacting Us </h1><br /><br />"; $message .= "On behalf of <b>" . $website . "</b> we wanna thank you for contacting us and to let you know we will respond to your message as soon as possible thank you again."; //E-mails subject $mail_subject = "Thank you for contacting " . $website . ""; //Send the email mail($email, $mail_subject, $message, $headers); /* -----------------END MAIL BLOCK FOR SENDING AUTO RESPONCE ----------------- */ } else { echo " <script type='text/javascript' language='JavaScript'> alert('Sorry could not send your message'); </script> "; echo "<noscript><p style='color:red;'>SORRY COULD NOT SEND YOUR MESSAGE</p></noscript>"; } } } ?> <div id="wrapper"> <div id="grad_overlay"> <!-- Header --> <div id="header"> <a href="index.php" title="Regal Balustrades"><img src="images/regal_logo.png" alt="Regal Balustrades" /></a> <div id="strapline"> <img src="images/strapline.png" alt="Architectural metalwork systems" /> </div> </div> <!-- Navigation --> <div id="nav"> <?php require_once('includes/nav.php'); ?> </div> <!-- Content --> <div id="content"> <div id="details"> <p class="getintouch env">Get In Touch</p> <ul class="details"> <li>T. (0117) 935 3888</li> <li>F. (0117) 967 7333</li> <li>E. <a href="mailto:[email protected]" title="Contact via email">[email protected]</a></li> </ul> <p class="whereto hse">Where To Find Us</p> <ul class="details"> <li>Regal Balustrades</li> <li>Regal House, </li> <li>Honey Hill Road,</li> <li>Kingswood, </li> <li>Bristol BS15 4HG</li> </ul> </div> <div id="contact"> <h1>Contact us</h1> <p>Please use this form to request further information about Regal Balustrades and our services. To speak to a member of our staff in person, please call us on 0117 9353888</p> <div id="form"> <form method='POST' action='<?php echo "".$_SERVER['PHP_SELF'].""; ?>'> <p class='form-element'> <label for='name'>Name:</label> <input type='text' name='name' value='<?php echo "" . $_REQUEST['name'] . "";?>' /> </p> <p class='form-element'> <label for='company'>Company:</label> <input type='text' name='company' value='<?php echo "" . $_REQUEST['company'] . "";?>' /> </p> <p class='form-element'> <label for='address'>Address:</label> <textarea name='address' rows='5' id='address' class='address' ><?php echo "" . $_REQUEST['address'] . "";?></textarea> </p> <p class='form-element'> <label for='postcode'>Postcode:</label> <input type='text' name='postcode' value='<?php echo "" . $_REQUEST['postcode'] . "";?>' /> </p> <p class='form-element'> <label for='phone'>Telephone:</label> <input type='text' name='phone' value='<?php echo "" . $_REQUEST['phone'] . "";?>' /> </p> <p class='form-element'> <label for='email'>Email:</label> <input type='text' name='email' value='<?php echo "" . $_REQUEST['email'] . "";?>' /> </p> </div> <div id='form-right'> <p class='form-element'> <label for='message'>Enquiry:</label> <textarea name='message' class='enquiry' id='enquiry' rows='5' cols='40' ><?php echo "" . $_REQUEST['message'] . "";?></textarea> </p> <p class='form-element'> <input type='submit' class='submit' name='send' value='Send message' /> </p> </div> <p class='nb'><em>We will respond as soon as possible.</em></p> </form> </div> </div> </div> </div> </div> <!-- Footer --> <div id="footer-container"> <?php require_once('includes/footer.php'); ?> </div> <!-- js functions --> <script> $(document).ready(function() { $("ul#navig li:nth-child(6)").addClass("navon"); }); </script> </body> </html>

    Read the article

  • Does Email Address Obfuscation Actually Prevent Spam?

    - by Jason Fitzpatrick
    Many people obfuscate their email addresses–typing out someguy (at) somedomain (dot) com, for example–to project themselves from SPAM bots. Do such obfuscation techniques actually work? Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-drive grouping of Q&A web sites. HTG Explains: Does Your Android Phone Need an Antivirus? How To Use USB Drives With the Nexus 7 and Other Android Devices Why Does 64-Bit Windows Need a Separate “Program Files (x86)” Folder?

    Read the article

  • User generated articles, how to do meta description?

    - by Tom Gullen
    If users submit a lot of good quality articles on the site, what is the best way to approach the meta description tag? I see two options: Have a description box and rely on them to fill it sensibly and in a good quality way Just exclude the meta description Method 1 is bad initially, but I'm willing to put time in going through and editing/checking all of them on a permanent basis. Method 2 is employed by the stack exchange site, and lets the search bots extract the best part of the page in the SERP. Thoughts? Ideas? I'm thinking a badly formed description tag is more damaging than not having one at all at the end of the day. I don't expect content to ever become unwieldy and too much to manage.

    Read the article

  • How frequently Googlebot fetch sitemaps? Is it depending on PageRank?

    - by JITHIN JOSE
    How much frequently Google fetches sitemaps? I am now working with a high traffic website normally have 30 new posts per minute but currently it provides sitemaps which links to new 100 posts (3 minutes). Is this method is enough? Is bots fetch sitemaps every 3 minutes? Did need to change sitemaps to list all 5M posts(indexed sitemaps)? How this change will effect on traffic and PageRank? Is Googlebot remove URLs that previously listed on sitemaps but not now?

    Read the article

  • exclamation mark for sitemaps in webmastertools when resubmited

    - by Jayapal Chandran
    Hi, I have three sitemaps submitted to webmastertools. In that one has very few links and was accepted. It showed a green tick. The other two had around 150 links. They had been accepted in think yet webmastertools displays the exclamation mark. I think i saw this already but what confused was my hosting was blocking frequent bots recently by using a firewall and just now they added googles ip range in their witelist. and then my robots and sitemaps were read by webmastertools. But two sitemaps shows exclamation. I hope it is nothing to do with the above problem. What are all the reasons and where can i see the reason for that exclamation mark.? Here is the screen shot.

    Read the article

  • Would using AJAX only "Add to Cart" buttons be wise?

    - by Alex Erwin
    I want to AJAX enable all of my Add To Cart buttons because search engine bots are indexing these and not paying attention to my robots file or site map. I just don't want to loose potential customers. I have seen a number of top sites using heavily JavaScript support content, including Amazon, is it OK to follow the trend? The rest of my site progressively degrades, but I would really like to implement this because of the benefits to the customer (instant satisfaction), my infrastructure (constant page rebuilds), and allowing me to use SEO tools to optimize without the tool picking up thousands of "Add to Cart" widgets in my catalog. Thanks

    Read the article

  • How can i move towards the Business intelliegnce/ data mining fields from software developer

    - by user1758043
    I am working as python developer and i work with djnago. I also do some web scrapping and building spiders and bots. Now from there i want to make my move to Business intelligence. I just want to know how can i move into that field. because as companies are not going to hire me in that field directly , i just want to know how can i make transistions. I was thinking of first work as Database developer in sql and then i can see futher. But i want from you guys so that i can start learning that stuff so that i can chnage jobs keeping that in mind. here in my area there are plent of jobs in all area but i need to know hoe to transitio and what thing i should learn before making that transition. Here JObs are plenty so if i know my stuff , getting job is piece of cake becaus ethey don't ahve any persons. same jobs keep getting advertised for months and months

    Read the article

  • How can I move towards the Business Intelligence/ data mining fields from software developer [closed]

    - by user1758043
    I am working as a Python developer and I work with django. I also do some web scraping and building spiders and bots. Now from there I want to make my move to Business Intelligence. I just want to know how I can move into that field. Because as companies are not going to hire me in that field directly, I just want to know how can I make the transistion. I was thinking of first working as Database developer in SQL and then I can see further. But I want advice from you guys so that I can start learning that stuff so that I can change jobs keeping that in mind. Here in my area there are plenty of jobs in all areas but I need to know how to transition and what things I should learn before making that transition. Here jobs are plenty so if I know my stuff, getting a job is a piece of cake because they don't have any people. Same jobs keep getting advertised for months and months.

    Read the article

  • Does a "nofollow" attribute on a link prevent URL discovery by search engines?

    - by Stephen Ostermiller
    I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with a nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page. (Assuming that this link remains the only link into that page.) I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.

    Read the article

  • Google is not indexing my entire site despite having a sitemap

    - by Anusha
    I have an e-commerce website www.beyondtime.in. I have been constantly monitoring Googlebot crawling on my website and my webmaster account. Lately, I have found two issues that I have not been able to understand. 1.) The Google Bots have been only crawling www.beyondtime.in/telecom.php when the URL is not even valid. What needs to be done to let Google crawl other pages of the website as well? 2.) The second question is about the Google Webmaster account, where I've submitted my sitemap with 227 URLs. Out of that, only 156 have been indexed. None of the images of my website have been indexed by Google.

    Read the article

  • webmaster tools - Network Unreachable

    - by Jayapal Chandran
    Hi, webmaster tools for my site displays that robots.txt unreachable and for all links in sitemap it says network unreachable. sitemap.xml unreachable. These appear in crawl stats page. I discussed with the support team of my hosting and they said... Hi, I have verified apache logs, i cannot see any issues on your website/webserver/ Possible issues. There may the routing issue from the googles server to our server. When a google bots hits goes high the IP will be automatically blacklisted by our firewall to avoid server loads & downtimes. As we donot have access to their services, We cannot able to give details of their details/logs etc. The sitemaps link shows an exclamation mark which means the file was not reachable. What could be the problem and how to solve it?

    Read the article

  • Yandex frequently replaces page names with ampersands

    - by Guy
    The Yandex spider is a frequent visitor to one of the sites I manage. On ocassion it replaces the page name with two ampersands and a space. So if the page is: /mypage.aspx?param=value then it will try and crawl it as: /&& ?param=value Any idea why it is doing this? [EDIT] If I remember correctly the IP that this "mistake" is coming from is based in California and not Russia. I believe that they crawl US sites from a US based IP address. Not sure if that helps. More Info about request: IP: 199.21.99.82 City: Palo Alto State: California Country: United States ISP: Yandex Inc. User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

    Read the article

  • Will Google crawl session based website

    - by DonShwep
    I have a website, it is split into 3 categories but using PHP its an all-in one kind of style. When a user chooses a category on the home page a session is set, this is then used to set the style and contents of the website. Would Googlebot and other bots be able to still scan my website? If a page is accessed and no session is set then the user is sent back to the home page. I have created special links, that set a session but go straight to the contact page. Even this page doesn't seem to be showing up. Any ideas if a sitemap with specially crafted links (to set the session) will help Google?

    Read the article

  • How to crawl a webPage with dynamic content added by javascript

    - by blunderboy
    I guess there is a news that Google bots have the capability to understand our javascript code. It means this is possible to fully crawl a webpage which has lazy loading feature enabled. I am using Apache Nutch to crawl websites but I don't think it has the capability to fetch the URLs being injected in HTML page by javascript when the page is scrolled down. I see a lot of websites doing lazy loading for performance issue. So Can somebody please explain me how can i crawl the data which comes in HTML page on lazy load. (On scrolling the page down).

    Read the article

  • My First robots.txt

    - by Whitechapel
    I'm creating my first robots.txt and wanted to get a second opinion on it. Basically I have a FTP setup on my board for some special users to transfer files between each other and I do NOT want that included in the search by the bots. I also want to point to my sitemap which gets auto generated by a PHP page. So here is what I have, what else should I include, and if I need to fix anything with it? Also, it's linking to xmlsitemap.php because that generates the sitemap when called. My goal is to allow any search bot crawl the forums to grab meta data. User-agent: * Disallow: /admin/ Disallow: /ali/ Disallow: /benny/ Disallow: /cgi-bin/ Disallow: /ders/ Disallow: /empire/ Disallow: /komodo_117/ Disallow: /xanxan/ Disallow: /zeroordie/ Disallow: /tmp/ Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php Edit, I'm not sure how to handle all the user's folders under /public_html/ since the robots.txt will be going in /public_html.

    Read the article

  • How frequently Googlebot fetch sitemaps? Is it depending on page rank?

    - by JITHIN JOSE
    How much frequently google fetches sitemaps? I am now working with a high traffic website normally have 30 new posts per minute.But currently it provides sitemaps which links to new 100 posts(3 minutes). Is this method is enough ?. Is Bots fetch sitemaps every 3 minutes?. Did need to change sitemaps to list all 5M posts(indexed sitemaps)?. How this change will effect on traffic and page rank. Is google bot remove urls that previously listed on sitemap but not now?

    Read the article

  • Evidence for automatic browsing - Log file analysis

    - by Nilani Algiriyage
    I'm analyzing web server logs both in Apache and IIS log formats. I want to find the evidence for automatic browsing, like web robots, spiders, bots, etc. I used python robot-detection 0.2.8 for detecting robots in my log files, but I know there may be other robots (automatic programs) which have traversed through the web site but robot-detection can not identify. So I want to ask: Are there any specific clues that can be found in log files that human users do not leave but automated software would? Do they follow a specific navigation pattern? I saw some requests for favicon.ico - does this implicate that it is a automatic browsing?. I found this article and this question with some valuable points.

    Read the article

  • Robots meta tag with "noimageindex"

    - by jimy
    I have some doubt regarding noimageindex value in meta robots tag. If I add this tag on my page say http://www.example.com/somepage/someaction.php and on that page the images are served from another page say http://www.exampleimg.com. Then will the tag has meaning. I mean to say will the images will be ignored by bots? Or exampleimg is not affected by that tag. And all images will be indexed? Note: We want to stop indexing of the images on that particular page.

    Read the article

  • Unimportant words on page being seen as keywords, due to repetiveness

    - by user21100
    I have a list of products on my site, and each product has a number of descriptors such as features, price, etc. Next to each product, I list the 10 features, with a graphical icon which lets the user know whether the product has that particular feature or not. In all, I have about 230 products, and I have to add the same list of features to describe each product, so you can see the enormous redundancy here of these "feature names". These "feature names", ex., "water proof", are not important keywords at all, yet due to the sheer volume of these words, Google is seeing them as my most important keywords. Is there any way to get around this, or to tell the bots to place (less) emphasis on these repetitive words, and not view them as important keywords?

    Read the article

  • How can I create a dynamic site that is still search-bot friendly?

    - by zuko
    If I want to have a slide effect between pages. You click a link, it is loaded off to the side and then slides in (pushing the old page off the other side). I can imagine using jQuery to do the PHP and the effects... but how do I do something like this that gracefully degrades for users without Javascript, including bots? Possibly more problematic: what if I wanted to have a sort of mural background across the site, perhaps with a parallax scrolling effect, and sliding to other pages reveals more of the, possibly giant image? Again, I can imagine how to do this with lots of fancy jQuery and PHP but it would heavily rely on those. How can I gracefully degrade in a situation like that? Any pointers, articles or books would be greatly appreciated. I keep trying to search for answers but I just get a lot of "theory"-based, unhelpful blogs.

    Read the article

  • Googlebot can't access my site when crawling from rootdomain

    - by PéCé
    I can't explain why I get this message for my rootdomain result in Google : trocmalin.com/ A description for this result is not available because of this site's robots.txt – learn more. Here is my site specifics : vide-greniers.trocmalin.com is the site address www.trocmalin.com redirects (301) to vide-greniers.trocmalin.com trocmalin.com redirects (301) to vide-greniers.trocmalin.com too... User-agent: * Disallow: /orga/ User-agent: * Disallow: /sitemap-update Google results for vide-greniers.trocmalin.com are well rendered, as well as sub pages allowed for bots. But the result for my rootdomain (trocmalin.com) gives this message... Can you help me ?

    Read the article

  • is it ok to have 2 sitemaps on 1 website?

    - by user615041
    Do I have to have a sitemap page on my index page for bots to read it or can I just have it anywhere on my server? I have a phpbb/wordpress integration and I need 2 sitemaps mods for each one (or I need to have them somehow integrated together into one xml sitemap). Is this possible? Whats my best option? I would have the phpbb one something like this: http://www.example.com/phpbb/sitemap.html and the wordpress one something like this: http://www.example.com/wordpress/sitemap.html and then I would submit both off..but not have the links on my footer to confuse anyone.., the sitemaps would strictly be for search engines. Is this a good idea? what are you thoughts?

    Read the article

  • What are some potential issues in blocking all incoming requests from the Amazon cloud?

    - by ElHaix
    Recently I, along with the rest of the world, have seen a significant increase in what appears to be scraping from Amazon AWS-related sources. So simply put, I blocked all incoming requests from the Amazon cloud for our hosted application. I know that some good services/bots are now hosted on the cloud, and I'm wondering if certain IP addresses should be allowed, as they may gather data that would in the end benefit our site's SEO rankings? -- UPDATE -- I added a feature to block requests from the following hosts: Amazon Softlayer ServerDeals GigAvenue Since then, I have seen my network traffic decrease (monitored by network out bytes). Average operation is around 10,000,000 bytes. You can see where last week I was not blocking, then started blocking. I've since removed the blocks and will see what the outcome is.

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10  | Next Page >