Search Results

Search found 446 results on 18 pages for 'crawl'.

Page 2/18 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Redirects in .htaccess to avoid crawl errors

    - by user71698
    I am getting a lot of errors in Webmaster Tools and basically there's a lot of links ending like this: mydomainname.com/links.php How can I redirect these links, to shave off this part at the end? For instance, there is a link in Google: http://www.onlineglobalbiz.com/article-marketing/www.onlineglobalbiz.com/links.php This should be: http://www.onlineglobalbiz.com/article-marketing/ Using .htaccess, how can I redirect from the incorrect links?

    Read the article

  • Will Google crawl session based website

    - by DonShwep
    I have a website, it is split into 3 categories but using PHP its an all-in one kind of style. When a user chooses a category on the home page a session is set, this is then used to set the style and contents of the website. Would Googlebot and other bots be able to still scan my website? If a page is accessed and no session is set then the user is sent back to the home page. I have created special links, that set a session but go straight to the contact page. Even this page doesn't seem to be showing up. Any ideas if a sitemap with specially crafted links (to set the session) will help Google?

    Read the article

  • webmaster tools 500 crawl error for asp faceted navigation that does not exist

    - by user19007
    i am getting 2,500 type 500 url errors in google webmaster tools. These pages are faceted navigation results that can not be reached by a site visitor. These pages do not exist. We are using faceted navigation with the Volusion platform (asp.net, I think). I have specified url parameters in webmaster tools so that google will not try to index anything faceted. This does not stop the errors from generating. I am concerned about how this might effect seo (bleeding page rank). I can provide additional information if needed. I am not sure how to solve this. I have started down the path of creating 301's, but having some difficulty there as well.

    Read the article

  • sharepoint crawl not indexing main site

    - by user22215
    Guys I'm having some strange search issues' going on with my main portal application. First off let me give you a little back ground on the problem web app. Our Sharepoint environment was originally set up by a consultant that did not follow best practices. She used one web app to house our companies' intranet site, ssp, and mysites. Since than I have provisioned a new ssp that I have segmented correctly I moved all of our other sites over to the new ssp with out any problems . However, I could not assign the main portal app to the new ssp since the portal app housed the ssp site collection. So I deleted the ssp site collection after that I deleted the ssp and assigned the portal app to my new ssp. Now this is where the problem starts when I attempt to crawl this application the crawl starts than stops 5 seconds later with a status of success also it reports that 1 item was successfully crawled. The funny thing is the main portal app has nearly 30000 items. I have tracked the problem down to the web app if I create a test web app than restore the content I have no problem crawling all 30000 items. Also all of my other web apps that use the same ssp have no problem completing crawls. I don't see anything in the ULS logs or server 2003's event viewer. Also I'm using a separate dedicated index server that's configured to crawl itself via host file configuration. I would like to fix this problem with out having to recreate our main portal site due to the fact that we have several custom code modifications where DLL's were registered to the IIS bin folder also I don't even want to get into the Silverlight mods that were done. Any help with this problem is much appreciated Same problem as minehttp://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/MS-SharePoint/Q_23885820.html

    Read the article

  • need to crawl images and the whole web pages

    - by Kei Situ
    hey, I am starting a project and wonder the relationship between the characters in images and the whole web page where the images reside. so first, i want to crawl some images and their web pages.....need to save the crawl result in local disk for further analysis. I wonder if there is any open source for this issue? thx^_^

    Read the article

  • Sharepoint Search crawl not working

    - by Satish
    Search Crawling is error out on my MOSS 2007 installation. I get the following error for all the web apps I have following error in Crawl logs. http://mysites.devserver URL could not be resolved. The host may be unavailable, or the proxy settings are not configured correctly on the index server. The Application Event log also has the following corresponding error The start address http://mysites.devserver cannot be crawled. Context: Application 'SSPMain', Catalog 'Portal_Content' Details: The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Proxy and Timeout page in search administration. (0x80041221) I'm using Windows 2008 server. I tried accessing the site using the above mentioned url and its available. I did the registry setting for loop back issue found here http://support.microsoft.com/kb/896861 still not luck. Any Ideas?

    Read the article

  • Sharepoint Search crawl not working

    - by Satish
    Search Crawling is error out on my MOSS 2007 installation. I get the following error for all the web apps I have following error in Crawl logs. http://mysites.devserver URL could not be resolved. The host may be unavailable, or the proxy settings are not configured correctly on the index server. The Application Event log also has the following corresponding error The start address http://mysites.devserver cannot be crawled. Context: Application 'SSPMain', Catalog 'Portal_Content' Details: The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Proxy and Timeout page in search administration. (0x80041221) I'm using Windows 2008 server. I tried accessing the site using the above mentioned url and its available. I did the registry setting for loop back issue found here http://support.microsoft.com/kb/896861 still not luck. Any Ideas?

    Read the article

  • SharePoint 2010 search crawl not working

    - by J. Hammond
    I have the following set up: 1 x Windows server 2008 R2 running SQL 1 x Windows server 2008 R2 running SharePoint 2010. I have an issue with the search service application, the crawl appears to run for a never ending amount of time with 0 successes and 0 failures. Checking the Search Application Topology, I find that "Query Component 0" is "Not Responding". I have tried the following: I have ensured that the index directory has the right permissions applied to it and the search service account is in the right groups to consume those permissions. I have re-created the search service application. I have restarted the search service manually. I have trawled the net as much as possible to find a solution but as of yet have not come across something that has resolved this issue. Any input will be very much appreciated

    Read the article

  • Google Webmasters Tools strange 404 errors referred from same site

    - by Out of Control
    Starting about a month ago, I noticed a sudden increase in 404 errors in Webmasters Tools for one of my sites (over 1400 errors so far). All the errors are being referred from my own site to non existent pages. The 404 error URLs are all of the same format: URL: http://www.helloneighbour.com/save/1347208508000 The number on the end appears to be a timestamp followed by 3 zeros. The referring page, in this case is : Linked from http://www.helloneighbour.com/save/cmw-insurance-insurance-burnaby When I look at the source code of that page, or I use Webmaster tools to view the page as Google sees it, I can't find any link that comes close to what is above. I built the site, and I can't find any place that might be causing these false links either. The server logs (access and error) don't show Google or anyone else trying to access these links. I've marked all these pages as fixed, and waited a couple of weeks, only to find the errors come back again over the last few days. I'm wondering if anyone else has seen anything strange like this, or if someone might have a way for me to debug, replicate this error myself.

    Read the article

  • In Google webmaster tools, can a "soft 404" be triggered by the text on the page?

    - by Stephen Ostermiller
    I just ran across an error in Google Webmaster Tools that I have never seen before. I manage the website for my local community band (I play trombone). One of the pages on the site is a list of our upcoming performances. It is powered by a WordPress events plugin that uses a database of upcoming events that are entered through the administration interface. We just finished up our summer and fall concerts and our next performance will be our Christmas concert. I hadn't gotten around to adding that into the website yet, so there are no upcoming events shown on the page. In fact the text on the page says: No upcoming events listed under Performance. Check out past events for this category or view the full calendar. Then in Google Webmaster Tools, this page is showing up as a "soft 404": The page is returning a 200 status and Google is indicating that he 404 is "soft". I wouldn't have expected Googlebot to be as sophisticated to parse that particular sentence. Is Googlebot able to detect that the text on the page indicates that there is currently not content and then treat it as a 404 page because of that? If Google is treating this page as a soft 404 because of the text on the page, does that mean that like regular 404 pages, the page won't show up in search results?

    Read the article

  • Submitting a sitemap to take care of inherited Google crawler errors

    - by leeand00
    I have an awful lot of Google Crawler errors (1000 or so) after I inherited a site that the previous owner migrated without moving much of their content. Would generating a map of the current site and submitting it to Google help fix this? Is there any quicker, automated way to eliminate errors other than clicking each and every site error? Note: I have already tried automating this on my own.

    Read the article

  • Too many access denied errors showing in Google Webmaster Tools every day

    - by user2255733
    I get 18,000 access denied error showing in Google Webmaster Tools every day! So strange it shows for URL's with www and not no-www. Fetch as Google works perfectly for pages got that error. Google starts to downgrade my website - impressions have dropped from 35,000 to 18,000. I am using cloud flair CDN and .htaccess mod_rewrite. Any help will be extremely appreciated as I am really loosing control.

    Read the article

  • Massive 404 attack with non existent URLs. How to prevent this?

    - by tattvamasi
    The problem is a whole load of 404 errors, as reported by Google Webmaster Tools, with pages and queries that have never been there. One of them is viewtopic.php, and I've also noticed a scary number of attempts to check if the site is a WordPress site (wp_admin) and for the cPanel login. I block TRACE already, and the server is equipped with some defense against scanning/hacking. However, this doesn't seem to stop. The referrer is, according to Google Webmaster, totally.me. I have looked for a solution to stop this, because it isn't certainly good for the poor real actual users, let alone the SEO concerns. I am using the Perishable Press mini black list (found here), a standard referrer blocker (for porn, herbal, casino sites), and even some software to protect the site (XSS blocking, SQL injection, etc). The server is using other measures as well, so one would assume that the site is safe (hopefully), but it isn't ending. Does anybody else have the same problem, or am I the only one seeing this? Is it what I think, i.e., some sort of attack? Is there a way to fix it, or better, prevent this useless resource waste? EDIT I've never used the question to thank for the answers, and hope this can be done. Thank you all for your insightful replies, which helped me to find my way out of this. I have followed everyone's suggestions and implemented the following: a honeypot a script that listens to suspect urls in the 404 page and sends me an email with user agent/ip, while returning a standard 404 header a script that rewards legitimate users, in the same 404 custom page, in case they end up clicking on one of those urls. In less than 24 hours I have been able to isolate some suspect IPs, all listed in Spamhaus. All the IPs logged so far belong to spam VPS hosting companies. Thank you all again, I would have accepted all answers if I could.

    Read the article

  • Weird URLs being access by Googlebot

    - by Avishai
    Lately I've been seing all sorts of strange URLs show up as errors in my Webmaster tools account, but they're URLs that don't actually exist on my site, nor are linked from the pages that Google claims they're linked from. URL Response Code Detected yR3kna/5RfA4+ndtn/X4zcevudMlXbqbIrnPbH9irw= 404 9/16/12 OK4iaOVdr6Ocjmz+u1kuR5Q486mhDo/e45nwjl2+y8= 404 9/9/12 pxGz/oHEA0BS8U3VFBzJcZnnIHMsFXb3/rIxMxh2ws= 404 9/16/12 Af8tbvQ0HniIpf53I8Txz1hM1/JxxrFQxgqPuErWII= 404 9/9/12 7Bk7c0LDmm4PHyTjml017EGwNNPCn/p/0xMSWWPDic= 404 9/16/12 umCwnDvTE8ybpUB19MIb+VRj5xRJncyYGGfAQ2Mxn0= 404 9/1/12 # etc... Do you know how to make these stop? It's not at all clear to me why it would be going to these URLs in the first place.

    Read the article

  • ifilter not working with MOSS 2007, cant crawl .pdf

    - by SORRYPROFESSEROFYEARNING
    Installed ifilter and followed the guides: http://msmvps.com/blogs/sundar_narasiman/archive/2008/02/06/configuring-moss-2007-to-search-pdf-documents-install-and-configure-pdf-ifilters.aspx and the accompanying link to the MS hotfix.. I have initiated multiple crawls that don't show any .pdf documents, let alone the contents of the .pdfs (I did constantly upload test documents with real content). In the 'file types' menu of the shared servies, it didn't show the pdf icon as I think it was meant to, it also lists 'pdf' as filetype 'AcroExch.Document', is this correct? Any ideas anyone?

    Read the article

  • Simultaneous read/write to RAID array slows server to a crawl

    - by Jeff Leyser
    Fairly beefy NFS/SMB server (32GB RAM, 2 Xeon quad cores) with LSI MegaRAID 8888ELP controlling 12 drives configured into 3 different arrays. 5 2TB drives are grouped into a RAID 6 array. As expected, write performance to the array is slow. However, sustained, simultaneous read/write to the array (wether through NFS or done locally) seems to practically block any other access to anything else on the controller. For example, if I do: cp /home/joe/BigFile /home/joe/BigFileCopy where BigFile is 20G, then even a simple ls /home/jane will take many 10s of seconds to complete. In addition, an ls /backup will also take many tens of seconds, even though /backup is a different array on the same controller. As soon as the cp is done, everything is back to normal. cp /home/joe/BigFile /backup/BigFile does not exhibit this behavior. It's only when doing read/write to the same array.

    Read the article

  • TortoiseSVN client slows Explorer to a crawl in Windows XP running in Parallels

    - by Cory Larson
    I thought I'd make my first SuperUser question relatively simple, though it's the kind of question that may not get many responses as I'm not directly involved with the issue. A colleague does his development in Windows XP running in Parallels on his Mac. We've just migrated our VSS repository to SVN, and we've gone with TortoiseSVN as our client of choice with the Ankhsvn plugin for Visual Studio. On his XP instance, after installing TortoiseSVN, browsing through folders using Explorer is extremely slow; about 15 - 30 seconds before the contents of the next folder displays. It's the slowest when opening My Computer. Once he reaches a folder that contains the working content of an SVN project, Explorer behaves quickly again as expected. It seems that TortoiseSVN may be spending a bunch of time searching subfolders for stuff so it can do its icon-overlay thing, but that's just a guess. I've used TortoiseSVN for years on both XP and Vista on far less powerful machines without any issues with Explorer, so I'm attributing the slowness to it being run in a VM, though that may not be the actual issue. So has anyone encountered similar performance issues, and/or know of a fix? Keep in mind that any requests to make changes to his configuration will need to be communicated and thus my response time might be slow. Thanks everyone!

    Read the article

  • HTTP transfer speeds start fast then slows to a crawl

    - by AnITAdmin
    We just got a new dedicated 1 gigabit server running IIS. The CPU is 15% or less, the RAM (4 GB total) has 3 GB unused... We are pushing 110 mbits per second... Speeds are really slow.. And, if fact, here's how it happens: We connect, and then the speeds are really fast, and quickly decline to 40 kBps or less. What's going on? It seems the server just wont go above 120 mbits per second. The files are all very large. 50 MB to 500 MB... Could this be a factor? Again, CPU, RAM, UI responsiveness when accessing remotely all seem fine.

    Read the article

  • How to crawl retweets of a certain user?

    - by Xiong
    I studied the Twitter API Documentation today. Only find that we could use "Twitter REST API Method: statuses user_timeline" to acquire statuses of a certain user. Retweets are stripped out of the user_timeline for backwards compatibility reasons. If I want retweets included, API Documentation recommend "statuses retweeted_by_me", but retweeted_by_me cannot return the retweets by other users. I think maybe we can analyse the twitter webpage of a certain user to get his retweets. However is there any elegant way to crawl retweets of a certain user? Thanks in advance!

    Read the article

  • Legality, terms of service for performing a web crawl

    - by Berlin Brown
    I was going to crawl a site for some research I was collecting. But, apparently the terms of service is quite clear on the topic. Is it illegal to now "follow" the terms of service. And what can the site normally do? Here is an example clause in the TOS. Also, what about sites that don't provide this particular clause. Restrictions: "use any robot, spider, site search application, or other automated device, process or means to access, retrieve, scrape, or index the site" It is just research? Edit: "OK, from the standpoint of designing an efficient crawler. Should I provide some form of natural language engine to read terms of service and then abide by them."

    Read the article

  • How Search Engine Bots Crawl Forums?

    - by Waleed Eissa
    If I have a forums site with a large number of threads, will the search engine bot crawl the whole site every time? Say I have over 1,000,000 threads in my site, will they get crawled every time the bot crawls my site? or how does it work? I want my website to be indexed but I don't want the bot to kill my website! In other words I don't want the bot to keep crawling the old threads again and again every time it crawls my website. Also, what about the pages crawled before? Will the bot request them every time it crawls my website to make sure they are still on the site? I'm asking this because I only link to the latest threads, i.e. there's a page that contains a list of all the latest threads, but I don't link to the older threads, they have to be explicitly requested by URL, e.g. http://www.mysite.com/showthread.aspx?threadid=7 , will this work to stop the bot from bringing my site down and consuming all my bandwidth? P.S. The site is still under development but I want to know in order to design the site so that search engine bots don't bring it down. Thanks

    Read the article

  • How to find all occurences of an email address on a website

    - by thomasrutter
    Let's say I have a large website which may have a number of email addresses on it that are getting picked up by spammers. I plan to obfuscate or remove them all. What's the easiest way to crawl my website to find any email addresses I may be exposing? Either through on-page text (which Google can pick up, but not very well) or mailto: links (which Google can't).

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >