crawl - Page 12 - Developer IT

SharePoint Server Search Not Crawling

- by tekiegreg

Hi there, we recently moved some sites into a new farm, everything seems to be doing fine, but the search for reasons I can't identify are not crawling the migrated content. We're getting this message in our crawl log for every document: http://xxx/sites/...announcements The object was not found. (The item was deleted because it was either not found or the crawler was denied access to it.) Of course the first thing I suspected was the crawler access account, so I logged into SharePoint with the account and was able to access via that URL just fine. I tried upping permissions (even all the way up to Admin) but to no avail. Thoughts?

Read the article

Do I really need Microsoft Updates?

- by Tony Wong

When I install a fresh copy of XP home (bought it from the store.. not a copy) my pc rocks like lightening speed. But when I start installing all the updates, patches & less .NET 4.0 client (As the .NET 4.0 Client seems to bring machine to slow crawl). The pc starts to slow down.. like there is more resources to watch or something in the background is happening. So could I not get away with a awesome virus protector & awesome firewall set-up and avoid all the patches? The machine I have is a quad 4, 4gb ram and 2.3 GHZ process. Tons of room and the machine can run several apps at one time.. but when the updates happen.. its s-l-o-w!

Read the article

Modify database for the SharePoint 2010 Enterprise Search administration web site

- by Mark Hall

Does anyone know how to modify the database settings for the Enterprise Search administration web site? When you configure the service application via Central Administration, SharePoint just decides to use the default database server and gives a name like Enterprise_Search_DB_Identifier. I want to modify this to atleast give a name that makes scense like SharePoint_Search_AdministrationWebContent, and it might be nice to move it to the database server that is hosting the crawl and property database. I figured out how to move the Central Administration web content database, but this database is not listed as a content database. It is listed as a Microsoft.Office.Server.Search.Administration.SearchAdminDatabase. I have not tested to see if the same process would work but because you are doing a RemoveContentDatabase and NewContentDatabase, I would assume not. Any help would be appreciated.

Read the article

Wget - if / else download condition?

- by Kai

I want wget to prefer a certain filetype over another, if the files have the same basename. For example: if foo.ogg available, don't download foo.mp3 the way i use wget so far to crawl/automatically download (if anyone is interested): wget -Dfoo.com -I /folder/ -r -l 1 -nc -A.ogg,.mp3 -i http://www.foo.com/folder/ but this, of course, gets me .mp3 AND .ogg files. It often also gets me image files like .png which i didn't want in the first place, and discards them afterwards. Any Ideas? (Syntax-Explanation: -D: download only from this Domain -I: download only from this subfolder of Domain -r: recursive (follow links and directory structure) -l 1: follow only 1 link deep -nc: no clobber = download only if file doesn't exist -A: accept/download only all *.ogg and *.mp3 (discard necessary html-files) -i: download-url/starting point)

Read the article

Download/update webpages listed in XML sitemap

- by unor

I'm searching a FLOSS tool that downloads all pages (and embedded resources, e.g. images) linked in a XML sitemap (built according to http://www.sitemaps.org/). The tool should "crawl" the sitemap regularly and look for new and deleted URLs and changes in the lastmod element. So whenever a page gets added/deleted/updated, the tool should apply the changes. Some sitemaps list sub-sitemaps in sitemapindex?sitemap. The tool should understand this and load all linked sub-sitemaps and look for URLs in there. I know there are tools that allow me to extract all URLs from the sitemap, so that I could feed them to wget or similar tools (see for example: Extract Links from a sitemap(xml)). But this wouldn't help in getting noticed about updates to pages. Tracking the webpages itself for updates doesn't work, because "secondary" content on the pages changes daily, but lastmod gets only updated when relevant content changed.

Read the article

Fix 403 errors in Google Webmaster Tools

- by Justin

Hi Team, I have a domain that has "fallen off a cliff" for searches in Google. Searches that used to be in position 1-4 are now gone from page 1. The same search in Bing shows the typical position expected (top 5 results). In reviewing Google Webmaster Tools, I am seeing two problems: 1. The Sitemap is reporting two errors: General HTTP error: HTTP 403 error (Forbidden) URLs not accessible However, the URL they provide as "no accessible" is accessible. I can click the link Google provides and it works fine. There are 6,000 crawl errors of type 403. Again, most of these pages that have 403 are accessible in my browser (tried various browsers as well). About half are from January, the other half from November. There are no IP-specific firewall rules on ports 80 and 443 that could block the goolgebot Using the user agent switcher add-on for FF I confirmed that the page loads when the user agent is the googlebot I an confirm that most of the pages reported as 403 are accessible. A search of just "site:thedomain.com" does confirm there are over 9,000 in the index. But most searches don't return the site. I believe the 403 issues are the cause of the fall in search rankings, but I can't seem to find any information online with ideas about how to address this. Any ideas? jpe

Read the article

Google doesn't seem to update the description or title of my homepage

- by Dayson

Before we launched our website, we had set up a "coming soon" page and google picked up the title and description from its contents. So the description in the search results said, "Coming soon! Visit epicwhale.org for updates." It's been a few weeks since we launched our website. We've even created a sitemap and submitted it to google. In the google webmaster panel, the pages have been crawled and all the pages are appearing as expected on google, EXCEPT the homepage which is still not updated! The title and description of the homepage in google search results still says coming soon.. The website I am referring to is textmewidget.com and below are the images of the search result. Google: http://i.imgur.com/vAkJg.png I checked on bing too, but it appears to be fine there. Bing: http://i.imgur.com/Q8O6L.png All other pages seem to be indexed fine on google. I don't even have any crawl errors in my reports. So what seems to be the problem? I've already waited for 2 weeks. Thanks in advance!

Read the article

Ranking drop after using reverse proxy for blog subdirectory and robots.txt for old blog subdomain

- by user40387

We have a 3Dcart store and a WordPress blog hosted on a separate server. Originally, we had a CNAME set up to point the blog to http://blog.example.com/. However, in our attempt to boost link-based and traffic-based authority on the main site, we've opted to do a reverse proxy to http://www.example.com/blog/. It’s been about two months since we finished the reverse proxy migration. It appears that everything is technically working as intended, including some robots and sitemap changes; the new URLs are even generating some traffic, as indicated on Google Analytics. While Google has been indexing the new URL locations, they’re ranking very poorly, even for non-competitive, long-tail keywords. Meanwhile, the old subdomain URLs are still ranking mostly as well as they used to (even though they aren’t showing meta titles and descriptions due to being blocked by robots.txt). Our working theory is that Google has an old index of the subdomain URLs, and is considering the new URLs to be duplicate content, since it’s being told not to crawl the subdomain and therefore can’t see the rel canonicals we have in place. To resolve this, we’ve updated the subdomain’s robot.txt to no longer block crawling and indexing. Theoretically, seeing the canonical tag on the subdomain pages will resolve any perceived duplicate content issues. In the meantime, we were wondering if anyone would have any other ideas. We are very concerned that we’ll be losing valuable traffic, as we’re entering our on season at the moment.

Read the article

Why was my site rejected for Google Adsense?

- by hyuun jjang

I have a 3 year old blog and its got around 16 articles/tutorials about some programming problems and solutions. It's getting pretty much a lot of view lately so I decided to apply for a google adsense account. When I first applied via blogger, google replied with the following statement: Page Type: In order to participate in Google AdSense, publishers' websites and application information must satisfy the following guidelines: - Your website must be your own top-level domain (www.example.com and not www.example.com/mysite). - You must provide accurate personal information with your application that matches the information on your domain registration. - Your website must contain substantial, original content... So, as I understood it, I decide to buy a domain and point my blogger blog to that new naked domain. and here is the newly bought domain where all the contents of my old blog resides. http://icodeya.com/ I reapplied, hoping that this time, I will make the cut. But then I got this reply Further detail: Unable to review your site: While reviewing http://www.icodeya.com/, we found that your site was down or unavailable. We suggest you check whether there was a typo in the URL submitted. When your site is operational, you can resubmit your application with the correct site by following the directions below. I'm a bit disappointed. Maybe I did something wrong with DNS configuration or something. But you can clearly see that my site is fully functional. I heard that google sends robots to crawl on to the site etc. It's just sad because I invested on a domain name, and now I can't even find ways to earn from it. Any tips?

Read the article

Can I use a 302 redirect to serve up static content from a url with escaped_fragment?

- by Starfs

We would like to serve up seo-friendly ajax-driven content. We are following this documentation. Has anyone ever tried to write a 302 redirect into the htaccess file, that takes the '?_escaped_fragment=' string and send that to a static page? For example /snapshot/yourfilename/ How will Google react to this? I've gone through the documentation and it's not very clear. The below quote is from Google's documentation this is what I find. I'm not sure if they are saying that you can redirect the _escaped_fragment_ url to a different static page, or if this is to redirect the hashtag URL to static content? Thoughts? From Google's site: Question: Can I use redirects to point the crawler at my static content? Redirects are okay to use, as long as they eventually get you to a page that's equivalent to what the user would see on the #! version of the page. This may be more convenient for some webmasters than serving up the content directly. If you choose this approach, please keep the following in mind: Compared to serving the content directly, using redirects will result in extra traffic because the crawler has to follow redirects to get the content. This will result in a somewhat higher number of fetches/second in crawl activity. Note that if you use a permanent (301) redirect, the url shown in our search results will typically be the target of the redirect, whereas if a temporary (302) redirect is used, we'll typically show the #! url in search results. Depending on how your site is set up, showing #! may produce a better user experience, because the user will be taken straight into the AJAX experience from the Google search results page. Clicking on a static page will take them to the static content, and they may experience avoidable extra page load time if the site later wants to switch them to the AJAX experience.

Read the article

Recovering from an incorrectly deployed robots.txt?

- by Doug T.

We accidentally deployed a robots.txt from our development site that disallowed all crawling. This has caused traffic to dip dramatically, and google results to report: A description for this result is not available because of this site's robots.txt – learn more. We've since corrected the robots.txt about a 1.5 weeks ago, and you can see our robots.txt here. However, search results still report the same robots.txt message. The same appears to be true for Bing. We've taken the following action: Submitted site to be recrawled through google webmaster tools Submitted a site map to google (basically doing everything possible to say "Hey we're here! and we're crawlable!") Indeed a lot of crawl activity seems to be happening lately, but still no description is crawled. I noticed this question where the problem was specific to a 303 redirect back to a disallowed path. We are 301 redirecting to /blog, but crawling is allowed here. This redirect is due to a site redesign, wordpress paths for posts such as /2012/02/12/yadda yadda have been moved to /blog/2012/02/12. We 301 redirect to wordpress for /blog to keep our google juice. However, the sitemap we submitted might have /blog URLs. I'm not sure how much this matters. We clearly want to preserve google juice for URLs linked to us from before our redesign with the /2012/02/... URLs. So perhaps this has prevented some content from getting recrawled? How can we get all of our content, with links pointed to our site from pre-and-post redesign reporting descriptions? How can we resolve this problem and get our search traffic back to where it used to be?

Read the article

Update Google Sitemap for Mobile

- by dimo414

I have a series of utilities to generate Google sitemaps for my whole site. These files are massive, and slow to build. We want to start telling Google these pages are mobile-crawl-able too, by adding them to mobile sitemaps, but the documentation is unclear if I need to specify physically different files for my mobile URLs than for my normal ones. If this is my current sitemap: <?xml version="1.0" encoding="UTF-8" ?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://mobile.example.com/article100.html</loc> </url> </urlset> Can I simply change it to: <?xml version="1.0" encoding="UTF-8" ?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"> <url> <loc>http://mobile.example.com/article100.html</loc> <mobile:mobile/> </url> </urlset> Or do I need to create new files with the additional markup, alongside my existing files?

Read the article

Cropping images & SEO

- by user1181950

So I have a page with a bunch of images with largely varying sizes. Also the layout of the page is such that the images are all in the shape of square tiles, so just resizing will cause distorted images. What I've been doing previously is when users upload images, I resize and crop them appropriately and display the new image as the thumbnail and load full image when user clicks on it. However, I just realized this is an issue with SEO as google will crawl the thumbnails and stick the thumbnails on Google Images instead of the full images. Is there any way to show a cropped/resized image but have Google Image show the full image? I can do something with css using an enclosing div and overflow:hidden, but I'd imagine the performance on that would be pretty bad. Any suggestions? Thanks! PS. I saw this (Make google index the actual image not the thumbnail), but in my case I have users continuously uploading images, and the database of images is always changing and pretty big (thousands), so sitemap will be pretty unwieldy..

Read the article

Hard Drive Fundamentals And Verifying Disk Performance

- by Agnel Kurian

Over the past few months, my Windows XP machine has slowed down to a crawl. It takes about 10-15 minutes to go from power-up to reaching a responsive state. I have reasons to believe that this is a result of the hard disk slowing down. Questions: Do hard disks slow down as a result of mechanical wear and tear ...or age? How do I check if my disk has slowed down? Conversely, how can I verify that my disk is indeed running at the speed it's designed to run at? Could drivers be at fault here? Do hard disks come with drivers or does Windows use a generic driver?

Read the article

On Mac OS X how can I monitor what is using my internet connection?

- by Jon Hopkins

I've got a relatively limited broadband connection (I live miles from the nearest exchange) and from time to time net access (but nothing else) slows to a near crawl. I know from a bit of monitoring software that the connection is being fairly heavily used which would explain it but I don't know what's using it. There are certainly plenty of things which might (these days there are dozens of apps that will either regularly or infrequently check data or download updates) but how can I find out? I'm happy to pay (a small amount of) money if needed, though in that case I'd rather it were a recommendation that me just Googling for something.

Read the article

Do I really need Microsoft Updates?

- by Tony Wong

When I install a fresh copy of Windows XP Home (I bought it from the store.. not a copy), my PC rocks like lightening speed. But when I start installing all the updates, patches & less .NET 4.0 client (as the .NET 4.0 Client seems to bring machine to slow crawl). The PC starts to slow down.. like there are more resources to watch or something is happening in the background. So could I not get away with an awesome virus protector and an awesome firewall set-up and avoid all the patches? The machine I have is a quad 4, 4 GB RAM and 2.3 GHz process. Tons of room and the machine can run several applications at one time.. but when the updates happen.. it's s-l-o-w!

Read the article

New site not appearing in index after change of address, no feedback from google webmaster tools

- by Duffy

Our change of address seems to not be taking effect. Here's the story so far: We're a web company and our product is called The New Hive. Our site used to be at thenewhive.com, but we decided to switch to newhive.com (drop the "the", it's cleaner). So the timeline of what I've tried, starting on July 29th: used 301 redirects for all pages (e.g. thenewhive.com/tag/art = newhive.com/tag/art) At this point we noticed that we had disappeared from search results when searching "The New Hive", the front page used to be all links to our site plus a couple news articles about the company. So on August 5th I: verified new domain in webmaster tools (old domain was already verified) submitted a change of address request on August 5th with Webmaster Tools / Configuration / Change of Address Then after another week, on August 13th I did this: Went to Webmaster Tools / Health / Fetch as google fetched our homepage and a couple sub pages, all successfully clicked "Submit to Index" for homepage As of today (August 23rd) we're still not showing up in the index. We're getting no warnings or feedback of any kind from the dashboard so I'm inclined to think something's broken with the dashboard rather than that something's wrong with our site from an SEO perspective. From the dashboard: No new messages or recent critical issues. Crawl Errors: No data available. From Health - Index Status: Total indexed 0 Ever crawled 42,490 Not selected 12 Blocked by robots 0 I'm really at a loss here, any help would be appreciated.

Read the article

How to schedule time-of-day upgrades

- by Richard

Hello, I'm responsible for about 30 Ubuntu computers at a private K-8 school. We have only a 3Mbps internet connection serving the entire campus, and I would like to ensure that updates are done in the middle of the night - so that daytime tasks are not slowed down. I'm using Ubuntu 10.04, and have set all computers to download and install security updates via the update manager. I have also installed cron-apt, and modified the config file to stagger the start times of the upgrades from about 10pm to 4am local time. HOWEVER - this morning I arrived at the school at 7:30am and all the computers were busy downloading a large security based update. Needless to say, all internet activity was slowed to a crawl (for the next 2 hours), and the computer users were very very upset. This was the event I'm trying so hard to prevent. It seems that my scheme to ensure middle of the night downloads failed, and I'm not sure why. I've also tried some schemes using unattended-upgrades & crontab, but there always seemed to be something scheduling upgrades to occur in addition to the ones I try to force at middle of the night. Is there a sure fire way to absolutely positively guarantee that updates will occur only at one specific time? It would be nice if the update manager just had a drop down menu to specify a designated time. Thanks in advance for any help you can give me.

Read the article

Virtual Server 2005 R2 kungfu

- by AngryHacker

Does Virtual Server 2005 R2 have a command line interface, that's versatile enough? Here is a situation. I run a Win2k VM on an old memory constrained machine. I allocate it 378MB of RAM and the VM runs just fine. Once a month, inside the VM, I backup the (a very large) database, compress it using 7Zip and ftp it to the backup site (all in a script). Unfortunately the compression part takes a massive amount of RAM (far exceeding the 378MB), it goes for the paging file and brings absolutely everything to a crawl and literally takes 2-3 days, if left unattended. So to fix this, I have to shutdown the VM, give it temporarily 768MB of RAM and then the whole thing finishes in 20 minutes. So, is there a way do the following automatically from the host machine in a script? Shutdown the guest OS (I think, I got this part) Change the RAM allocation from 378 to 768 Start the guest OS again then, 1 hour later, do everything in reverse.

Read the article

SEO with duplicate content

- by user16831

I have a nature photography site with multiple types of photo galleries. Each photo and associated caption on my site appears in several galleries. For instance, a photo of a goldfinch that was taken on a trip to New Mexico in 2008 will appear in the "goldfinch.php" gallery, in the "finches.php" gallery, and in the "New_Mexico_2008.php" gallery. This duplication is useful for my site visitors - User A may want to see goldfinch photos, whereas User B wants to see photos from New Mexico - but I am concerned about the SEO implications. The typical suggestions to deal with duplicate content, such as 301 redirects and canonical tags, probably won't work in this case, because the page content is substantially different (ranging from ~1% to ~90% duplication, depending on the specific example chosen). The obvious solution to me would be to edit robots.txt to only allow search engines to crawl one type of gallery - for instance, if they crawled only the galleries organized by species(e.g. goldfinch.php), all the photos on my site would be found exactly once. However, the Google content guidelines recommend against blocking crawler access to duplicate information. Should I go ahead and use robots.txt anyway? Or is there a better solution?

Read the article

Multicasting Windows 7 Image

- by LawnChairSkank

I am trying to deploy some new machines with windows 7 for the first time in our computer labs. We used to use third party imaging software and then run sysprep after the image was copied(XP), but it seems you can't go that route with windows 7. We set up a new imaging server with the windows system image manager, but when we try to multicast the image it pretty much takes down our whole staff and faculty network. I heard you can turn on a multicast feature on our cisco switches to help with the issue, but that it also slows the switches to a crawl. Another idea we have tried was pulling the the computer lab switch off the main network and plugging the imaging server directly into the computer lab switch so the multicast doesn't take down our network, but it doesn't seem to work without being able to hit a domain controller. Is there a way to multicast without taking out the network? I feel like I am missing something... Thanks in advance

Read the article

Updating Google sitemap for mobile

- by dimo414

I have a series of utilities to generate Google sitemaps for my whole site. These files are massive, and slow to build. We want to start telling Google these pages are mobile-crawl-able too, by adding them to mobile sitemaps, but the documentation is unclear if I need to specify physically different files for my mobile URLs than for my normal ones. If this is my current sitemap: <?xml version="1.0" encoding="UTF-8" ?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://mobile.example.com/article100.html</loc> </url> </urlset> Can I simply change it to: <?xml version="1.0" encoding="UTF-8" ?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"> <url> <loc>http://mobile.example.com/article100.html</loc> <mobile:mobile/> </url> </urlset> Or do I need to create new files with the additional markup, alongside my existing files?

Read the article

How cloudfront works?

- by Dharmik Bhandari

I'm planning to Implement CDN(Content Delivery Network) of Amazon which is known as CloudFront in ASP.NET MVC3 with c#. I've googled about it but little bit confuse about few things mentions below. Is it compulsory that we have to uploads all static resources to CDN Network first and then we can use or Is it manageable by Amazon to crawl site static resources which is predefine folder or directory of sites? Is Amazon automatic update its copies when we anything change in static resources or every time we have to upload updated resources to CDN network.

Read the article

How to correctly handle redirect after site facelift

- by Stefan

I recently updated our site taking it from a multi-page site to a single page site. The problem now is that when the site is searched in say Google, it displays the site as well as the indexed pages. So if a user clicks say our "About" page, it takes them to our now outdated material. I am hoping to get some guidance on how to properly handle this. I figure the first step is to now setup a robots.txt on our new index page to tell the engines not to crawl beyond index.php. But in the meantime, how do I handle the fact that when searching our site on Google we may still have users who try to click on sub-page links? Should I simply setup redirects while waiting for the engines to update? And if so, do I need to setup redirects on each page using PHP or is this something I would take care of on our sites control panel? I am not very familiar with redirects... Any help is appreciated!

Read the article

Can I use a 302 redirect to serve up static content from an URL with escaped_fragment?

- by Starfs

We would like to serve up SEO-friendly Ajax-driven content. We are following this documentation. Has anyone ever tried to write a 302 redirect into the .htaccess file, that takes the ?_escaped_fragment= string and send that to a static page?, for example /snapshot/yourfilename/. How will Google react to this? I've gone through the documentation and it's not very clear. The below quote is from Google's documentation this is what I find. I'm not sure if they are saying that you can redirect the _escaped_fragment_ URL to a different static page, or if this is to redirect the hashtag URL to static content? Thoughts? From Google's site: Question: Can I use redirects to point the crawler at my static content? Redirects are okay to use, as long as they eventually get you to a page that's equivalent to what the user would see on the #! version of the page. This may be more convenient for some webmasters than serving up the content directly. If you choose this approach, please keep the following in mind: Compared to serving the content directly, using redirects will result in extra traffic because the crawler has to follow redirects to get the content. This will result in a somewhat higher number of fetches/second in crawl activity. Note that if you use a permanent (301) redirect, the url shown in our search results will typically be the target of the redirect, whereas if a temporary (302) redirect is used, we'll typically show the #! url in search results. Depending on how your site is set up, showing #! may produce a better user experience, because the user will be taken straight into the AJAX experience from the Google search results page. Clicking on a static page will take them to the static content, and they may experience avoidable extra page load time if the site later wants to switch them to the AJAX experience.

Search Results

Search found 446 results on 18 pages for 'crawl'.

Page 12/18 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

- by tekiegreg

- by Tony Wong

- by Mark Hall

- by Kai

- by unor

- by Justin

- by Dayson

- by user40387

- by hyuun jjang

- by Starfs

- by Doug T.

- by dimo414

- by user1181950

- by Agnel Kurian

- by Jon Hopkins

- by Tony Wong

- by Duffy

- by Richard

- by AngryHacker

- by user16831

- by LawnChairSkank

- by dimo414

- by Dharmik Bhandari

- by Stefan

- by Starfs

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 | Next Page >