Search Results

Search found 287 results on 12 pages for 'crawling pasta hellion'.

Page 7/12 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 | Next Page >

Google webmaster Verification failed.

- by KMC

I have a site created by Ruby on Rails. I had verified against Google Webmaster Tool some months ago, which was successful. One day webmaster starts giving me Re-verification fails. I tried again to verify my site using Meta tags and HTML files. But I kept having "Verification failed. The connection to your server timed out." Since then, Google stop crawling my site's content - though, somehow google still crawl my PDF contents on my site.

Read the article
Google indexed page a day before also reflecting in search but today everything vanish

- by ganesh

We had robots.txt which disallow all robots as we were in development. We are live now. We change robots.txt as per our requirement a day before. Submit indexes using Google Webmaster Tools index status. After this we can see proper result in search as well as Google images search was working as expected. Suddenly today all these things vanish from Google Search. Now again I can see old result i.e. under construction message. I checked robots.txt in Google Webmaster Tools, it's ok - no crawling errors. Kindly let me know what exactly happened? How I can inform this issue to Google?

Read the article
Does Submit to Index on a page with new content update Content Keywords for the site?

- by Dan Kanze

Using Google Webmaster Tools I'm trying to update the Content Keywords of my site. I'm confused about the relationship between Submit to Index and Content Keywords Does Fetch as Google -- Submit to Index on a previously existing indexed page containing new content expidite updating the Content Keywords crawled by the real Google bot? Does Submit to Index only submit new URL's so that previously indexed URL's still point to the older cached version until Google crawls specifically for new content on its own? Does Submit to Index have anything to do with Content Keywords or crawling new content being a previously indexed page or never been indexed page?

Read the article
SSL Certificate

- by outdoorcat

I've received the email below from google about my wordpress site and have no idea how to follow the instructions. Any help out there? Dear Webmaster, The host name of your site, https://www.example.com/, does not match any of the "Subject Names" in your SSL certificate, which were: *.wordpress.com wordpress.com This will cause many web browsers to block users from accessing your site, or to display a security warning message when your site is accessed. To correct this problem, please get a new SSL certificate from a Certificate Authority (CA) with a "Subject Name" or "Subject Alternative DNS Names" that matches your host name. Thanks, The Google Web-Crawling Team

Read the article
How to allow Google Images search to by pass hotlink protection?

- by Marco Demaio

I saw Google Images seems to index my images only if hotlink protection is off. * I use anyway hotlink protection because I don't like the idea of people sucking my bandwidth, i simply this code to protcet my sites from being hotlinked: RewriteEngine on RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mydomain\.com/.*$ [NC] RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mydomain\.com$ [NC] RewriteRule .*\.(jpg|jpeg|png|gif)$ - [F,NC,L] But in order to allow Google Image search to bypass my hotlink protection (I want Google Images search to show my images) would it suffice to add a line like this one: RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?google\.com/.*$ [NC] RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?google\.com$ [NC] Because I'm wondring: is the crawler crawling just from google.com? and what about google.it / google.co.uk, etc.? FYI: on Google official guidelines I did not find info about this. I suppose hotlink protection prevents Google Images to show images in its results because I did some tests and it seems hotlink protection does prevent my images to be shown in Google Images search.

Read the article
Doubt regarding search engine/plugin(One present on the website itself)

- by Ravi Gupta

I am new to web development and trying to study various types of websites as case study. Right now my focus is on how search engines works for an eCommerce website. I know basic functioning for a search engine, i.e. crawl web pages, index them and the display the results using those indexes. But I got little confuse in case of an eCommerce website. Don't you think that it would be better if a search engine instead of crawling the web pages containing products, it should directly crawl the database and index the products stored in the database? And when a user search for any product, it will simply give us the rows of the table which matches the user query? If this is not the case, can someone please explain how the usual method works on eCommerce website?

Read the article
Should I prevent search engines indexing tag/category pages?

- by Macha

On my site, I currently have no special rules for search engines. It is a blog, statically generated using a Python program. When I search for some of my articles on Google, there is usually a tag or category page included in the results. Sometimes it even ranks ahead of the article itself. Obviously, as these links aren't always going to have the article on them, this aren't the results I want people to click on. So, I'm thinking of setting noindex on these pages. Is there any possible downside to doing so? Is this possible to do via robots.txt, or do I have to add it to all the relevant templates? All I can find for robots.txt are ways to stop the search engine crawling those pages, which isn't what I want - while I don't want them indexed, it's still the only surefire way to find all my blog posts.

Read the article
How to identify the client is a search robot?

- by Yau Leung

I have built my entire site using AJAX (indeed it's GWT). I have also implemented AJAX crawling proposed by Google. However, after the implementation, I found that neither Yahoo , Bing, nor Baidu implemented that scheme! I'm wondering if there is a way to identify the web client is a search robot. If they are, they will be shown the HTML snapshot I created. It will be best if I can identify them in APACHE level, then I can just do a mod_rewrite. But it's still ok if I can do that in PHP or GWT.

Read the article
How could I manage Google Adsense to approve my Web App? It keeps denying it

- by Javierfdr

Google adsense keeps denying my app from having ads, because of an "insufficient content" issue. I manage a Web Application that allows the users to set Youtube Videos as Alarm Clocks. It includes an in-site Youtube search to retrieve videos from user queries and lists the users alarms. The site has a good traffic (500 users per day), is currently promoted by Google in Google Chrome Webstore, and the ajax requests are crawlable, following Google's guidelines (https://developers.google.com/webmasters/ajax-crawling/). Although I understand there is not much content, beyond the user-generated, I really don't what else should I include in the site. Perhaps adding contact and about pages, and maybe another section would increase the navigation. Google argues I need a "fully launched and functioning site, allowing users to navigate throughout your site with a menu, sitemap, or appropiate links". They also ask for "full sentences or paragraphs" Isn't a Google Adsense solutions for Web Applications? Would all the web-apps have to include useless navigable subpages?

Read the article
Getting a lot of '/_' errors from webmaster tools

- by Vermino

I'm using a WordPress site and I thought I got all the kinks out of it. For some reason Webmaster Tools is crawling my website and showing a lot of 404 errors which are from /_ like additional pages that I've never created. I just can't figure out what is creating these for Google crawlers and then displaying a 404. My robots.txt is here. My sitemap (created by the Yoast plugin) is here. I have Yoast and Jetpack plugins installed. What could be causing these links to appear

Read the article
Duplicate page content and the Google index

- by Kit Sunde

I have a static pages with dynamically expanding content that google is indexing. I also have deep links into virtually duplicate pages which will pre-expand the relevant section of content into the relevant section. It seems like Google is ignoring all my specialized pages and not putting them in the index. Even after going through web-masters tools, crawling and submitting them to the index manually. I also use the google API for integrating search on the site, and the deep linked pages won't show up. Is there a good solution for this?

Read the article
Where would you start if you were trying to solve this PDF classification problem?

- by burtonic

We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

Read the article
How to identify a PDF classification problem?

- by burtonic

We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

Read the article
Why google isn't updating my site title in search results? [closed]

- by SharkTheDark

Possible Duplicate: Google doesn't seem to update the description or title of my homepage I had my domain for few days before I uploaded site to it, and it had one title, and then when I uploaded content it should get new title, but with my misunderstanding of WordPress it had blocked robots.txt and keyword with no-index and no-follow. But I removed that like 7 days ago, and I see in reports that Google bot is crawling over my site, but my site title isn't updating, it still has old domain title when site wasn't there... My robots.txt has now: User-agent: * Allow: / I have clear title tag on every page. How long does it take to update? Do I need to check something else?

Read the article
Another website is mirroring and ranks above my site in search results

- by Marlboro Goodluck

There is a site of ill-repute known as thedirty which has completely mirrored my site and now has links appearing on Google at the #1 spot using my content. I checked my log files and noticed that this site has been crawling mine for sometime, and also has 10,000 links from their site to mine. I have blocked user access which is referred from this site and reported them as web spam to Google already. I also disavowed the domain. How are they getting top links in Google (even overtaking mine) for such nefarious tactics? What are the steps to completely eliminating an issue such as this?

Read the article
Another website is mirroring my site

- by Marlboro Goodluck

Question for you all. There is a site of ill repute known as thedirty which has completely mirrored my site and now has links appearing on Google at the #1 spot using my content. I checked my log file and noticed that this site has been crawling mine from sometime, and also has 10k links from their site to mine. I have blocked user access which is referred from this site and reported them as web spam to Google already. I also disavowed the domain. How are they getting top links in Google (even overtaking mine) for such nefarious tactics? What are the steps to completely eliminating an issue such as this?

Read the article
Google indexing pages with #! although we don't have any

- by Benjamin Gruenbaum

Our company has developed a Single Page Application using AngularJS and its routing. Google indexed our site decently with JavaScript but it did not index some pages very well so we have developed an HTML only version. We have followed the Ajax Crawling Specification posted here and have a <meta name='fragment' content='!'> tag and canonical urls. We expect http://www.example.com/foo/bar to be fetched from http://www.example.com/?_escaped_fragment_=/foo/bar. However, we have found out that when we rolled the AJAX specification we now have all pages indexed twice, once with the JavaScript version as http://www.example.com/foo/bar and once with the new version as http://www.example.com/#!/foo/bar. This is harmful to us since it's duplicate content and also mis-representing out site. I have tried looking for similar questions here and in the Google product forum but could not come up with anything.

Read the article
Xpath Injection detection Tool

- by preeti

Hi, I am working on xpath Injection attack, so looking forward to build a tool to detect xpath Injection Tool in a website.Is web crawling and scanning be used for this? What can be the Logic to detect it? Are there any open source tools to detect it, so that i can develop it in Java by looking at logic used in that code. Thank You.

Read the article
WebCrawling Dynamic Links

- by Jojo

Hi Everyone, Anybody has any idea on crawling websites that have dynamic pages/queries? I mean if I click a certain link, it has different values every I try to reload it in a web browser. Now my webcrawler could not download the contents of these pages. Please advise.

Read the article
How to write a crawler?

- by Jason

Hi All, I have had thoughts of trying to write a simple crawler that might crawl and produce a list of its findings for our NPO's websites and content. Does anybody have any thoughts on how to do this? Where do you point the crawler to get started? How does it send back its findings and still keep crawling? How does it know what it finds, etc,etc. Thanks! -Jason

Read the article
How to retrieve Directories size including all sub-directories?

- by vikingosegundo

I have stored images from the net like this Documents/imagecache/domain1/original/path/inURI/foo.png Documents/imagecache/domain2/original/path/inURI/bar.png Documents/imagecache/... Documents/imagecache/... Now I'd like to check the size of imagecache including all it sub-directories. Is there a convenient way of doing it — preferable without crawling through all the data manually?

Read the article
Is there a way to crawl all facebook fan pages?

- by user220755

Is there a way to crawl all facebook fan pages and collect some information? like for example crawling facebook fan pages and save their names, or how many fans, etc? Or at least, do you have a hint of how this could be possibly done?

Read the article
What is the best way to freshen a Nutch index?

- by Miles

I haven't looked at Nutch for a year or so and it looks like it has changed significantly. The documentation on re-crawling isn't clear. What is the best way to update an existing Nutch index?

Read the article
Google search box

- by user343282

I am working on a google box, something like this, http://mytwentyfive.com/blog/wp-content/uploads/byme/Google%20Search%20Appliances.jpg I am pointing the crawler to a folder where there are html files. before the crawler was crawling the files and indexing them but right now it finds the pattern or the folder but not following any html files within the folder. I have tried everything I could and know but, can't think of anything else. Can someone help? thanks

Read the article
how to prevent all crawlers except good ones (google, bing, yahoo) access website content?

- by tranhuyhung

I just want to let Google, Bing, Yahoo crawl my website to build indexes. But I do not want my opposite website use crawling service to steal my website content. What should I do?

Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 | Next Page >