Search Results

Search found 241 results on 10 pages for 'crawling'.

Page 6/10 | < Previous Page | 2 3 4 5 6 7 8 9 10  | Next Page >

  • SSL Certificate

    - by outdoorcat
    I've received the email below from google about my wordpress site and have no idea how to follow the instructions. Any help out there? Dear Webmaster, The host name of your site, https://www.example.com/, does not match any of the "Subject Names" in your SSL certificate, which were: *.wordpress.com wordpress.com This will cause many web browsers to block users from accessing your site, or to display a security warning message when your site is accessed. To correct this problem, please get a new SSL certificate from a Certificate Authority (CA) with a "Subject Name" or "Subject Alternative DNS Names" that matches your host name. Thanks, The Google Web-Crawling Team

    Read the article

  • How to identify the client is a search robot?

    - by Yau Leung
    I have built my entire site using AJAX (indeed it's GWT). I have also implemented AJAX crawling proposed by Google. However, after the implementation, I found that neither Yahoo , Bing, nor Baidu implemented that scheme! I'm wondering if there is a way to identify the web client is a search robot. If they are, they will be shown the HTML snapshot I created. It will be best if I can identify them in APACHE level, then I can just do a mod_rewrite. But it's still ok if I can do that in PHP or GWT.

    Read the article

  • How could I manage Google Adsense to approve my Web App? It keeps denying it

    - by Javierfdr
    Google adsense keeps denying my app from having ads, because of an "insufficient content" issue. I manage a Web Application that allows the users to set Youtube Videos as Alarm Clocks. It includes an in-site Youtube search to retrieve videos from user queries and lists the users alarms. The site has a good traffic (500 users per day), is currently promoted by Google in Google Chrome Webstore, and the ajax requests are crawlable, following Google's guidelines (https://developers.google.com/webmasters/ajax-crawling/). Although I understand there is not much content, beyond the user-generated, I really don't what else should I include in the site. Perhaps adding contact and about pages, and maybe another section would increase the navigation. Google argues I need a "fully launched and functioning site, allowing users to navigate throughout your site with a menu, sitemap, or appropiate links". They also ask for "full sentences or paragraphs" Isn't a Google Adsense solutions for Web Applications? Would all the web-apps have to include useless navigable subpages?

    Read the article

  • Why google isn't updating my site title in search results? [closed]

    - by SharkTheDark
    Possible Duplicate: Google doesn't seem to update the description or title of my homepage I had my domain for few days before I uploaded site to it, and it had one title, and then when I uploaded content it should get new title, but with my misunderstanding of WordPress it had blocked robots.txt and keyword with no-index and no-follow. But I removed that like 7 days ago, and I see in reports that Google bot is crawling over my site, but my site title isn't updating, it still has old domain title when site wasn't there... My robots.txt has now: User-agent: * Allow: / I have clear title tag on every page. How long does it take to update? Do I need to check something else?

    Read the article

  • Where would you start if you were trying to solve this PDF classification problem?

    - by burtonic
    We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

    Read the article

  • Getting a lot of '/_' errors from webmaster tools

    - by Vermino
    I'm using a WordPress site and I thought I got all the kinks out of it. For some reason Webmaster Tools is crawling my website and showing a lot of 404 errors which are from /_ like additional pages that I've never created. I just can't figure out what is creating these for Google crawlers and then displaying a 404. My robots.txt is here. My sitemap (created by the Yoast plugin) is here. I have Yoast and Jetpack plugins installed. What could be causing these links to appear

    Read the article

  • Another website is mirroring and ranks above my site in search results

    - by Marlboro Goodluck
    There is a site of ill-repute known as thedirty which has completely mirrored my site and now has links appearing on Google at the #1 spot using my content. I checked my log files and noticed that this site has been crawling mine for sometime, and also has 10,000 links from their site to mine. I have blocked user access which is referred from this site and reported them as web spam to Google already. I also disavowed the domain. How are they getting top links in Google (even overtaking mine) for such nefarious tactics? What are the steps to completely eliminating an issue such as this?

    Read the article

  • How to identify a PDF classification problem?

    - by burtonic
    We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

    Read the article

  • Duplicate page content and the Google index

    - by Kit Sunde
    I have a static pages with dynamically expanding content that google is indexing. I also have deep links into virtually duplicate pages which will pre-expand the relevant section of content into the relevant section. It seems like Google is ignoring all my specialized pages and not putting them in the index. Even after going through web-masters tools, crawling and submitting them to the index manually. I also use the google API for integrating search on the site, and the deep linked pages won't show up. Is there a good solution for this?

    Read the article

  • Another website is mirroring my site

    - by Marlboro Goodluck
    Question for you all. There is a site of ill repute known as thedirty which has completely mirrored my site and now has links appearing on Google at the #1 spot using my content. I checked my log file and noticed that this site has been crawling mine from sometime, and also has 10k links from their site to mine. I have blocked user access which is referred from this site and reported them as web spam to Google already. I also disavowed the domain. How are they getting top links in Google (even overtaking mine) for such nefarious tactics? What are the steps to completely eliminating an issue such as this?

    Read the article

  • Google indexing pages with #! although we don't have any

    - by Benjamin Gruenbaum
    Our company has developed a Single Page Application using AngularJS and its routing. Google indexed our site decently with JavaScript but it did not index some pages very well so we have developed an HTML only version. We have followed the Ajax Crawling Specification posted here and have a <meta name='fragment' content='!'> tag and canonical urls. We expect http://www.example.com/foo/bar to be fetched from http://www.example.com/?_escaped_fragment_=/foo/bar. However, we have found out that when we rolled the AJAX specification we now have all pages indexed twice, once with the JavaScript version as http://www.example.com/foo/bar and once with the new version as http://www.example.com/#!/foo/bar. This is harmful to us since it's duplicate content and also mis-representing out site. I have tried looking for similar questions here and in the Google product forum but could not come up with anything.

    Read the article

  • Xpath Injection detection Tool

    - by preeti
    Hi, I am working on xpath Injection attack, so looking forward to build a tool to detect xpath Injection Tool in a website.Is web crawling and scanning be used for this? What can be the Logic to detect it? Are there any open source tools to detect it, so that i can develop it in Java by looking at logic used in that code. Thank You.

    Read the article

  • WebCrawling Dynamic Links

    - by Jojo
    Hi Everyone, Anybody has any idea on crawling websites that have dynamic pages/queries? I mean if I click a certain link, it has different values every I try to reload it in a web browser. Now my webcrawler could not download the contents of these pages. Please advise.

    Read the article

  • How to write a crawler?

    - by Jason
    Hi All, I have had thoughts of trying to write a simple crawler that might crawl and produce a list of its findings for our NPO's websites and content. Does anybody have any thoughts on how to do this? Where do you point the crawler to get started? How does it send back its findings and still keep crawling? How does it know what it finds, etc,etc. Thanks! -Jason

    Read the article

  • How to retrieve Directories size including all sub-directories?

    - by vikingosegundo
    I have stored images from the net like this Documents/imagecache/domain1/original/path/inURI/foo.png Documents/imagecache/domain2/original/path/inURI/bar.png Documents/imagecache/... Documents/imagecache/... Now I'd like to check the size of imagecache including all it sub-directories. Is there a convenient way of doing it — preferable without crawling through all the data manually?

    Read the article

  • Google search box

    - by user343282
    I am working on a google box, something like this, http://mytwentyfive.com/blog/wp-content/uploads/byme/Google%20Search%20Appliances.jpg I am pointing the crawler to a folder where there are html files. before the crawler was crawling the files and indexing them but right now it finds the pattern or the folder but not following any html files within the folder. I have tried everything I could and know but, can't think of anything else. Can someone help? thanks

    Read the article

  • Investment advice data dump analysis

    - by portoalet
    For my year-end pet project, I'd like to analyze investment advices and their correlation to the stock market performance. The problem is, where do I get the dump of investment advice data (free) ? something like stackoverflow.com data dump will be nice. Or maybe it's easier to do distributed crawling and crawl the public finance webpages for investment advices? Investment advice is buy/sell advice for stocks/forex, issued by institution/investment advisor.

    Read the article

  • Retrieivng coordinates in this page

    - by hao
    Hey guys, Im trying to do some data mining and analyze data based on locations. For this site, http://www.dianping.com/shop/1898365 I am trying to figure out whats the latitude and longitude by crawling. But I cant seem to figure out where this information is stored. Can someone give me some pointers

    Read the article

  • Sharepoint Search crawl not working

    - by Satish
    Search Crawling is error out on my MOSS 2007 installation. I get the following error for all the web apps I have following error in Crawl logs. http://mysites.devserver URL could not be resolved. The host may be unavailable, or the proxy settings are not configured correctly on the index server. The Application Event log also has the following corresponding error The start address http://mysites.devserver cannot be crawled. Context: Application 'SSPMain', Catalog 'Portal_Content' Details: The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Proxy and Timeout page in search administration. (0x80041221) I'm using Windows 2008 server. I tried accessing the site using the above mentioned url and its available. I did the registry setting for loop back issue found here http://support.microsoft.com/kb/896861 still not luck. Any Ideas?

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10  | Next Page >