Search Results

Search found 446 results on 18 pages for 'crawl'.

Page 5/18 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Reset crawled content for one content source in MOSS

- by Andrew Strong

I'm using MOSS 2007 and I want to wipe out the search index and do a full re-crawl of one of my content sources. I see the link to 'Reset all crawled content' but I don't want to have to re-crawl the other content sources. Is there any way to do this?

Read the article
search engine crawling frequency

- by Aditya Pratap Singh

I want to design a search engine for news websites ie. download various article pages from these websites, index the pages, and answer search queries on the index. I want a short pseudocode to find an appropriate crawling frequency -- i do not want to crawl too often because the website may not have changed, and do not want to crawl too infrequently because index would then be out of date. Assume that crawling code looks as follows while(1) { sleep(sleep_interval); // sleep for sleep_interval crawl(website); // crawls the entire website diff = diff(currently_crawled_website, previously_crawled_website); // returns a % value of difference between the latest and previous crawls of the website sleep_interval = infer_sleep_interval(diff, sleep_interval); } looking for a pseudocode for the infer_sleep_interval method: long sleep_interval infer_sleep_interval(int diff_percentage,long previous_sleep_interval) { ... ... ... } i want to design method which adaptively alters the sleeping interval based on the update frequency of the website.

Read the article
Pre-cache site as user visits

- by strager

I am making a static site which is 'forced' to be cached via Cache-control, etc. When a user visits my site, I want the browser to crawl my site, caching pages, so when the user navigates to a page, the load is almost instant. (I do not need a recursive crawl, as that will probably happen as the user navigates between pages. I just need to crawl the links on the current page, and of course not re-caching a page which has already been cached.) (Also, I am not changing pages using Ajax-like techniques. These are essentially normal flat HTML files with normal links.) How can I do this pre-caching using Javascript? (I am using jQuery.)

Read the article
More than 100,000 articles !

- by developerit

In one month, we already got more than 100,000, and we continue to crawl! We plan on hitting 250,000 total articles next month. Due to the large amount of data we are gathering, we are planning on updating our SQL stored procedure to improve performance. We may be migrating to SQL Server 2008 Entreprise, as we are currently running on SQL Server 2005 Express Edition… We are at 400 Mb of data, getting more and more close to the 2 Gb limit. Stay tune for more info and browse daily fresh articles about web development.

Read the article
What is adding frog characters to my URLs?

- by Jacob Hume

While browsing the "Crawl Errors" section of Google Webmaster Tools, I discovered a set of very strange 500 errors in reference to my site: I was able to track down what these characters are, and apparently they are the first two characters in the Unicode Private Use Area. My font just happened to map them to a frog wearing a tiny crown, and a symbol that resembles the numeral 7. These symbols only appear on the addresses of non-HTML files; office documents, PDFs, etc. - but they do not just appear in the file name. Where are these symbols coming from, and is there any way I can get rid of them so Google can properly crawl my site? Some background information: Using Web Server running WS2K3 with IIS6 and PHP 5.3.8 Site encoding is UTF-8 These symbols don't appear on the page, or in the source

Read the article
Doubt regarding search engine/plugin(One present on the website itself)

- by Ravi Gupta

I am new to web development and trying to study various types of websites as case study. Right now my focus is on how search engines works for an eCommerce website. I know basic functioning for a search engine, i.e. crawl web pages, index them and the display the results using those indexes. But I got little confuse in case of an eCommerce website. Don't you think that it would be better if a search engine instead of crawling the web pages containing products, it should directly crawl the database and index the products stored in the database? And when a user search for any product, it will simply give us the rows of the table which matches the user query? If this is not the case, can someone please explain how the usual method works on eCommerce website?

Read the article
how to get new site indexed by alexa

- by JohnMerlino

When I search my site on alexa, it says "Alexa Traffic Rank: No Data". So I google the issue. I come across this page: http://www.loudable.com/my-website-data-not-showing-in-alexa-get-your-website-crawled-by-alexasolution.html It says to get site indexed, click Crawl my site on the webmasters page. However there is no longer a link that says "Crawl my site". So as of now, does anyone know how to get site indexed by alexa so that my traffic rank will display on alexa index?

Read the article
SharePoint Search Problem: The start address sps3://server cannot be crawled.

- by Clara Oscura

With this post, I'm going to start a series on problems I have encountered with SharePoint search. Error: The start address sps3://luapp105 cannot be crawled. Context: Application 'Search_Service_Application', Catalog 'Portal_Content' Details: Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled. (0x80041205) (Event ID: 14, Task Category: Gatherer) Solution: give appropriate permissions to User Profile Synchronisation Service http://social.technet.microsoft.com/Forums/en-US/sharepoint2010setup/thread/64cdf879-f01e-4595-bc52-15975fefd18d http://www.dotnetmafia.com/blogs/dotnettipoftheday/archive/2010/03/29/how-to-set-up-people-search-in-sharepoint-2010.aspx

Read the article
how to avoid getting negative points from google adsense

- by Napster

I have news based website, in which primary contents include news,image albums and videos. out of these i have copy rights for images and videos are just youtube embedded videos. Coming to news my site is kind of a mashup. It gathers data from various sites and presents them in more user friendly way for quick digestion and access. My problem here is that since the news part of the site can be found from other sites, my site could suffer in search rankings. Is there any solution to this. One thing I thought of is to put disallow on all the news ariticles pages, so google does not crawl them. Will this be helpful to me. When applying to google adsense does google crawl these pages (disallow) also.

Read the article
301 redirect from a country specific domain

- by Raj

I originally started using a .do domain extension for my site, but later realized that this country specific domain would prevent us from appearing in search results for places outside of the Dominican Republic. We started using a .co domain extension and redirected all requests to the new domain using an HTTP 301. The "Crawl Stats" in Google Webmaster Tools shows me that the .co domain is being crawled, but the "Index Status" shows the number of pages indexed at 0. The "Crawl Stats" for the .do domain says that it's being crawled and the "Index Status" shows a number greater than 0. I also set a "Change Of Address" in Google Webmaster Tools to have the .do domain point to the new .co domain. We're still not appearing in search results at all even for very specific strings where I would expect to find us. Am I doing something wrong?

Read the article
How to hide any text from sighted user and search engine but not from screen reader?

- by metal-gear-solid

What is the best tested way to hide any text from sighted user but not from popular screen readers? and without affecting SEO. for example if i adding any hidden text only for screen-reader users then that text should not be crawl by search engine when search engine will crawl that page.

Read the article
How to create managed properties at site collection level in SharePoint2013

- by ybbest

In SharePoint2013, you can create managed properties at site collection. Today, I’d like to show you how to do so through PowerShell. 1. Define your managed properties and crawled properties and managed property Type in an external csv file. PowerShell script will read this file and create the managed and the mapping. 2. As you can see I also defined variant Type, this is because you need the variant type to create the crawled property. In order to have the crawled properties, you need to do a full crawl and also make sure you have data populated for your custom column. However, if you do not want to a full crawl to create those crawled properties, you can create them yourself by using the PowerShell; however you need to make sure the crawled properties you created have the same name if created by a full crawl. Managed properties type: Text = 1 Integer = 2 Decimal = 3 DateTime = 4 YesNo = 5 Binary = 6 Variant Type: Text = 31 Integer = 20 Decimal = 5 DateTime = 64 YesNo = 11 3. You can use the following script to create your managed properties at site collection level, the differences for creating managed property at site collection level is to pass in the site collection id. param( [string] $siteUrl="http://SP2013/", [string] $searchAppName = "Search Service Application", $ManagedPropertiesList=(IMPORT-CSV ".\ManagedProperties.csv") ) Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue $searchapp = $null function AppendLog { param ([string] $msg, [string] $msgColor) $currentDateTime = Get-Date $msg = $msg + " --- " + $currentDateTime if (!($logOnly -eq $True)) { # write to console Write-Host -f $msgColor $msg } # write to log file Add-Content $logFilePath $msg } $scriptPath = Split-Path $myInvocation.MyCommand.Path $logFilePath = $scriptPath + "\CreateManagedProperties_Log.txt" function CreateRefiner {param ([string] $crawledName, [string] $managedPropertyName, [Int32] $variantType, [Int32] $managedPropertyType,[System.GUID] $siteID) $cat = Get-SPEnterpriseSearchMetadataCategory –Identity SharePoint -SearchApplication $searchapp $crawledproperty = Get-SPEnterpriseSearchMetadataCrawledProperty -Name $crawledName -SearchApplication $searchapp -SiteCollection $siteID if($crawledproperty -eq $null) { Write-Host AppendLog "Creating Crawled Property for $managedPropertyName" Yellow $crawledproperty = New-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $searchapp -VariantType $variantType -SiteCollection $siteID -Category $cat -PropSet "00130329-0000-0130-c000-000000131346" -Name $crawledName -IsNameEnum $false } $managedproperty = Get-SPEnterpriseSearchMetadataManagedProperty -Identity $managedPropertyName -SearchApplication $searchapp -SiteCollection $siteID -ErrorAction SilentlyContinue if($managedproperty -eq $null) { Write-Host AppendLog "Creating Managed Property for $managedPropertyName" Yellow $managedproperty = New-SPEnterpriseSearchMetadataManagedProperty -Name $managedPropertyName -Type $managedPropertyType -SiteCollection $siteID -SearchApplication $searchapp -Queryable:$true -Retrievable:$true -FullTextQueriable:$true -RemoveDuplicates:$false -RespectPriority:$true -IncludeInMd5:$true } $mappedProperty = $crawledproperty.GetMappedManagedProperties() | ?{$_.Name -eq $managedProperty.Name } if($mappedProperty -eq $null) { Write-Host AppendLog "Creating Crawled -> Managed Property mapping for $managedPropertyName" Yellow New-SPEnterpriseSearchMetadataMapping -CrawledProperty $crawledproperty -ManagedProperty $managedproperty -SearchApplication $searchapp -SiteCollection $siteID } $mappedProperty = $crawledproperty.GetMappedManagedProperties() | ?{$_.Name -eq $managedProperty.Name } #Get-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $managedproperty } $searchapp = Get-SPEnterpriseSearchServiceApplication $searchAppName $site= Get-SPSite $siteUrl $siteId=$site.id Write-Host "Start creating Managed properties" $i = 1 FOREACH ($property in $ManagedPropertiesList) { $propertyName=$property.managedPropertyName $crawledName=$property.crawledName $managedPropertyType=$property.managedPropertyType $variantType=$property.variantType Write-Host $managedPropertyType Write-Host "Processing managed property $propertyName $($i)..." $i++ CreateRefiner $crawledName $propertyName $variantType $managedPropertyType $siteId Write-Host "Managed property created " $propertyName } Key Concepts Crawled Properties: Crawled properties are discovered by the search index service component when crawling content. Managed Properties: Properties that are part of the Search user experience, which means they are available for search results, advanced search, and so on, are managed properties. Mapping Crawled Properties to Managed Properties: To make a crawled property available for the Search experience—to make it available for Search queries and display it in Advanced Search and search results—you must map it to a managed property. References Administer search in SharePoint 2013 Preview Managing Metadata New-SPEnterpriseSearchMetadataCrawledProperty New-SPEnterpriseSearchMetadataManagedProperty Remove-SPEnterpriseSearchMetadataManagedProperty Overview of crawled and managed properties in SharePoint 2013 Preview Remove-SPEnterpriseSearchMetadataManagedProperty SharePoint 2013 – Search Service Application

Read the article
How to create managed properties at site collection level in SharePoint2013

- by ybbest

In SharePoint2013, you can create managed properties at site collection. Today, I’d like to show you how to do so through PowerShell. 1. Define your managed properties and crawled properties and managed property Type in an external csv file. PowerShell script will read this file and create the managed and the mapping. 2. As you can see I also defined variant Type, this is because you need the variant type to create the crawled property. In order to have the crawled properties, you need to do a full crawl and also make sure you have data populated for your custom column. However, if you do not want to a full crawl to create those crawled properties, you can create them yourself by using the PowerShell; however you need to make sure the crawled properties you created have the same name if created by a full crawl. Managed properties type: Text = 1 Integer = 2 Decimal = 3 DateTime = 4 YesNo = 5 Binary = 6 Variant Type: Text = 31 Integer = 20 Decimal = 5 DateTime = 64 YesNo = 11 3. You can use the following script to create your managed properties at site collection level, the differences for creating managed property at site collection level is to pass in the site collection id. param( [string] $siteUrl="http://SP2013/", [string] $searchAppName = "Search Service Application", $ManagedPropertiesList=(IMPORT-CSV ".\ManagedProperties.csv") ) Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue $searchapp = $null function AppendLog { param ([string] $msg, [string] $msgColor) $currentDateTime = Get-Date $msg = $msg + " --- " + $currentDateTime if (!($logOnly -eq $True)) { # write to console Write-Host -f $msgColor $msg } # write to log file Add-Content $logFilePath $msg } $scriptPath = Split-Path $myInvocation.MyCommand.Path $logFilePath = $scriptPath + "\CreateManagedProperties_Log.txt" function CreateRefiner {param ([string] $crawledName, [string] $managedPropertyName, [Int32] $variantType, [Int32] $managedPropertyType,[System.GUID] $siteID) $cat = Get-SPEnterpriseSearchMetadataCategory –Identity SharePoint -SearchApplication $searchapp $crawledproperty = Get-SPEnterpriseSearchMetadataCrawledProperty -Name $crawledName -SearchApplication $searchapp -SiteCollection $siteID if($crawledproperty -eq $null) { Write-Host AppendLog "Creating Crawled Property for $managedPropertyName" Yellow $crawledproperty = New-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $searchapp -VariantType $variantType -SiteCollection $siteID -Category $cat -PropSet "00130329-0000-0130-c000-000000131346" -Name $crawledName -IsNameEnum $false } $managedproperty = Get-SPEnterpriseSearchMetadataManagedProperty -Identity $managedPropertyName -SearchApplication $searchapp -SiteCollection $siteID -ErrorAction SilentlyContinue if($managedproperty -eq $null) { Write-Host AppendLog "Creating Managed Property for $managedPropertyName" Yellow $managedproperty = New-SPEnterpriseSearchMetadataManagedProperty -Name $managedPropertyName -Type $managedPropertyType -SiteCollection $siteID -SearchApplication $searchapp -Queryable:$true -Retrievable:$true -FullTextQueriable:$true -RemoveDuplicates:$false -RespectPriority:$true -IncludeInMd5:$true } $mappedProperty = $crawledproperty.GetMappedManagedProperties() | ?{$_.Name -eq $managedProperty.Name } if($mappedProperty -eq $null) { Write-Host AppendLog "Creating Crawled -> Managed Property mapping for $managedPropertyName" Yellow New-SPEnterpriseSearchMetadataMapping -CrawledProperty $crawledproperty -ManagedProperty $managedproperty -SearchApplication $searchapp -SiteCollection $siteID } $mappedProperty = $crawledproperty.GetMappedManagedProperties() | ?{$_.Name -eq $managedProperty.Name } #Get-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $managedproperty } $searchapp = Get-SPEnterpriseSearchServiceApplication $searchAppName $site= Get-SPSite $siteUrl $siteId=$site.id Write-Host "Start creating Managed properties" $i = 1 FOREACH ($property in $ManagedPropertiesList) { $propertyName=$property.managedPropertyName $crawledName=$property.crawledName $managedPropertyType=$property.managedPropertyType $variantType=$property.variantType Write-Host $managedPropertyType Write-Host "Processing managed property $propertyName $($i)..." $i++ CreateRefiner $crawledName $propertyName $variantType $managedPropertyType $siteId Write-Host "Managed property created " $propertyName } Key Concepts Crawled Properties: Crawled properties are discovered by the search index service component when crawling content. Managed Properties: Properties that are part of the Search user experience, which means they are available for search results, advanced search, and so on, are managed properties. Mapping Crawled Properties to Managed Properties: To make a crawled property available for the Search experience—to make it available for Search queries and display it in Advanced Search and search results—you must map it to a managed property. References Administer search in SharePoint 2013 Preview Managing Metadata

Read the article
Problem parsing an atom feed using simplexml_load_file(), can't get an attribute.

- by Craig Ward

Hi, I am trying to create a social timeline. I pull in feeds form certain places so I have a timeline of thing I have done. The problem I am having is with Google reader Shared Items. I want to get the time at which I shared the item which is contained in <entry gr:crawl-timestamp-msec="1269088723811"> Trying to get the element using $date = $xml->entry[$i]->link->attributes()->gr:crawl-timestamp-msec; fails because of the : after gr which causes a PHP error. I could figure out how to get the element, so thought I would change the name using the code below but it throws the following error Warning: simplexml_load_file() [function.simplexml-load-file]: I/O warning : failed to load external entity "<?xml version="1.0"?><feed xmlns:idx="urn:atom-extension:indexing" xmlns:media="http://search.yahoo.com/mrss/" xmlns <?php $get_feed = file_get_contents('http://www.google.com/reader/public/atom/user/03120403612393553979/state/com.google/broadcast'); $old = "gr:crawl-timestamp-msec"; $new = "timestamp"; $xml_file = str_replace($old, $new, $get_feed); $xml = simplexml_load_file($xml_file); $i = 0; foreach ($xml->entry as $value) { $id = $xml->entry[$i]->id; $date = date('Y-m-d H:i:s', strtotime($xml->entry[$i]->attributes()->timestamp )); $text = $xml->entry[$i]->title; $link = $xml->entry[$i]->link->attributes()->href; $source = "googleshared"; echo "date = $date<br />"; $sql="INSERT IGNORE INTO timeline (id,date,text,link, source) VALUES ('$id', '$date', '$text', '$link', '$source')"; mysql_query($sql); $i++; }` Could someone point me in the right direction please. Cheers Craig

Read the article
Do or can robots cause considerable performance issues?

- by Anicho

So the question in the title is exactly what I am trying to find out. My case is: At work we are in a discussion with team members who seem to think bots will cause us problems relating to performance when running on our services website. Out setup: Lets say I have site www.mysite.co.uk this is a shop window to our online services which sit on www.mysiteonline.co.uk. When people search in google for mysite they see mysiteonline.co.uk as well as mysite.co.uk. Cases against stopping bots crawling: We don't store gb's of data publicly available on the web Most friendly bots, if they were to cause issues would have done so already In our instance the bots can't crawl the site because it requires username & password Stopping bots with robot .txt causes an issue with seo (ref.1) If it was a malicious bot, it would ignore robot.txt or meta tags anyway Ref 1. If we were to block mysiteonline.co.uk from having robots crawl this will affect seo rankings and make it inconvenient for users who actively search for mysite to find mysiteonline. Which we can prove is the case for a good portion of our users.

Read the article
How to get search engines to properly index an ajax driven search page

- by Redtopia

I have an ajax-driven search page that will allow users to search through a large collection of records. Each search result points to index.php?id=xyz (where xyz is the id of the record). The initial view does not have any records listed, and there is no interface that allows you to browse through all records. You can only conduct a search. How do I build the page so that spiders can crawl each record? Or is there another way (outside of this specific search page) that will allow me to point spiders to a list of all records. FYI, the collection is rather large, so dumping links to every record in a single request is not a workable solution. Outputting the records must be done in multiple requests. Each record can be viewed via a single page (eg "record.php?id=xyz"). I would like all the records indexed without anything indexed from the sitemap that shows where the records exist, for example: <a href="/result.php?id=record1">Record 1</a> <a href="/result.php?id=record2">Record 2</a> <a href="/result.php?id=record3">Record 3</a> <a href="/seo.php?page=2">next</a> Assuming this is the correct approach, I have these questions: How would the search engines find the crawl page? Is it possible to prevent the search engines from indexing the words "Record 1", etc. and "next"? Can I output only the links? Or maybe something like:

Read the article
Need to sanity-check my .htaccess, especially Limit GET POST line for Google repellent

- by jose

I need a sanity check on this .htaccess (from a WordPress site) I inherited from a 5 month+ old site. What's the symptom? Google + Bing crawl, but don't index any of the pages. Let me be clear: I'm not mad about "not ranking high." I think something is (accidentally) rejecting search engine indexing. I am not an expert on .htaccess, but one part especially looked funny, the Limit GET POST line. Is it not weird to have both Allow and Deny all, with no parameters? Also, I've ruled out robots.txt, but if I were you I'd want to see it, so here it is: User-agent: * Crawl-delay: 30 And here's the more suspect .htaccess: # temp redirect wordpress content feeds to feedburner <IfModule mod_rewrite.c> RewriteEngine on RewriteCond %{HTTP_USER_AGENT} !FeedBurner [NC] RewriteCond %{HTTP_USER_AGENT} !FeedValidator [NC] RewriteRule ^feed/?([_0-9a-z-]+)?/?$ http://feeds.feedburner.com/anonymousblog [R=302,NC,L] </IfModule> # temp redirect wordpress comment feeds to feedburner <IfModule mod_rewrite.c> RewriteEngine on RewriteCond %{HTTP_USER_AGENT} !FeedBurner [NC] RewriteCond %{HTTP_USER_AGENT} !FeedValidator [NC] RewriteRule ^comments/feed/?([_0-9a-z-]+)?/?$ http://feeds.feedburner.com/anonymous_comments [R=302,NC,L] </IfModule> <IfModule mod_rewrite.c> RewriteEngine On RewriteBase / RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] </IfModule> IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti* <Limit GET POST> order deny,allow deny from all allow from all </Limit> <Limit PUT DELETE> order deny,allow deny from all </Limit> php_value memory_limit 32M Adding header by request: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="robots" content="noindex,nofollow" /> <meta name="description" content="buncha junk i've deleted." /> <meta name="keywords" content="keywords i've deleted" /> <meta name="viewport" content="width=device-width" />

Read the article
SharePoint 2007 / 2010 Content Indexing “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.”

- by Stacy Vicknair

If you have large files in a content source that is being indexed by Sharepoint you might run into the following error message: “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.” This is usually caused because SharePoint’s MaxDownloadSize setting is set lower than the size of the file you are attempting to index. You can increase this value, restart the service then kick off a full crawl in order to fix this issue, but SharePoint 2007 and 2010 have different methods for accomplishing this task. Sharepoint 2007 Open up the Registry editor and increase the MaxDownloadSize value to a number (in MB) higher than the largest file being indexed. You can find this at: HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\Gathering Manager After you increase the size, cycle the search service and kick off a full crawl of the content source in question. Sharepoint 2010 With SharePoint 2010 you can use PowerShell via the Sharepoint 2010 Console in order to change the MaxDownloadSize. Execute the following commands to update the value: 1: $ssa = Get-SPEnterpriseSearchServiceApplication 2: $ssa.SetProperty(“MaxDownloadSize”, <new size in MB>) 3: $ssa.Update() References: http://support.microsoft.com/kb/287231 http://blogs.technet.com/b/brent/archive/2010/07/19/sharepoint-server-2010-maxdownloadsize-and-maxgrowfactor.aspx Technorati Tags: SharePoint,WSS,MaxDownloadSize,Search

Read the article
.Net search engine architecture and technology choice

- by shrivb

I am in the process of designing a search engine for an asp.net site. The site currently uses Microsoft Indexing Server to index and search content which range from simple text files to MS documents to PDFs. MIS is also used to crawl File servers. MIS in tandem with Index Server Companion crawls for content from external sites. I intend to replace MIS with the indexer/crawler I am trying to build. Since my platform is completely on the Microsoft stack, I cant afford to have a Java application server. Thus, Solr, and effectively, SolrNet is ruled out. With this being the context, I have couple of questions. 1.Technology choice I had done my initial investigation and looked at Lucene.Net. There seemed to be 2 issues in using Lucene.Net. First being, it cant crawl external content. There doesn't seem to be a direct port of Nutch in .Net. Second, since it is just an indexer, it cant parse various document types. The parsing is left to the developer. So, what would be best technology choice on the .Net platform to achieve indexing & crawling? Are there any .Net open source libraries available for document parsing? 2.Architectural pattern Is there any general architectural pattern or best practice that needs to be followed in designing such a search engine? Thanks in advance.

Read the article
Creating sitemap for Googlebot - how to mark dynamic content / dynamic subpages?

- by ojek

I have a website that is an Internet forum. This forum has many categories, and a single category page that contains a lot of subpages with listed threads. This Internet forum is brand new, and about a week ago I filled it with a few hundred thousand threads. I then looked at my Google Webmasters Tools page to see any changes in indexing, but the index went up from 300 to about 1200, so that means it did not index my added threads (although it added something). The following is what my sitemap.xml contains, which I uploaded to their website. Of course there is a lot more code, this is just a snippet for a single category. In my real sitemap file I have all the categories listed as below: <url> <loc>http://mysite.com/Forums/Physics</loc> <changefreq>hourly</changefreq> </url> Now, I would expect Googlebot to go into mysite.com/Forums/Physics, and crawl through all the subpages with thread links, and then crawl inside of each thread and index its content. How can I achieve this? Also if this is unclear, I will add a real link to my website.

Read the article
Working with Google Webmaster Tools

- by com

My first question is about Crawl errors in Google Webmaster Tools. Crawl errors is devided into few sections. One of them is HTTP. I assume that all broken links in HTTP was somehow found by crawler, this is not the links from sitemap. If this was found by scanning all sitemap pages for links, why it doesn't mention what was the source page, like in sitemap section with column Linked From. And what the meaning of Linked From, I thought if the name of section is sitemap, therefore all URLs should be taken from sitemap, so why there is Linked From? The second question, what is the best way to trreat searching on the site. How come the searching result page are getting indexed? Because of the fact that all searching result page are getting indexed, I have to many page in Linked From. What's the right practice? Question three: In order to improve response time in WMT, can I redirect all crawler's requests to designated free web server? Is this good practice? Question four: How should I treat Google Analytics Code (with parameters PageView, PageLoadTime), in the case user request non existing page, should I render Google code or not? Right now I use Google Analytics Code on the common template page, such that every page, also non existing page with error message contains Google Analytics Code, it seems like it has influence on WMT.

Read the article
Sharepoint managed Properties

- by paulie

Originally posted on StackOverflow, and edited for clarity I have a custom Content Type inside a list that has over 30 items (Which were uploaded via DockIt), and I have added several "managed properties" to the "crawled properties", in the SSP. All of them work except 1. The column "Synopsis" is a multiline field with no limit on it's length. It appears as a crawled property "Synopsis", and is mapped to a managed property 'asynop'. On the 'Advanced Search Page', it is added as a property and searchable, however it only returns a some matching records (if any). I manually created an entry, ran the crawl and was able to search for it. I edited an existing entry, ran the crawl (full and incremental), and it still only returned the manually entered entry. If I entered the search term in the Search box directly "asynop:fatigue", then all the correct results appear. Why is this happening? And could it please stop?

Read the article
Configuring Expert Search in Communicator 14 and SharePoint 2010

Communicator 14 provides functionality to be able to search for contacts not just by name, but by skill. For example a customer service agent at an airline can search for other agents with Travel Advisory experience by typing the search criteria into the Communicator search box and performing a search by keyword. The search results will return users who have specified that skill in their profile on their SharePoint My Site. This is actually pretty easy to configure, Ill show you how. Create Search and People Search Results Pages in SharePoint Communicator 14 Expert Search works by using the SharePoint 2010 Search Service to search SharePoint for user profiles with matching keywords. This requires that you have an Enterprise Search site in your site collection which includes the search service and also the People Results pages. The easiest way to do this is to create a Search Center site in your site collection. Note: I get an error when trying to create an Enterprise Search site in a Team Site in the SharePoint 2010 RTM bits, so I created it as a site collection that is evident in the URLs you see below. In the screenshots below, you can see that the URL of the SharePoint search service in the Search site collection is http://sps2010/sites/search/_vti_bin/search.asmx, and the URL of the People Search Results page is http://sps2010/sites/Search/Pages/peopleresults.aspx. Point Communications Server 14 to Search and People Search Results Pages For Communicator 14 to be able to perform an Expert Search, you need to configure Communications Server 14 to point to the Search Service and People Search Results page URLs. From a server with the OCS Core bits installed, fire up the Communications Server Management Shell and type Get-CsClientPolicy. Scroll down to the bottom of the output, were interested in setting the values of: SPSearchInternalURL SPSearchExternalURL SPSearchCenterInternalURL SPSearchCenterExternalURL SPSearchInternalURL and SPSearchExternalURL correspond to the internal and external URLs of the SharePoint search service in the Search site collection, while SPSearchCenterInternalURL and SPSearchCenterExternalURL correspond to the internal and external URLs of the people search results pages. Well use the Communications Server Management Shell to set the values of these CS policy properties. For simplicity, Im only going to set the internal URLs here. Set-CsClientPolicy SPSearchInternalURL http://sps2010/sites/search/_vti_bin/search.asmx -SPSearchCenterInternalURL http://sps2010/sites/Search/Pages/peopleresults.aspx Log out and back into Communicator. You can verify that these settings were applied by running the Get-CsClientPolicy cmdlet again from the Communications Server Management Shell. However, theres another super-secret ninja trick to verify that the settings were applied: Find the Communicator icon in the Windows System Tray Hold down the Ctrl button Click (left) the Communicator icon in the Windows System Tray do not depress the Ctrl button You should now see an extra menu item called Configuration Information, click it. Scroll down and locate the Expert Search URL and SharePoint Search Center URL keys and verify that their values correspond to those you set using the Set-CsClientPolicy PowerShell cmdlet. Configure a Sharepoint User Profile Import Im not going to provide detailed steps here except to say that you need to configure the SharePoint 2010 User Profile Service Application to import user account details from Active Directory on a scheduled basis. This is a critical step because there are several user profile properties e.g. SipAddress that are only populated by a user profile import. When performing an Expert Search, Communicator can only render results for users who have a SipAddress specified. Add Skills to User Profiles Navigate to your My Site and click on My Profile. This page allows you to set many contact details that are searchable in SharePoint. Were particularly interested in the Ask Me About property of a users profile. Expert Search searches against this property to find users with matching skills. Configure a SharePoint Search Crawl Ensure that you have a scheduled job to crawl your Local SharePoint Sites content source. Depending on how you have this configured, it will also crawl the My Site site collection and add user properties such as Ask Me About to the search index. Thats It! SharePoint 2010 provides new social and collaboration features to help users find other users with similar skills or interests. Expert Search extends this functionality directly into Microsoft Communicator 14, allowing you to interact with the users directly from the search results. Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
How big can my SharePoint 2010 installation be?

- by Sahil Malik

Ad:: SharePoint 2007 Training in .NET 3.5 technologies (more information). 3 years ago, I had published “How big can my SharePoint 2007 installation be?” Well, SharePoint 2010 has significant under the covers improvements. So, how big can your SharePoint 2010 installation be? There are three kinds of limits you should know about Hard limits that cannot be exceeded by design. Configurable that are, well configurable – but the default values are set for a pretty good reason, so if you need to tweak, plan and understand before you tweak. Soft limits, you can exceed them, but it is not recommended that you do. Before you read any of the limits, read these two important disclaimers - 1. The limit depends on what you’re doing. So, don’t take the below as gospel, the reality depends on your situation. 2. There are many additional considerations in planning your SharePoint solution scalability and performance, besides just the below. So with those in mind, here goes. Hard Limits - Zones per web app 5 RBS NAS performance Time to first byte of any response from NAS must be less than 20 milliseconds List row size 8000 bytes driven by how SP stores list items internally Max file size 2GB (default is 50MB, configurable). RBS does not increase this limit. Search metadata properties 10,000 per item crawled (pretty damn high, you’ll never need to worry about it). Max # of concurrent in-memory enterprise content types 5000 per web server, per tenant Max # of external system connections 500 per web server PerformancePoint services using Excel services as a datasource No single query can fetch more than 1 million excel cells Office Web Apps Renders One doc per second, per CPU core, per Application server, limited to a maximum of 8 cores. Configurable Limits - Row Size Limit 6, configurable via SPWebApplication.MaxListItemRowStorage property List view lookup 8 join operations per query Max number of list items that a single operation can process at one time in normal hours 5000 Configurable via SPWebApplication.MaxItemsPerThrottledOperation Also you get a warning at 3000, which is configurable via SPWebApplication.MaxItemsPerThrottledOperationWarningLevel In addition, throttle overrides can be requested, throttle overrides can be disabled, and time windows can be set when throttle is disabled. Max number of list items for administrators that a single operation can process at one time in normal hours 20000 Configurable via SPWebApplication.MaxItemsPerThrottledOperationOverride Enumerating subsites 2000 Word and Powerpoint co-authoring simultaneous editors 10 (Hard limit is 99). # of webparts on a page 25 Search Crawl DBs per search service app 10 Items per crawl db 25 million Search Keywords 200 per site collection. There is a max limit of 5000, which can then be modified by editing the web.config/client.config. Concurrent # of workflows on a content db 15. Workflows running in the timer service are not counted in this limit. Further workflows are queued. Can be configured via the Set-SPFarmConfig powershell commandlet. Number of events picked by the workflow timer job and delivered to workflows 100. You can increase this limit by running additional instances of the workflow timer service. Visio services file size 50MB Visio web drawing recalculation timeout 120 seconds Configurable via – Powershell commandlet Set-SPVisioPerformance Visio services minimum and maximum cache age for data connected diagrams 0 to 24 hours. Default is 60 minutes. Configurable via – Powershell commandlet Set-SPVisioPerformance Soft Limits - Content Databases 300 per web app Application Pools 10 per web server Managed Paths 20 per web app Content Database Size 200GB per Content DB Size of 1 site collection 100GB # of sites in a site collection 250,000 Documents in a library 30 Million, with nesting. Depends heavily on type and usage and size of documents. Items 30 million. Depends heavily on usage of items. SPGroups one SPUser can be in 5000 Users in a site collection 2 million, depends on UI, nesting, containers and underlying user store AD Principals in a SPGroup 5000 SPGroups in a site collection 10000 Search Service Instances 20 Indexed Items in Search 100 million Crawl Log entries 100 million Search Alerts 1 million per search application Search Crawled Properties 1/2 million URL removals in search 100 removals per operation User Profiles 2 million per service application Social Tags 500 million per social database Comment on the article ....

Read the article
Crawling not working windows2008

- by axtolf

Hi, We installed a new MOSS 2007 farm on windows 2008 SP2 enviroment. We used SQL2008 too. Configuration is 1 index, 1 FE and 1 server with 2008, all on ESX 4.0. All the Service that need it uses a dedicated user, so search has a dedicated user. Installation went well and we found no problem. We installed an SP1 MOSS from a ISO and after we upgraded WSS and MOSS to SP2. We installed the Italian language pack too and patched it to SP2. We created a new SSP. We created a web application and created a root website under it. The problem is that we can't male crawling work in any way. Seems that crawling is not able to reach the web application that we want to crawl. In event viewer of the index we have this error when we try to crawl it: The start address <h..p://name.domain.it:81 cannot be crawled. Context: Application 'SSP1', Catalog 'Portal_Content' Details: The object was not found. (0x80041201) The log of crawling from the search admin, only says: h..p://name.domani.it:81 The object was not found. (The item was deleted because it was either not found or the crawler was denied access to it.) The domain is fully accessible from everywhere using both farm admin user or the search user that we are using for service to run. Site is fully accessible from the index and seem not have problem. Inside the we application we created a root site collection with a couple of file. The log of the farm simply says.... nothing! When we ask to do a full crawl of the site, it runs for a second and after we have the errors that I wrote above. But the farm's log says nothing. Any suggestion or help is really appreciated since we are losing a lot of time on it and really we do not have any idea of what's wrong about this farm.

Read the article

Search Results

Search found 446 results on 18 pages for 'crawl'.

Page 5/18 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Andrew Strong

- by Aditya Pratap Singh

- by strager

- by developerit

- by Jacob Hume

- by Ravi Gupta

- by JohnMerlino

- by Clara Oscura

- by Napster

- by Raj

- by metal-gear-solid

- by ybbest

- by ybbest

- by Craig Ward

- by Anicho

- by Redtopia

- by jose

- by Stacy Vicknair

- by shrivb

- by ojek

- by com

- by paulie

- by Sahil Malik

- by axtolf

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >