indexed - Page 11 - Developer IT

Can I use a Google Appliance/Mini to crawl and index sites I don't own?

- by SkippyFire

Maybe this is a stupid question, but... I am working with this company and they said they needed to get "permission" to crawl other people's sites. They have a Google Search Appliance And some Google Minis and want to point them at other sites to aggregate content. The end result will be something like a targeted search engine. (All the indexed sites relate to a specific topic) The only thing they will be doing is: Indexing Content from the other sites/domains Providing search functionality on their own site that searches the indexed content (like Google, displaying summaries and not the full content) The search results will provide links back to the original content Their intent is not malicious in nature, and is to provide a single site/resource for people to reference on their given topic. Is there anything illegal or fishy about this process?

Read the article

Annoying Search Behavior - Search Companion

- by DavidStein

I'm running Windows XP Professional, and ever since the last service pack I've had a searching problem. When I want to search a network drive, I get the following message: This folder is not indexed. To search this directory plase use Search Companion or add this directory to your index via options. Basically, I have two questions. Is there some way that I can use the indexed search where appropriate and then have it switch over to the Search Companion automatically? Second, how does a programmer look at this code and think this is a good idea? I realize that this question is rhetorical. However, I must enter my search string into one search, receive the error, and then click "Search Companion" to bring up the new search window. This window doesn't even take the defaults from the previous one so I have to specify the search string and drive again.

Read the article

Why are PNG-8 files mangled when opened in Photoshop?

- by Daniel Beardsley

Why are some 32 bit PNGs opened in Photoshop with Indexed Colors and no transparency? For instance, I grabbed a png icon file of the Stack Overflow logo at: http://blog.stackoverflow.com/wp-content/uploads/icon-so.png When opening it in Photoshop CS3, it apparently treats it as indexed color and gets rid of the alpha channel. The image on the right is a screen grab of the icon. Changing the Image mode in Photoshop to RGB doesn't change the image at all. I've tried this with a few other PNGs and it seems hit or miss. When viewed in other programs, it displays fine. left:png opened in Photoshop, right:screen grab of png from browser What gives?, does Photoshop not interpret the PNG file format correctly?

Read the article

Add Control Panel items to Windows 8.1 Search

- by Alec

Windows 8.1 claims to search in all indexed locations. However when I tried to open "Mail" (i.e. Mlcfg32.cpl) by typing "Mail" in Search, magic didn't happen. Instead I had to scramble all the way through Control Panel. Is it possible to add Control Panel items to those indexed locations? Or is it just the thing Microsoft forgot about? (e.g. forgot on intention, so users would make use of their new "Settings" applet).

Read the article

Seeking htaccess help: Converting multiple subdomains (both http and https) to www.domain.com using .htaccess

- by Joshua Dorkin

I've been trying to get an answer to this question on other forums (the folks at SuperUser thought this was the place I needed to post) and via my connections, but I haven't gotten very far. Hopefully you guys can help me find an answer: I've got a dozen old subdomains that have been indexed by Google. These have been indexed as both http AND https. I've managed to redirect all the subdomains properly, provided they are not https, but can't get any of the https subdomains to property redirect. Here's the code I'm using: RewriteCond %{HTTP_HOST} ^subdomain1.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] RewriteCond %{HTTP_HOST} ^subdomain2.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] RewriteCond %{HTTP_HOST} ^subdomain3.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] This works great until someone goes to: https://subdomain2.mysite.com$ which is not redirected back to http://www.mysite.com$ How can I get this to work? Additionally, I'm guessing there is an easier way to make it happen than setting up a dozen pairs of Rewrite conditions/rewrite rule? Is there any way to do this in just a few lines, including one where I list all the subdomains? I'd actually also like to redirect everything on https://www.mysite.com$ to http://www.mysite.com$ except for 3 folders These are mysite.com/secure, mysite.com/store, mysite.com/user -- is there any good way to add this to the htaccess file? Any suggestions would be great! Thank you in advance for any help.

Read the article

Google indexing and ranking a custom domain served by Google App Engine

- by Hugues

I have a website served on the following URL : "http://www.plugimmo.com" which is a custom domain served by Google App Engine on the following URL : http://plugimmo.appspot.com Since a while I have tried to optimise the Google indexing and ranking with no success. The problem is that searching on Google the keywords in the title of my home page does not retrieve my website at all even not in the 1,000 first results : When checking the cached version of google ( cache:www.plugimmo.com), it says that the cached version is the one of 20-Aug-12 of "plugimmo.appspot.com". It looks there are several issues : 1 - The cached version is really old. I have made a lot of changes since the 20-Aug-12 and I saw the googlebot crawling my site several times. 2 - The cached version is for "plugimmo.appspot.com" 3 - When looking at the Google Webmaster tools, I see that the number of pages indexed for www.plugimmo.com is 0, but that can't be the case given the number of changes I made since then. My questions would therefore be the following : Why is the version of the cache so old although I saw the googlebot crawling the site many times since 20-Aug-12 ? Is there a problem with indexing a custom domain served by Google App Engine ? Why is the Google Webmaster tools showing 0 pages indexed although new pages have been crawled and that no errors have been reported in the indexing ? Also, the site has been developed with Google Web Toolkit. I have followed the guidelines regarding crawling Ajax sites. The home page when crawled by a robot is redirected to http://www.plugimmo.com/HomeSnapshot.html Thanks a lot for your help ! Hugues

Read the article

Auto-Suggest via ‘Trie’ (Pre-fix Tree)

- by Strenium

Auto-Suggest (Auto-Complete) “thing” has been around for a few years. Here’s my little snippet on the subject. For one of my projects, I had to deal with a non-trivial set of items to be pulled via auto-suggest used by multiple concurrent users. Simple, dumb iteration through a list in local cache or back-end access didn’t quite cut it. Enter a nifty little structure, perfectly suited for storing and matching verbal data: “Trie” (http://tinyurl.com/db56g) also known as a Pre-fix Tree: “Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.” This is a very scalable, performing structure. Though, as usual, something ‘fast’ comes at a cost of ‘size’; fortunately RAM is more plentiful today so I can live with that. I won’t bore you with the detailed algorithmic performance here - Google can do a better job of such. So, here’s C# implementation of all this. Let’s start with individual node: Trie Node /// <summary> /// Contains datum of a single trie node. /// </summary> public class AutoSuggestTrieNode { public char Value { get; set; } /// <summary> /// Gets a value indicating whether this instance is leaf node. /// </summary> /// <value> /// <c>true</c> if this instance is leaf node; otherwise, a prefix node <c>false</c>. /// </value> public bool IsLeafNode { get; private set; } public List<AutoSuggestTrieNode> DescendantNodes { get; private set; } /// <summary> /// Initializes a new instance of the <see cref="AutoSuggestTrieNode"/> class. /// </summary> /// <param name="value">The phonetic value.</param> /// <param name="isLeafNode">if set to <c>true</c> [is leaf node].</param> public AutoSuggestTrieNode(char value = ' ', bool isLeafNode = false) { Value = value; IsLeafNode = isLeafNode; DescendantNodes = new List<AutoSuggestTrieNode>(); } /// <summary> /// Gets the descendants of the pre-fix node, if any. /// </summary> /// <param name="descendantValue">The descendant value.</param> /// <returns></returns> public AutoSuggestTrieNode GetDescendant(char descendantValue) { return DescendantNodes.FirstOrDefault(descendant => descendant.Value == descendantValue); } } Quite self-explanatory, imho. A node is either a “Pre-fix” or a “Leaf” node. “Leaf” contains the full “word”, while the “Pre-fix” nodes act as indices used for matching the results. Ok, now the Trie: Trie Structure /// <summary> /// Contains structure and functionality of an AutoSuggest Trie (Pre-fix Tree) /// </summary> public class AutoSuggestTrie { private readonly AutoSuggestTrieNode _root = new AutoSuggestTrieNode(); /// <summary> /// Adds the word to the trie by breaking it up to pre-fix nodes + leaf node. /// </summary> /// <param name="word">Phonetic value.</param> public void AddWord(string word) { var currentNode = _root; word = word.Trim().ToLower(); for (int i = 0; i < word.Length; i++) { var child = currentNode.GetDescendant(word[i]); if (child == null) /* this character hasn't yet been indexed in the trie */ { var newNode = new AutoSuggestTrieNode(word[i], word.Count() - 1 == i); currentNode.DescendantNodes.Add(newNode); currentNode = newNode; } else currentNode = child; /* this character is already indexed, move down the trie */ } } /// <summary> /// Gets the suggested matches. /// </summary> /// <param name="word">The phonetic search value.</param> /// <returns></returns> public List<string> GetSuggestedMatches(string word) { var currentNode = _root; word = word.Trim().ToLower(); var indexedNodesValues = new StringBuilder(); var resultBag = new ConcurrentBag<string>(); for (int i = 0; i < word.Trim().Length; i++) /* traverse the trie collecting closest indexed parent (parent can't be leaf, obviously) */ { var child = currentNode.GetDescendant(word[i]); if (child == null || word.Count() - 1 == i) break; /* done looking, the rest of the characters aren't indexed in the trie */ indexedNodesValues.Append(word[i]); currentNode = child; } Action<AutoSuggestTrieNode, string> collectAllMatches = null; collectAllMatches = (node, aggregatedValue) => /* traverse the trie collecting matching leafNodes (i.e. "full words") */ { if (node.IsLeafNode) /* full word */ resultBag.Add(aggregatedValue); /* thread-safe write */ Parallel.ForEach(node.DescendantNodes, descendandNode => /* asynchronous recursive traversal */ { collectAllMatches(descendandNode, String.Format("{0}{1}", aggregatedValue, descendandNode.Value)); }); }; collectAllMatches(currentNode, indexedNodesValues.ToString()); return resultBag.OrderBy(o => o).ToList(); } /// <summary> /// Gets the total words (leafs) in the trie. Recursive traversal. /// </summary> public int TotalWords { get { int runningCount = 0; Action<AutoSuggestTrieNode> traverseAllDecendants = null; traverseAllDecendants = n => { runningCount += n.DescendantNodes.Count(o => o.IsLeafNode); n.DescendantNodes.ForEach(traverseAllDecendants); }; traverseAllDecendants(this._root); return runningCount; } } } Matching operations and Inserts involve traversing the nodes before the right “spot” is found. Inserts need be synchronous since ordering of data matters here. However, matching can be done in parallel traversal using recursion (line 64). Here’s sample usage: [TestMethod] public void AutoSuggestTest() { var autoSuggestCache = new AutoSuggestTrie(); var testInput = @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer nec odio. Praesent libero. Sed cursus ante dapibus diam. Sed nisi. Nulla quis sem at nibh elementum imperdiet. Duis sagittis ipsum. Praesent mauris. Fusce nec tellus sed augue semper porta. Mauris massa. Vestibulum lacinia arcu eget nulla. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur sodales ligula in libero. Sed dignissim lacinia nunc. Curabitur tortor. Pellentesque nibh. Aenean quam. In scelerisque sem at dolor. Maecenas mattis. Sed convallis tristique sem. Proin ut ligula vel nunc egestas porttitor. Morbi lectus risus, iaculis vel, suscipit quis, luctus non, massa. Fusce ac turpis quis ligula lacinia aliquet. Mauris ipsum. Nulla metus metus, ullamcorper vel, tincidunt sed, euismod in, nibh. Quisque volutpat condimentum velit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nam nec ante. Sed lacinia, urna non tincidunt mattis, tortor neque adipiscing diam, a cursus ipsum ante quis turpis. Nulla facilisi. Ut fringilla. Suspendisse potenti. Nunc feugiat mi a tellus consequat imperdiet. Vestibulum sapien. Proin quam. Etiam ultrices. Suspendisse in justo eu magna luctus suscipit. Sed lectus. Integer euismod lacus luctus magna. Quisque cursus, metus vitae pharetra auctor, sem massa mattis sem, at interdum magna augue eget diam. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Morbi lacinia molestie dui. Praesent blandit dolor. Sed non quam. In vel mi sit amet augue congue elementum. Morbi in ipsum sit amet pede facilisis laoreet. Donec lacus nunc, viverra nec."; testInput.Split(' ').ToList().ForEach(word => autoSuggestCache.AddWord(word)); var testMatches = autoSuggestCache.GetSuggestedMatches("le"); } ..and the result: That’s it!

Read the article

SQL SERVER – Weekly Series – Memory Lane – #049

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Two Connections Related Global Variables Explained – @@CONNECTIONS and @@MAX_CONNECTIONS @@CONNECTIONS Returns the number of attempted connections, either successful or unsuccessful since SQL Server was last started. @@MAX_CONNECTIONS Returns the maximum number of simultaneous user connections allowed on an instance of SQL Server. The number returned is not necessarily the number currently configured. Query Editor – Microsoft SQL Server Management Studio This post may be very simple for most of the users of SQL Server 2005. Earlier this year, I have received one question many times – Where is Query Analyzer in SQL Server 2005? I wrote small post about it and pointed many users to that post – SQL SERVER – 2005 Query Analyzer – Microsoft SQL SERVER Management Studio. Recently I have been receiving similar question. OUTPUT Clause Example and Explanation with INSERT, UPDATE, DELETE SQL Server 2005 has a new OUTPUT clause, which is quite useful. OUTPUT clause has access to insert and deleted tables (virtual tables) just like triggers. OUTPUT clause can be used to return values to client clause. OUTPUT clause can be used with INSERT, UPDATE, or DELETE to identify the actual rows affected by these statements. OUTPUT clause can generate a table variable, a permanent table, or temporary table. Even though, @@Identity will still work with SQL Server 2005, however I find the OUTPUT clause very easy and powerful to use. Let us understand the OUTPUT clause using an example. Find Name of The SQL Server Instance Based on database server stored procedures has to run different logic. We came up with two different solutions. 1) When database schema is very much changed, we wrote completely new stored procedure and deprecated older version once it was not needed. 2) When logic depended on Server Name we used global variable @@SERVERNAME. It was very convenient while writing migrating script which depended on the server name for the same database. Explanation of TRY…CATCH and ERROR Handling With RAISEERROR Function One of the developers at my company thought that we can not use the RAISEERROR function in new feature of SQL Server 2005 TRY… CATCH. When asked for an explanation he suggested SQL SERVER – 2005 Explanation of TRY… CATCH and ERROR Handling article as excuse suggesting that I did not give example of RAISEERROR with TRY…CATCH. We all thought it was funny. Just to keep records straight, TRY… CATCH can sure use RAISEERROR function. Different Types of Cache Objects Serveral kinds of objects can be stored in the procedure cache: Compiled Plans: When the query optimizer finishes compiling a query plan, the principal output is compiled plan. Execution contexts: While executing a compiled plan, SQL Server has to keep track of information about the state of execution. Cursors: Cursors track the execution state of server-side cursors, including the cursor’s current location within a resultset. Algebrizer trees: The Algebrizer’s job is to produce an algebrizer tree, which represents the logic structure of a query. Open SSMS From Command Prompt – sqlwb.exe Example This article is written by request and suggestion of Sr. Web Developer at my organization. Due to the nature of this article most of the content is referred from Book On-Line. sqlwbcommand prompt utility which opens SQL Server Management Studio. Squib command does not run queries from the command prompt. sqlcmd utility runs queries from command prompt, read for more information. 2008 Puzzle – Solution – Computed Columns Datatype Explanation Just a day before I wrote article SQL SERVER – Puzzle – Computed Columns Datatype Explanation which was inspired by SQL Server MVP Jacob Sebastian. I suggest that before continuing this article read the original puzzle question SQL SERVER – Puzzle – Computed Columns Datatype Explanation.The question was if the computed column was of datatype TINYINT how to create a Computed Column of datatype INT? 2008 – Find If Index is Being Used in Database It is very often I get a query that how to find if any index is being used in the database or not. If any database has many indexes and not all indexes are used it can adversely affect performance. If the number of indices are higher it reduces the INSERT / UPDATE / DELETE operation but increase the SELECT operation. It is recommended to drop any unused indexes from table to improve the performance. 2009 Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries If you want to see what’s going on here, I think you need to shift your point of view from an implementation-centric view to an ANSI point of view. ANSI does not guarantee processing the order. Figure 2 is interesting, but it will be potentially misleading if you don’t understand the ANSI rule-set SQL Server operates under in most cases. Implementation thinking can certainly be useful at times when you really need that multi-million row query to finish before the backup fire off, but in this case, it’s counterproductive to understanding what is going on. SQL Server Management Studio and Client Statistics Client Statistics are very important. Many a times, people relate queries execution plan to query cost. This is not a good comparison. Both parameters are different, and they are not always related. It is possible that the query cost of any statement is less, but the amount of the data returned is considerably larger, which is causing any query to run slow. How do we know if any query is retrieving a large amount data or very little data? 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 SQL SERVER – Get Query Running in Session I was recently looking for syntax where I needed a query running in any particular session. I always remembered the syntax and ha d actually written it down before, but somehow it was not coming to mind quickly this time. I searched online and I ended up on my own article written last year SQL SERVER – Get Last Running Query Based on SPID. I felt that I am getting old because I forgot this really simple syntax. Find Total Number of Transaction on Interval In one of my recent Performance Tuning assignments I was asked how do someone know how many transactions are happening on a server during certain interval. I had a handy script for the same. Following script displays transactions happened on the server at the interval of one minute. You can change the WAITFOR DELAY to any other interval and it should work. 2011 Here are two DMV’s which are newly introduced in SQL Server 2012 and provides vital information about SQL Server. DMV – sys.dm_os_volume_stats – Information about operating system volume DMV – sys.dm_os_windows_info – Information about Operating System SQL Backup and FTP – A Quick and Handy Tool I have used this tool extensively since 2009 at numerous occasion and found it to be very impressive. What separates it from the crowd the most – it is it’s apparent simplicity and speed. When I install SQLBackupAndFTP and configure backups – all in 1 or 2 minutes, my clients are always impressed. Quick Note about JOIN – Common Questions and Simple Answers In this blog post we are going to talk about join and lots of things related to the JOIN. I recently started office hours to answer questions and issues of the community. I receive so many questions that are related to JOIN. I will share a few of the same over here. Most of them are basic, but note that the basics are of great importance. 2012 Importance of User Without Login Question: “In recent version of SQL Server we can create user without login. What is the use of it?” Great question indeed. Let me first attempt to answer this question but after reading my answer I need your help. I want you to help him as well with adding more value to it. Preserve Leading Zero While Coping to Excel from SSMS Earlier I wrote two articles about how to efficiently copy data from SSMS to Excel. Since I wrote that post there are plenty of interest generated on this subject. There are a few questions I keep on getting over this subject. One of the question is how to get the leading zero preserved while copying the data from SSMS to Excel. Well it is almost the same way as my earlier post SQL SERVER – Excel Losing Decimal Values When Value Pasted from SSMS ResultSet. The key here is in EXCEL and not in SQL Server. Solution – 2 T-SQL Puzzles – Display Star and Shortest Code to Display 1 Earlier on this blog we had asked two puzzles. The response from all of you is nothing but Amazing. I have received 350+ responses. Many are valid and many were indeed something I had not thought about it. I strongly suggest you read all the puzzles and their answers here - trust me if you start reading the comments you will not stop till you read every single comment. Seriously trust me on it. Personally I have learned a lot from it. Identify Most Resource Intensive Queries – SQL in Sixty Seconds #028 – Video http://www.youtube.com/watch?v=TvlYy-TGaaA Importance of User Without Login – T-SQL Demo Script Earlier I wrote a blog post about SQL SERVER – Importance of User Without Login and my friend and SQL Expert Vinod Kumar has written excellent follow up blog post about Contained Databases inside SQL Server 2012. Now lots of people asked me if I can also explain the same concept again so here is the small demonstration for it. Let me show you how login without user can help. Before we continue on this subject I strongly recommend that you read my earlier blog post here. In following demo I am going to demonstrate following situation. Login using the System Admin account Create a user without login Checking Access Impersonate the user without login Checking Access Revert Impersonation Give Permission to user without login Impersonate the user without login Checking Access Revert Impersonation Clean up Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Lucene and .NET Part I

- by javarg

I’ve playing around with Lucene.NET and trying to get a feeling of what was required to develop and implement a full business application using it. As you would imagine, many things are required for you to implement a robust solution for indexing content and searching it afterwards. Lucene is a great and robust solution for indexing content. It offers fast and performance enhanced search engine library available in Java and .NET. You will want to use this library in many particular scenarios: In Windows Azure, to support Full Text Search (a functionality not currently supported by SQL Azure) When storing files outside or not managed by your database (like in large document storage solutions that uses File System) When Full Text Search is not really what you need Lucene is more than a Full Text Search solution. It has several analyzers that let you process and search content in different ways (decomposing sentences, deriving words, removing articles, etc.). When deciding to implement indexing using Lucene, you will need to take into account the following: How content is to be indexed by Lucene and when. Using a service that runs after a specific interval Immediately when content changes When content is to available for searching / Availability of indexed content (as in real time content search) Immediately when content changes = near real time searching After a few minutes.. Ease of maintainability and development Some Technical Concerns.. When indexing content, indexes are locked for writing operations by the Index Writer. This means that Lucene is best designed to index content using single writer approach. When searching, Index Readers take a snapshot of indexes. This has the following implications: Setting up an index reader is a costly task. Your are not supposed to create one for each query or search. A good practice is to create readers and reuse them for several searches. The latter means that even when the content gets updated, you wont be able to see the changes. You will need to recycle the reader. In the second part of this post we will review some alternatives and design considerations.

Read the article

Multi language site - use of canonical link and link rel="alternate"

- by julia

I keep reading everywhere that if you have a multilanguage site, where the same page appears in, say, French and English, then this is considered as duplicate content by google. It is written that using canonical link is the solution, but I do not understand how to use it in this case. Should I: Choose either French URL or English URL to be the canonical (main) one, and where I will place the canonical link? If so, how do I decide which of the two URLs must be canonical? both languages are important to me and I want the content under both languages to be indexed by google and served to the user, depending on the language in which he searches. OR should I place a canonical link on both French and English URLs? If so, then I do not understand the meaning of using the canonical link? In this case would both URLs be indexed, are both of them considered as "important" by google and not duplicates? Also I read that link rel="alternate" can be used to indicate to google that, for example the French URL is the French-language equivalent of the English page. This makes sense and I understand how to use such links, but how are they combined with canonical links? Should I define both the canonical URL AND specify rel="alternate" in both URLs? Could someone help me to clarify this, cause I'm stuck with this and can't seem to find a good-enough explanation in different sources.

Read the article

How to get search engines to properly index an ajax driven search page

- by Redtopia

I have an ajax-driven search page that will allow users to search through a large collection of records. Each search result points to index.php?id=xyz (where xyz is the id of the record). The initial view does not have any records listed, and there is no interface that allows you to browse through all records. You can only conduct a search. How do I build the page so that spiders can crawl each record? Or is there another way (outside of this specific search page) that will allow me to point spiders to a list of all records. FYI, the collection is rather large, so dumping links to every record in a single request is not a workable solution. Outputting the records must be done in multiple requests. Each record can be viewed via a single page (eg "record.php?id=xyz"). I would like all the records indexed without anything indexed from the sitemap that shows where the records exist, for example: <a href="/result.php?id=record1">Record 1</a> <a href="/result.php?id=record2">Record 2</a> <a href="/result.php?id=record3">Record 3</a> <a href="/seo.php?page=2">next</a> Assuming this is the correct approach, I have these questions: How would the search engines find the crawl page? Is it possible to prevent the search engines from indexing the words "Record 1", etc. and "next"? Can I output only the links? Or maybe something like:

Read the article

SharePoint 2007 / 2010 Content Indexing “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.”

- by Stacy Vicknair

If you have large files in a content source that is being indexed by Sharepoint you might run into the following error message: “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.” This is usually caused because SharePoint’s MaxDownloadSize setting is set lower than the size of the file you are attempting to index. You can increase this value, restart the service then kick off a full crawl in order to fix this issue, but SharePoint 2007 and 2010 have different methods for accomplishing this task. Sharepoint 2007 Open up the Registry editor and increase the MaxDownloadSize value to a number (in MB) higher than the largest file being indexed. You can find this at: HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\Gathering Manager After you increase the size, cycle the search service and kick off a full crawl of the content source in question. Sharepoint 2010 With SharePoint 2010 you can use PowerShell via the Sharepoint 2010 Console in order to change the MaxDownloadSize. Execute the following commands to update the value: 1: $ssa = Get-SPEnterpriseSearchServiceApplication 2: $ssa.SetProperty(“MaxDownloadSize”, <new size in MB>) 3: $ssa.Update() References: http://support.microsoft.com/kb/287231 http://blogs.technet.com/b/brent/archive/2010/07/19/sharepoint-server-2010-maxdownloadsize-and-maxgrowfactor.aspx Technorati Tags: SharePoint,WSS,MaxDownloadSize,Search

Read the article

Does Google submit HTML forms?

- by Saeed Neamati

I have a web page, say http://domain/purchase and in this page, I have a web form. User, on submitting this form (which has validation, both client-side and server side and won't be validated until fields are filled appropriately), would be redirected to another page, where (s)he can choose other things, and specify other settings and then purchase our product. Say the second page is http://domain/options. So, user comes to our site and visits http://domain/purchase, fills the form, submits it, and then would be redirected to the second page, http://doamin/options?parameter1=value1&parameter2=value2, which contains parameters from the first page. This is very common in passing parameters between web pages (or technically, between URLs). Now I was reviewing my website, and saw that Google had indexed some of my redirected web pages and URLs, like: http://domain/options?parameter1=value1&parameter2=value2 http://domain/options?parameter1=value3&parameter2=value4 http://domain/options?parameter1=value5&parameter2=value6 http://domain/options?parameter1=value7&parameter2=value8 http://domain/options?parameter1=value9&parameter2=value10 This means that Google Bot has visited our http://domain/purchase page, and has filled our form, and has submitted it, and was being redirected to the other URL, with corresponding parameters. This is the only way that makes sense to me. Does Google really fills forms? PS: All parameters are meaningful, meaning that they are not filled arbitrarily. For example, the phone parameter in indexed pages has correct phone numbers. How is it possible?

Read the article

Seeking .htaccess help: Converting multiple subdomains (both HTTP and HTTPS) to www.domain.com using .htaccess

- by Joshua Dorkin

I've been trying to get an answer to this question on other forums (the folks at SuperUser thought this was the place I needed to post) and via my connections, but I haven't gotten very far. Hopefully you guys can help me find an answer. I've got a dozen old subdomains that have been indexed by Google. These have been indexed as both HTTP AND HTTPS. I've managed to redirect all the subdomains properly, provided they are not HTTPS, but can't get any of the HTTPS subdomains to property redirect. Here's the code I'm using: RewriteCond %{HTTP_HOST} ^subdomain1.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] RewriteCond %{HTTP_HOST} ^subdomain2.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] RewriteCond %{HTTP_HOST} ^subdomain3.mysite.com$ [NC] RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L] This works great until someone goes to: https://subdomain2.mysite.com$ which is not redirected back to http://www.mysite.com$ How can I get this to work? Additionally, I'm guessing there is an easier way to make it happen than setting up a dozen pairs of RewriteCond/RewriteRule? Is there any way to do this in just a few lines, including one where I list all the subdomains? I'd actually also like to redirect everything on https://www.mysite.com$ to http://www.mysite.com$ except for 3 folders. These are mysite.com/secure, mysite.com/store, mysite.com/user. Is there any good way to add this to the .htaccess file?

Read the article

Reset / Remove - Google Keywords

- by Herr Kaleun

Summary: My site is ranking for filthy keywords and i would like to remove them from google ranking/keywords. Background: My server was hacked using the timthumb exploit/security vulnerability, apparently i was the last person on earth to read the news about the exploit, several months after it appeared. Anyway, the "hacker" was so friendly to modify the index.php file in such a fashion, that it generated random sexual oriented keywords if the website is fetched as google-bot. So if you would fetch it as google bot/it gets indexed, you would get randomly generated keywords like: sex videos teenager teen sex adult sex preteen A LINK TO A RANDOM CONTENT OF MY WEBPAGE anime sex videos a rough list something similar to that, about 180-200 per page. I've discovered it far too late, so that google had me indexed for the words "sex" and certain adult oriented keywords, about roughly 2000. I've removed all the content, toke the site down, replaced the index.php with a static HTML and added a "ERROR 410" title to the website so that the content is no longer here and removed permanently. I've also applied for a manual review of my website, about 1.5 months ago but still, the keywords are there, and very strange, some of the keyword rankings actually "improve" over time. Here are some screenshots from webmasters tools: Question: How can i remove this filthy keywords and re-rank my website as a "normal" website on the fastest way? I want to "REMOVE" the keywords if possible. Please help me or point me into a direction. Thank you

Read the article

Daily Blog Archives and Duplicate Content

- by nemmy

A few weeks back I realised that my blog software was creating daily post archives. Which basically resulted in duplicate content especially if I only had one post a day. The situation is something like this: www.sitename.com/blog/archives/2013/06/01 - daily archive for 1 June 2013 www.sitename.com/blog/archives/2013/06/my-post-name.html So, here we have two pages that are basically identical except the daily archive has some meaningless title like "Daily Archive for 1 June 2003". And I have no control over which content Google decides is the primary content. It's quite possible (and likely) that the daily archive could be the "primary" content and the actual post itself the "duplicate". Once I realised it was doing this I modified the daily archive template to include <meta name="robots" content="noindex"> Here we are a few weeks later and I still see some daily archives coming up in Google search results. I realise some of those deep pages might not be crawled yet but I am worried that the original post (which should be the PRIMARY content) has been marked duplicate content by Google. Now I've no indexed the daily archives I might end up with no indexed content AND the original articles still flagged as duplicates. And nothing will show up in search at all. Have I screwed myself here or is there a way out?

Read the article

How to 301 redirect from old query string urls to CakePHP Canonical urls?

- by Daniel Bingham

I currently have a .htaccess file that looks like this: RewriteCond %{QUERY_STRING} ^action=view&item=([0-9]+)$ RewriteRule ^index\.php$ /index.php?url=item/%1 [R=301] RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^(.*)$ index.php?url=$1 [QSA,L] It is meant to 301 redirect my old query string based URLs to new CakePHP urls. This will successfully send users to the correct page. However, Google doesn't seem to like it (see below). I previously tried doing this: RewriteCond %{QUERY_STRING} ^action=view&item=([0-9]+)$ RewriteRule ^index\.php$ /item/%1 [R=301] RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^(.*)$ index.php?url=$1 [QSA,L] But that fails. The second rewrite rule doesn't seem to catch the rewritten URL. It goes straight through. Using the first version wouldn't be a problem, except that I suspect that is what is choking up Google. It hasn't indexed my sitemap full of the new URLs. My old sitemap had been fully indexed and all the URLs are in Google's index. But it isn't following the redirects from the old URLs to the new. I have a 'not followed' error for every one of the query urls that was in my old sitemap. Am I properly using a 301 redirect here? Is it the weird rewrite rule? What can I do to send both Google and users to the proper page and save my page rank?

Read the article

How to prevent duplication of content on a page with too many filters?

- by Vikas Gulati

I have a webpage where a user can search for items based on around 6 filters. Currently I have the page implemented with one filter as the base filter (part of the url that would get indexed) and other filters in the form of hash urls (which won't get indexed). This way the duplication is less. Something like this example.com/filter1value-items#by-filter3-filter3value-filter2-filter2value Now as you may see, only one filter is within the reach of the search engine while the rest are hashed. This way I could have 6 pages. Now the problem is I expect users to use two filters as well at times while searching. As per my analysis using the Google Keyword Analyzer there are a fare bit of users that might use two filters in conjunction while searching. So how should I go about it? Having all the filters as part of the url would simply explode the number of pages and sticking to the current way wouldn't let me target those users. I was thinking of going with at max 2 base filters and rest as part of the hash url. But the only thing stopping me is that it would lead to duplication of content as per Google Webmaster Tool's suggestions on Url Structure.

Read the article

Google ranking, page crawl

- by Nawaf Mubarak

please don't mind me for asking this newbie question about Google ranking. I know that in order to get ranked the page has to be crawled by Google bots, I have had a page example of which I will get a better understanding of how the system works with Google. I have made a page on my website last month, it got indexed pretty quickly, then I found that it's in Google's page 15 on my keyword as a start, next day it made it to page 13, then after a week it was jumping back and forth in page 17/18 up to 20. Now a month passed by, when and it isn't listed in any position of that 'keyword' sometimes I will find it in page 30, but later I won't find it anywhere, keep happening this way these days. Even if it isn't listed in any page for my keyword if I do a search for "site:thepageadress" it will be listed which means I'm not penalized and my page is there for google to see, but it isn't in the search result for my keyword. But when I write "site:thepage_adress" and I hit "search tools" option and click on "Past day" or "past week" it isn't listed, it is only listed when I click on "Past month" which I think means that Google indexed the page, looked at it once when I published it, and never looked at it again, is this a fair statement? So two questions that comes to mind here. 1- Should Google keep looking at a page even if I haven't changed any info for it? and is this an indication for me that my page is doing fine? or is it normal that Google see's it once and thats it? 2- Why and how to fix the fact that my page keeps jumping back and forth in the ranking result for keyword, and sometimes it isn't even listed, what does that mean? Sorry for the long msg, I hope to god that somebody help me with this. Thank you!

Read the article

Algorithmic problem - quickly finding all #'s where value %x is some given value

- by Steve B.

Problem I'm trying to solve, apologies in advance for the length: Given a large number of stored records, each with a unique (String) field S. I'd like to be able to find through an indexed query all records where Hash(S) % N == K for any arbitrary N, K (e.g. given a million strings, find all strings where HashCode(s) % 17 = 5. Is there some way of memoizing this so that we can quickly answer any question of this form without doing the % on every value? The motivation for this is a system of N distributed nodes, where each record has to be assigned to at least one node. The nodes are numbered 0 - (K-1) , and each node has to load up all of the records that match it's number: If we have 3 nodes Node 0 loads all records where Hash % 3 ==0 Node 1 loads all records where Hash % 3 ==1 Node 2 loads all records where Hash % 3 ==2 adding a 4th node, obviously all the assignments have to be recomputed - Node 0 loads all records where Hash % 4 ==0 ... etc I'd like to easily find these records through an indexed query without having to compute the mod individually. The best I've been able to come up with so far: If we take the prime factors of N (p1 * p2 * ... ) if N % M == I then p % M == I % p for all of N's prime factors e.g. 10 nodes : N % 10 == 6 then N % 2 = 0 == 6 %2 N % 5 = 1 == 6 %5 so storing an array of the "%" of N for the first "reasonable" number of primes for my data set should be helpful. For example in the above example we store the hash and the primes HASH PRIMES (array of %2, %3, %5, %7, ... ]) 16 [0 1 1 2 .. ] so looking for N%10 == 6 is equivalent to looking for all values where array[1]==1 and array[2] == 1. However, this breaks at the first prime larger than the highest number I'm storing in the factor table. Is there a better way?

Read the article

Searching for entity awareness in 3D space algorithm and data structure

- by Khanser

I'm trying to do some huge AI system just for the fun and I've come to this problem. How can I let the AI entities know about each other without getting the CPU to perform redundant and costly work? Every entity has a spatial awareness zone and it has to know what's inside when it has to decide what to do. First thoughts, for every entity test if the other entities are inside the first's reach. Ok, so it was the first try and yep, that is redundant and costly. We are working with real time AI over 10000+ entities so this is not a solution. Second try, calculate some grid over the awareness zone of every entity and test whether in this zones are entities (we are working with 3D entities with float x,y,z location coordinates) testing every point in the grid with the indexed-by-coordinate entities. Well, I don't like this because is also costly, but not as the first one. Third, create some multi linked lists over the x's and y's indexed positions of the entities so when we search for an interval between x,y and z,w positions (this interval defines the square over the spatial awareness zone) over the multi linked list, we won't have 'voids'. This has the problem of finding the nearest proximity value if there isn't one at the position where we start the search. I'm not convinced with any of the ideas so I'm searching for some enlightening. Do you people have any better ideas?

Read the article

Is this a ridiculous way to structure a DB schema, or am I completely missing something?

- by Jim

I have done a fair bit of work with relational databases, and think I understand the basic concepts of good schema design pretty well. I recently was tasked with taking over a project where the DB was designed by a highly-paid consultant. Please let me know if my gut intinct - "WTF??!?" - is warranted, or is this guy such a genius that he's operating out of my realm? DB in question is an in-house app used to enter requests from employees. Just looking at a small section of it, you have information on the users, and information on the request being made. I would design this like so: User table: UserID (primary Key, indexed, no dupes) FirstName LastName Department Request table RequestID (primary Key, indexed, no dupes) <...> various data fields containing request details UserID -- foreign key associated with User table Simple, right? Consultant designed it like this (with sample data): UsersTable UserID FirstName LastName 234 John Doe 516 Jane Doe 123 Foo Bar DepartmentsTable DepartmentID Name 1 Sales 2 HR 3 IT UserDepartmentTable UserDepartmentID UserID Department 1 234 2 2 516 2 3 123 1 RequestTable RequestID UserID <...> 1 516 blah 2 516 blah 3 234 blah The entire database is constructed like this, with every piece of data encapsulated in its own table, with numeric IDs linking everything together. Apparently the consultant had read about OLAP and wanted the 'speed of integer lookups' He also has a large number of stored procedures to cross reference all of these tables. Is this valid design for a small to mid-sized SQL DB? Thanks for comments/answers...

Read the article

How do sites avoid SEO issues / legalities with subdomain unique ids?

- by JM4

I was looking through a few websites recently and noticed a trend I'm not sure I understand. Sites are creating unique referral URLs for customers in the form of: http://customname.site.com (If somebody were to use http://www.site.com/customname it would function the same way). I can see the sites are using 302 redirects at some point using Google Chrome then doing some sort of htaccess redirect, taking the subdomain name (customname) and applying it as a referral parameter then keeping in session during the entire process. However, there must be thousands of these custom URLs that people are typing in. How are each one of these "subdomains" not treated as separate URLs which in turn are redirected to the same page (in short, generating tons of links all pointing to the same page which Google would normally frown upon)? Additionally, the links also appear on the site themselves as clickable links so I'm not sure how these are not tracked. Similarly, the "unique" url is not indexed or cached in any Google search results. How is this capability handled? It does NOT highlight the referral aspect, but a true example of this is visiting http://sfgiants.com which does a 302 redirect to the much longer proper San Francisco Giants MLB homepage. I am wondering how SFgiants.com is not indexed (assuming that direct shortened link appears on several MLB pages)? 1 - I know these are 302 redirects, I can see this on the sites network flow. 2 - These links do in fact appear on the page itself because in some areas (for example, the bottom of the page may say: send this page to a friend! http://name.site.com/ which in turn would again redirect to something like http://www.site.com?id=name so the id value could be stored in session

Read the article

Request Removal of naked domain from Google Index

- by Pedr

I have a site which was temporarily available at both example.com and www.example.com. All traffic to example.com is now redirected to www.example.com, however during the brief period that the site was available at the naked domain, Google indexed it. So Google now has two versions of every page indexed: www.example.com www.example.com/about_us www.example.com/products/something ... and example.com example.com/about_us example.com/products/something ... For obvious reasons, this is a bad situation, so how can I best resolve it? Should I request removal of these pages from the index? There is still content at these URLs, but they now redirect to the www subdomain equivalent. The site has many hundreds of pages, but the only way I can see to request removal is via the Remove outdated content screen in Webmaster Tools, one URL at a time. How can I request removal of an entire domain (ie. the naked domain) without it effecting the true site located at the www subdomain? Is this the correct strategy given that all the naked domains now redirect to their www equivalent?

Read the article

Working with Google Webmaster Tools

- by com

My first question is about Crawl errors in Google Webmaster Tools. Crawl errors is devided into few sections. One of them is HTTP. I assume that all broken links in HTTP was somehow found by crawler, this is not the links from sitemap. If this was found by scanning all sitemap pages for links, why it doesn't mention what was the source page, like in sitemap section with column Linked From. And what the meaning of Linked From, I thought if the name of section is sitemap, therefore all URLs should be taken from sitemap, so why there is Linked From? The second question, what is the best way to trreat searching on the site. How come the searching result page are getting indexed? Because of the fact that all searching result page are getting indexed, I have to many page in Linked From. What's the right practice? Question three: In order to improve response time in WMT, can I redirect all crawler's requests to designated free web server? Is this good practice? Question four: How should I treat Google Analytics Code (with parameters PageView, PageLoadTime), in the case user request non existing page, should I render Google code or not? Right now I use Google Analytics Code on the common template page, such that every page, also non existing page with error message contains Google Analytics Code, it seems like it has influence on WMT.

Search Results

Search found 991 results on 40 pages for 'indexed'.

Page 11/40 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

- by SkippyFire

- by DavidStein

- by Daniel Beardsley

- by Alec

- by Joshua Dorkin

- by Hugues

- by Strenium

- by Pinal Dave

- by javarg

- by julia

- by Redtopia

- by Stacy Vicknair

- by Saeed Neamati

- by Joshua Dorkin

- by Herr Kaleun

- by nemmy

- by Daniel Bingham

- by Vikas Gulati

- by Nawaf Mubarak

- by Steve B.

- by Khanser

- by Jim

- by JM4

- by Pedr

- by com

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >