You’re probably most familiar with the terms ‘Seek’ and ‘Scan’
from the graphical plans produced by SQL Server Management Studio (SSMS). The image
to the left shows the most common ones, with the three types of scan at the top, followed by four types of seek. You might look
to the SSMS tool-tip descriptions
to explain the differences between them: Not hugely helpful are they? Both mention scans and ranges (nothing about seeks) and the Index Seek description implies that it will not scan the index entirely (which isn’t necessarily true). Recall also yesterday’s post where we saw two Clustered Index Seek operations doing very different things. The first Seek performed 63 single-row seeking operations; and the second performed a ‘Range Scan’ (more on those later in this post). I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks? I have often wondered about that, and the first person
to mention it after yesterday’s post was Erin Stellato (twitter | blog): Before we go on
to make sense of all this, let’s look at another example of how SQL Server confusingly mixes the terms ‘Scan’ and ‘Seek’ in different contexts. The diagram below shows a very simple heap table with two columns, one of which is the non-clustered Primary Key, and the other has a non-unique non-clustered index defined on it. The right hand side of the diagram shows a simple query, it’s associated query plan, and a couple of extracts
from the SSMS tool-tip and Properties windows. Notice the ‘scan direction’ entry in the Properties window snippet. Is this a seek or a scan? The different references
to Scans and Seeks are even more pronounced in the XML plan output that the graphical plan is based on. This fragment is what lies behind the single Index Seek icon shown above: You’ll find the same confusing references
to Seeks and Scans throughout the product and its documentation. Making Sense of Seeks Let’s forget all about scans for a moment, and think purely about seeks. Loosely speaking, a seek is the process of navigating an index B-tree
to find a particular index record, most often at the leaf level. A seek starts at the root and navigates down through the levels of the index
to find the point of interest: Singleton Lookups The simplest sort of seek predicate performs this traversal
to find (at most) a single record. This is the case when we search for a single value using a unique index and an equality predicate. It should be readily apparent that this type of search will either find one record, or none at all. This operation is known as a singleton lookup. Given the example table
from before, the following query is an example of a singleton lookup seek: Sadly, there’s nothing in the graphical plan or XML output
to show that this is a singleton lookup – you have
to infer it
from the fact that this is a single-value equality seek on a unique index. The other common examples of a singleton lookup are bookmark lookups – both the RID and Key Lookup forms are singleton lookups (an RID lookup finds a single record in a heap
from the unique row locator, and a Key Lookup does much the same thing on a clustered table). If you happen
to run your query with STATISTICS IO ON, you will notice that ‘Scan Count’ is always
zero for a singleton lookup. Range Scans The other type of seek predicate is a ‘seek plus range scan’, which I will refer
to simply as a range scan. The seek operation makes an initial descent into the index structure
to find the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in the index) until it reaches the end of the scan range. The ability of a range scan
to proceed in either direction comes about because index pages at the same level are connected by a doubly-linked list – each page has a pointer
to the previous page (in logical key order) as well as a pointer
to the following page. The doubly-linked list is represented by the green and red dotted arrows in the index diagram presented earlier. One subtle (but important) point is that the notion of a ‘forward’ or ‘backward’ scan applies
to the logical key order defined when the index was built. In the present case, the non-clustered primary key index was created as follows: CREATE TABLE
dbo.Example
(
key_col INTEGER NOT NULL,
data INTEGER NOT NULL,
CONSTRAINT [PK dbo.Example key_col]
PRIMARY KEY NONCLUSTERED (key_col ASC)
)
;
Notice that the primary key index specifies an ascending sort order for the single key column. This means that a forward scan of the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order. If the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order.
A range scan seek predicate may have a Start condition, an End condition, or both. Where one is missing, the scan starts (or ends) at one extreme end of the index, depending on the scan direction. Some examples might help clarify that: the following diagram shows four queries, each of which performs a single seek against a column holding every integer
from 1
to 100 inclusive. The results
from each query are shown in the blue columns, and relevant attributes
from the Properties window appear on the right:
Query 1 specifies that all key_col values less than 5 should be returned in ascending order. The query plan achieves this by seeking
to the start of the index leaf (there is no explicit starting value) and scanning forward until the End condition (key_col < 5) is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed
to sort higher).
Query 2 asks for key_col values greater than 95, in descending order. SQL Server returns these results by seeking
to the end of the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95. Sharp-eyed readers may notice that the end-of-scan condition is shown as a Start range value. This is a bug in the XML show plan which bubbles up
to the Properties window – when a backward scan is performed, the roles of the Start and End values are reversed, but the plan does not reflect that. Oh well.
Query 3 looks for key_col values that are greater than or equal
to 10, and less than 15, in ascending order. This time, SQL Server seeks
to the first index record that matches the Start condition (key_col >= 10) and then scans forward through the leaf pages until the End condition (key_col < 15) is no longer met.
Query 4 performs much the same sort of operation as Query 3, but requests the output in descending order. Again, we have
to mentally reverse the Start and End conditions because of the bug, but otherwise the process is the same as always: SQL Server finds the highest-sorting record that meets the condition ‘key_col < 25’ and scans backward until ‘key_col >= 20’ is no longer true.
One final point
to note: seek operations always have the Ordered: True attribute. This means that the operator always produces rows in a sorted order, either ascending or descending depending on how the index was defined, and whether the scan part of the operation is forward or backward. You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of the sort order internally. In the four queries above, the query optimizer was able
to avoid an explicit Sort operator
to honour the ORDER BY clause, for example.
Multiple Seek Predicates
As we saw yesterday, a single index seek plan operator can contain one or more seek predicates. These seek predicates can either be all singleton seeks or all range scans – SQL Server does not mix them. For example, you might expect the following query
to contain two seek predicates, a singleton seek
to find the single record in the unique index where key_col = 10, and a range scan
to find the key_col values between 15 and 20:
SELECT key_col
FROM dbo.Example
WHERE key_col = 10
OR key_col BETWEEN 15 AND 20
ORDER BY key_col ASC
;
In fact, SQL Server transforms the singleton seek (key_col = 10)
to the equivalent range scan, Start:[key_col >= 10], End:[key_col <= 10]. This allows both range scans
to be evaluated by a single seek operator.
To be clear, this query results in two range scans: one
from 10
to 10, and one
from 15
to 20.
Final Thoughts
That’s it for today – tomorrow we’ll look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table.
Yes, a seek. On a heap. Not an index!
If you would like
to run the queries in this post for yourself, there’s a script below. Thanks for reading!
IF OBJECT_ID(N'dbo.Example', N'U')
IS NOT NULL
BEGIN
DROP TABLE dbo.Example;
END
;
-- Test table is a heap
-- Non-clustered primary key on 'key_col'
CREATE TABLE
dbo.Example
(
key_col INTEGER NOT NULL,
data INTEGER NOT NULL,
CONSTRAINT [PK dbo.Example key_col]
PRIMARY KEY NONCLUSTERED (key_col)
)
;
-- Non-unique non-clustered index on the 'data' column
CREATE NONCLUSTERED INDEX
[IX dbo.Example data]
ON dbo.Example (data)
;
-- Add 100 rows
INSERT dbo.Example
WITH (TABLOCKX)
(
key_col,
data
)
SELECT key_col = V.number,
data = V.number
FROM master.dbo.spt_values AS V
WHERE V.[type] = N'P'
AND V.number BETWEEN 1 AND 100
;
-- ================
-- Singleton lookup
-- ================
;
-- Single value equality seek in a unique index
-- Scan count = 0 when STATISTIS IO is ON
-- Check the XML SHOWPLAN
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col = 32
;
-- ===========
-- Range Scans
-- ===========
;
-- Query 1
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col <= 5
ORDER BY
E.key_col ASC
;
-- Query 2
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col > 95
ORDER BY
E.key_col DESC
;
-- Query 3
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col >= 10
AND E.key_col < 15
ORDER BY
E.key_col ASC
;
-- Query 4
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col >= 20
AND E.key_col < 25
ORDER BY
E.key_col DESC
;
-- Final query (singleton + range = 2 range scans)
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col = 10
OR E.key_col BETWEEN 15 AND 20
ORDER BY
E.key_col ASC
;
-- === TIDY UP ===
DROP TABLE dbo.Example;
© 2011 Paul White
email:
[email protected]
twitter: @SQL_Kiwi