You’re probably most familiar with
the terms ‘Seek’ and ‘Scan’ from
the graphical plans produced by SQL Server Management Studio (SSMS).
The image to
the left shows
the most common ones, with
the three types of scan at
the top, followed by four types of seek. You might look to
the SSMS tool-tip descriptions to explain
the differences between them: Not hugely helpful are they? Both mention scans and ranges (nothing about seeks) and
the Index Seek description implies that it will not scan
the index entirely (which isn’t necessarily true). Recall also yesterday’s post where we saw two Clustered Index Seek operations doing very different things.
The first Seek performed 63 single-row seeking operations; and
the second performed a ‘Range Scan’ (more on those later in this post). I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks? I have often wondered about that, and
the first person to mention it after yesterday’s post was Erin Stellato (twitter | blog): Before we go on to make sense of all this, let’s look at another example of how SQL Server confusingly mixes
the terms ‘Scan’ and ‘Seek’ in different contexts.
The diagram below shows a very simple heap table with two columns, one of which is
the non-clustered Primary Key, and
the other has a non-unique non-clustered index defined on it.
The right hand side of
the diagram shows a simple query, it’s associated query plan, and a couple of extracts from
the SSMS tool-tip and Properties windows. Notice
the ‘scan direction’ entry in
the Properties window snippet. Is this a seek or a scan?
The different references to Scans and Seeks are even more pronounced in
the XML plan output that
the graphical plan is based on. This fragment is what lies behind
the single Index Seek icon shown above: You’ll find
the same confusing references to Seeks and Scans throughout
the product and its documentation. Making Sense of Seeks Let’s forget all about scans for a moment, and think purely about seeks. Loosely speaking, a seek is
the process of navigating an index B-tree to find a particular index record, most often at
the leaf level. A seek starts at
the root and navigates down through
the levels of
the index to find
the point of interest: Singleton Lookups
The simplest sort of seek predicate performs this traversal to find (at most) a single record. This is
the case when we search for a single value using a unique index and an equality predicate. It should be readily apparent that this type of search will either find one record, or none at all. This operation is known as a singleton lookup. Given
the example table from before,
the following query is an example of a singleton lookup seek: Sadly, there’s nothing in
the graphical plan or XML output to show that this is a singleton lookup – you have to infer it from
the fact that this is a single-value equality seek on a unique index.
The other common examples of a singleton lookup are bookmark lookups – both
the RID and Key Lookup forms are singleton lookups (an RID lookup finds a single record in a heap from
the unique row locator, and a Key Lookup does much
the same thing on a clustered table). If you happen to run your query with STATISTICS IO ON, you will notice that ‘Scan Count’ is always zero for a singleton lookup. Range Scans
The other type of seek predicate is a ‘seek plus range scan’, which I will refer to simply as a range scan.
The seek operation makes an initial descent into
the index structure to find
the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in
the index) until it reaches
the end of
the scan range.
The ability of a range scan to proceed in either direction comes about because index pages at
the same level are connected by a doubly-linked list – each page has a pointer to
the previous page (in logical key order) as well as a pointer to
the following page.
The doubly-linked list is represented by
the green and red dotted arrows in
the index diagram presented earlier. One subtle (but important) point is that
the notion of a ‘forward’ or ‘backward’ scan applies to
the logical key order defined when
the index was built. In
the present case,
the non-clustered primary key index was created as follows: CREATE TABLE
dbo.Example
(
key_col INTEGER NOT NULL,
data INTEGER NOT NULL,
CONSTRAINT [PK dbo.Example key_col]
PRIMARY KEY NONCLUSTERED (key_col ASC)
)
;
Notice that
the primary key index specifies an ascending sort order for
the single key column. This means that a forward scan of
the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order. If
the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order.
A range scan seek predicate may have a Start condition, an End condition, or both. Where one is missing,
the scan starts (or ends) at one extreme end of
the index, depending on
the scan direction. Some examples might help clarify that:
the following diagram shows four queries, each of which performs a single seek against a column holding every integer from 1 to 100 inclusive.
The results from each query are shown in
the blue columns, and relevant attributes from
the Properties window appear on
the right:
Query 1 specifies that all key_col values less than 5 should be returned in ascending order.
The query plan achieves this by seeking to
the start of
the index leaf (there is no explicit starting value) and scanning forward until
the End condition (key_col < 5) is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed to sort higher).
Query 2 asks for key_col values greater than 95, in descending order. SQL Server returns these results by seeking to
the end of
the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95. Sharp-eyed readers may notice that
the end-of-scan condition is shown as a Start range value. This is a bug in
the XML show plan which bubbles up to
the Properties window – when a backward scan is performed,
the roles of
the Start and End values are reversed, but
the plan does not reflect that. Oh well.
Query 3 looks for key_col values that are greater than or equal to 10, and less than 15, in ascending order. This time, SQL Server seeks to
the first index record that matches
the Start condition (key_col >= 10) and then scans forward through
the leaf pages until
the End condition (key_col < 15) is no longer met.
Query 4 performs much
the same sort of operation as Query 3, but requests
the output in descending order. Again, we have to mentally reverse
the Start and End conditions because of
the bug, but otherwise
the process is
the same as always: SQL Server finds
the highest-sorting record that meets
the condition ‘key_col < 25’ and scans backward until ‘key_col >= 20’ is no longer true.
One final point to note: seek operations always have
the Ordered: True attribute. This means that
the operator always produces rows in a sorted order, either ascending or descending depending on how
the index was defined, and whether
the scan part of
the operation is forward or backward. You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of
the sort order internally. In
the four queries above,
the query optimizer was able to avoid an explicit Sort
operator to honour
the ORDER BY clause, for example.
Multiple Seek Predicates
As we saw yesterday, a single index seek plan
operator can contain one or more seek predicates. These seek predicates can either be all singleton seeks or all range scans – SQL Server does not mix them. For example, you might expect
the following query to contain two seek predicates, a singleton seek to find
the single record in
the unique index where key_col = 10, and a range scan to find
the key_col values between 15 and 20:
SELECT key_col
FROM dbo.Example
WHERE key_col = 10
OR key_col BETWEEN 15 AND 20
ORDER BY key_col ASC
;
In fact, SQL Server transforms
the singleton seek (key_col = 10) to
the equivalent range scan, Start:[key_col >= 10], End:[key_col <= 10]. This allows both range scans to be evaluated by a single seek
operator. To be clear, this query results in two range scans: one from 10 to 10, and one from 15 to 20.
Final Thoughts
That’s it for today – tomorrow we’ll look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table.
Yes, a seek. On a heap. Not an index!
If you would like to run
the queries in this post for yourself, there’s a script below. Thanks for reading!
IF OBJECT_ID(N'dbo.Example', N'U')
IS NOT NULL
BEGIN
DROP TABLE dbo.Example;
END
;
-- Test table is a heap
-- Non-clustered primary key on 'key_col'
CREATE TABLE
dbo.Example
(
key_col INTEGER NOT NULL,
data INTEGER NOT NULL,
CONSTRAINT [PK dbo.Example key_col]
PRIMARY KEY NONCLUSTERED (key_col)
)
;
-- Non-unique non-clustered index on
the 'data' column
CREATE NONCLUSTERED INDEX
[IX dbo.Example data]
ON dbo.Example (data)
;
-- Add 100 rows
INSERT dbo.Example
WITH (TABLOCKX)
(
key_col,
data
)
SELECT key_col = V.number,
data = V.number
FROM master.dbo.spt_values AS V
WHERE V.[type] = N'P'
AND V.number BETWEEN 1 AND 100
;
-- ================
-- Singleton lookup
-- ================
;
-- Single value equality seek in a unique index
-- Scan count = 0 when STATISTIS IO is ON
-- Check
the XML SHOWPLAN
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col = 32
;
-- ===========
-- Range Scans
-- ===========
;
-- Query 1
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col <= 5
ORDER BY
E.key_col ASC
;
-- Query 2
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col > 95
ORDER BY
E.key_col DESC
;
-- Query 3
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col >= 10
AND E.key_col < 15
ORDER BY
E.key_col ASC
;
-- Query 4
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col >= 20
AND E.key_col < 25
ORDER BY
E.key_col DESC
;
-- Final query (singleton + range = 2 range scans)
SELECT E.key_col
FROM dbo.Example AS E
WHERE E.key_col = 10
OR E.key_col BETWEEN 15 AND 20
ORDER BY
E.key_col ASC
;
-- === TIDY UP ===
DROP TABLE dbo.Example;
© 2011 Paul White
email:
[email protected]
twitter: @SQL_Kiwi