A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if
the column isn’t referenced in
the query?” Leaving aside trivial cases (like selecting a computed column that does reference
the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest
(
pk INTEGER IDENTITY NOT NULL,
some_value INTEGER NULL,
lob_data VARCHAR(MAX) NULL,
another_column CHAR(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data VARCHAR(MAX);
SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);
WITH Numbers (n)
AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2
)
INSERT LOBtest WITH (TABLOCKX)
(
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
@Data
FROM Numbers N
WHERE N.n <= 1000;
Test 1: A Simple Update
Let’s run a query to subtract one from every value in
the some_value column:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources.
The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time.
The query plan is also very simple:
Looking at
the Clustered Index Scan, we can see that SQL Server only retrieves
the pk and some_value columns during
the scan:
The pk column is needed by
the Clustered Index Update
operator to uniquely identify
the row that is being changed.
The some_value column is used by
the Compute Scalar to calculate
the new value. (In case you are wondering what
the Top
operator is for, it is used to enforce SET ROWCOUNT).
Test 2: Simple Update with an Index
Now let’s create a nonclustered index keyed on
the some_value column, with lob_data as an included column:
CREATE NONCLUSTERED INDEX
[IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest (some_value)
INCLUDE (
lob_data
)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
This is not a useful index for our simple update query; imagine that someone else created it for a different purpose. Let’s run our update query again:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
We find that it now requires 4,014 logical reads and
the elapsed query time has increased to around 100ms.
The extra logical reads (4 per row) are an expected consequence of maintaining
the nonclustered index.
The query plan is very similar to before (click to enlarge):
The Clustered Index Update
operator picks up
the extra work of maintaining
the nonclustered index.
The new Compute Scalar operators detect whether
the value in
the some_value column has actually been changed by
the update. SQL Server may be able to skip maintaining
the nonclustered index if
the value hasn’t changed (see my previous post on non-updating updates for details). Our simple query does change
the value of some_data in every row, so this optimization doesn’t add any value in this specific case.
The output list of columns from
the Clustered Index Scan hasn’t changed from
the one shown previously: SQL Server still just reads
the pk and some_data columns. Cool.
Overall then, adding
the nonclustered index hasn’t had any startling effects, and
the LOB column data still isn’t being read from
the table. Let’s see what happens if we make
the nonclustered index unique.
Test 3: Simple Update with a Unique Index
Here’s
the script to create a new unique index, and drop
the old one:
CREATE UNIQUE NONCLUSTERED INDEX
[UQ dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest (some_value)
INCLUDE (
lob_data
)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
GO
DROP INDEX [IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest;
Remember that SQL Server only enforces uniqueness on index keys (the some_data column).
The lob_data column is simply stored at
the leaf-level of
the non-clustered index. With that in mind, we might expect this change to make very little difference. Let’s see:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
Whoa! Now look at
the elapsed time and logical reads:
Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
CPU time = 172 ms, elapsed time = 16172 ms.
Even with all
the data and index pages in memory,
the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead).
Why on earth is SQL Server reading LOB data in a query that only updates a single integer column?
The Query Plan
The query plan for test 3 looks a bit more complex than before:
In fact,
the bottom level is exactly
the same as we saw with
the non-unique index.
The top level has heaps of new stuff though, which I’ll come to in a moment.
You might be expecting to find that
the Clustered Index Scan is now reading
the lob_data column (for some reason). After all, we need to explain where all
the LOB logical reads are coming from. Sadly, when we look at
the properties of
the Clustered Index Scan, we see exactly
the same as before:
SQL Server is still only reading
the pk and some_value columns – so what’s doing
the LOB reads?
Updates that Sneakily Read Data
We have to go as far as
the Clustered Index Update
operator before we see LOB data in
the output list:
[Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if
the some_value column has not been changed (part of
the non-updating updates optimization I mentioned earlier).
The Clustered Index Update
operator adds two new columns:
the lob_data column, and some_value_OLD.
The some_value_OLD column, as
the name suggests, is
the pre-update value of
the some_value column. At this point,
the clustered index has already been updated with
the new value, but we haven’t touched
the nonclustered index yet.
An interesting observation here is that
the Clustered Index Update
operator can read a column into
the data flow as part of its update operation. SQL Server could have read
the LOB data as part of
the initial Clustered Index Scan, but that would mean carrying
the data through all
the operations that occur prior to
the Clustered Index Update.
The server knows it will have to go back to
the clustered index row to update it, so it delays reading
the LOB data until then. Sneaky!
Why
the LOB Data Is Needed
This is all very interesting (I hope), but why is SQL Server reading
the LOB data? For that matter, why does it need to pass
the pre-update value of
the some_value column out of
the Clustered Index Update?
The answer relates to
the top row of
the query plan for test 3. I’ll reproduce it here for convenience:
Notice that this is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where
the Clustered Index Update took care of maintaining
the nonclustered index too. I’ll talk more about this difference shortly.
The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by breaking each update into a delete/insert pair, reordering
the operations, removing any redundant operations, and finally applying
the net effect of all
the changes to
the nonclustered index.
Imagine we had a unique index which currently holds three rows with
the values 1, 2, and 3. If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4.
The net effect of all
the changes is
the same as if we simply deleted
the value 1, and added a new value 4.
By applying net changes, SQL Server can also avoid false unique-key violations. If we tried to immediately update
the value 1 to a 2, it would conflict with
the existing value 2 (which would soon be updated to 3 of course) and
the query would fail. You might argue that SQL Server could avoid
the uniqueness violation by starting with
the highest value (3) and working down. That’s fine, but it’s not possible to generalize this logic to work with every possible update query.
SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that
the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible.
Another benefit of this optimization is that it includes a sort on
the index key as part of its work. Processing
the index changes in index key order promotes sequential I/O against
the nonclustered index.
A side-effect of all this is that
the net changes might include one or more inserts. In order to insert a new row in
the index, SQL Server obviously needs all
the columns –
the key column and
the included LOB column. This is
the reason SQL Server reads
the LOB data as part of
the Clustered Index Update.
In addition,
the some_value_OLD column is required by
the Split
operator (it turns updates into delete/insert pairs). In order to generate
the correct index key delete operation, it needs
the old key value.
The irony is that in this case
the Split/Sort/Collapse optimization is anything but. Reading all that LOB data is extremely expensive, so it is sad that
the current version of SQL Server has no way to avoid it.
Finally, for completeness, I should mention that
the Filter
operator is there to filter out
the non-updating updates.
Beating
the Set-Based Update with a Cursor
One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated. Armed with this knowledge, we can write a cursor (or
the WHILE-loop equivalent) that updates one row at a time, and so avoids reading
the LOB data:
SET NOCOUNT ON;
SET STATISTICS XML, IO, TIME OFF;
DECLARE @PK INTEGER,
@StartTime DATETIME;
SET @StartTime = GETUTCDATE();
DECLARE curUpdate CURSOR
LOCAL
FORWARD_ONLY
KEYSET
SCROLL_LOCKS
FOR
SELECT L.pk
FROM LOBtest L
ORDER BY
L.pk ASC;
OPEN curUpdate;
WHILE (1 = 1)
BEGIN
FETCH NEXT
FROM curUpdate
INTO @PK;
IF @@FETCH_STATUS = -1 BREAK;
IF @@FETCH_STATUS = -2 CONTINUE;
UPDATE dbo.LOBtest
SET some_value = some_value - 1
WHERE CURRENT OF curUpdate;
END;
CLOSE curUpdate; DEALLOCATE curUpdate;
SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE());
That completes
the update in 1280 milliseconds (remember test 3 took over 16 seconds!)
I used
the WHERE CURRENT OF syntax there and a KEYSET cursor, just for
the fun of it. One could just as well use a WHERE clause that specified
the primary key value instead.
Clustered Indexes
A clustered index is
the ultimate index with included columns: all non-key columns are included columns in a clustered index. Let’s re-create
the test table and data with an updatable primary key, and without any non-clustered indexes:
IF OBJECT_ID(N'dbo.LOBtest', N'U')
IS NOT NULL
DROP TABLE dbo.LOBtest;
GO
CREATE TABLE dbo.LOBtest
(
pk INTEGER NOT NULL,
some_value INTEGER NULL,
lob_data VARCHAR(MAX) NULL,
another_column CHAR(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data VARCHAR(MAX);
SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);
WITH Numbers (n)
AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2
)
INSERT LOBtest WITH (TABLOCKX)
(
pk,
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
N.n,
@Data
FROM Numbers N
WHERE N.n <= 1000;
Now here’s a query to modify
the cluster keys:
UPDATE dbo.LOBtest
SET pk = pk + 1;
The query plan is:
As you can see,
the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection. In addition, SQL Server now has no choice but to read
the LOB data in
the Clustered Index Scan:
The performance is not great, as you might expect (even though there is no non-clustered index to maintain):
Table 'LOBtest'.
Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
Table 'Worktable'.
Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0,
lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 17884 ms.
Notice how
the LOB data is read twice: once from
the Clustered Index Scan, and again from
the work table in tempdb used by
the Eager Spool.
If you try
the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes
the cluster key (including uniqueifier) around (no LOB data or other non-key columns):
A unique non-clustered index (on a heap) works well too:
Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. (In fact
the heap is rather more efficient).
There are lots more fun combinations to try that I don’t have space for here.
Final Thoughts
The behaviour shown in this post is not limited to LOB data by any means. If
the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps.
Paul White
Email:
[email protected]
Twitter: @PaulWhiteNZ