A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest
(
pk INTEGER IDENTITY NOT NULL,
some_value INTEGER NULL,
lob_data VARCHAR(MAX) NULL,
another_column CHAR(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data VARCHAR(MAX);
SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);
WITH Numbers (n)
AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2
)
INSERT LOBtest WITH (TABLOCKX)
(
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
@Data
FROM Numbers N
WHERE N.n <= 1000;
Test 1: A Simple Update
Let’s run a query to subtract one from every value in the some_value column:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources. The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time. The query plan is also very simple:
Looking at the Clustered
Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan:
The pk column is needed by the Clustered
Index Update operator to uniquely identify the row that is being changed. The some_value column is used by the Compute Scalar to calculate the new value. (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT).
Test 2: Simple Update with an
Index
Now let’s create a nonclustered
index keyed on the some_value column, with lob_data as an included column:
CREATE NONCLUSTERED
INDEX
[IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest (some_value)
INCLUDE (
lob_data
)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
This is not a useful
index for our simple update query; imagine that someone else created it for a different purpose. Let’s run our update query again:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms. The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered
index.
The query plan is very similar to before (click to enlarge):
The Clustered
Index Update operator picks up the extra work of maintaining the nonclustered
index.
The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update. SQL Server may be able to skip maintaining the nonclustered
index if the value hasn’t changed (see my previous post on non-updating updates for details). Our simple query does change the value of some_data in every row, so this optimization doesn’t add any value in this specific case.
The output list of columns from the Clustered
Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_data columns. Cool.
Overall then, adding the nonclustered
index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table. Let’s see what happens if we make the nonclustered
index unique.
Test 3: Simple Update with a Unique
Index
Here’s the script to create a new unique
index, and drop the old one:
CREATE UNIQUE NONCLUSTERED
INDEX
[UQ dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest (some_value)
INCLUDE (
lob_data
)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
SORT_IN_TEMPDB = ON
);
GO
DROP
INDEX [IX dbo.LOBtest some_value (lob_data)]
ON dbo.LOBtest;
Remember that SQL Server only enforces uniqueness on
index keys (the some_data column). The lob_data column is simply stored at the leaf-level of the non-clustered
index. With that in mind, we might expect this change to make very little difference. Let’s see:
UPDATE dbo.LOBtest
WITH (TABLOCKX)
SET some_value = some_value - 1;
Whoa! Now look at the elapsed time and logical reads:
Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
CPU time = 172 ms, elapsed time = 16172 ms.
Even with all the data and
index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead).
Why on earth is SQL Server reading LOB data in a query that only updates a single integer column?
The Query Plan
The query plan for test 3 looks a bit more complex than before:
In fact, the bottom level is exactly the same as we saw with the non-unique
index. The top level has heaps of new stuff though, which I’ll come to in a moment.
You might be expecting to find that the Clustered
Index Scan is now reading the lob_data column (for some reason). After all, we need to explain where all the LOB logical reads are coming from. Sadly, when we look at the properties of the Clustered
Index Scan, we see exactly the same as before:
SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads?
Updates that Sneakily Read Data
We have to go as far as the Clustered
Index Update operator before we see LOB data in the output list:
[Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier).
The Clustered
Index Update operator adds two new columns: the lob_data column, and some_value_OLD. The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column. At this point, the clustered
index has already been updated with the new value, but we haven’t touched the nonclustered
index yet.
An interesting observation here is that the Clustered
Index Update operator can read a column into the data flow as part of its update operation. SQL Server could have read the LOB data as part of the initial Clustered
Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered
Index Update. The server knows it will have to go back to the clustered
index row to update it, so it delays reading the LOB data until then. Sneaky!
Why the LOB Data Is Needed
This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered
Index Update?
The answer relates to the top row of the query plan for test 3. I’ll reproduce it here for convenience:
Notice that this is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered
Index Update took care of maintaining the nonclustered
index too. I’ll talk more about this difference shortly.
The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered
index.
Imagine we had a unique
index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4. The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4.
By applying net changes, SQL Server can also avoid false unique-key violations. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail. You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down. That’s fine, but it’s not possible to generalize this logic to work with every possible update query.
SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible.
Another benefit of this optimization is that it includes a sort on the
index key as part of its work. Processing the
index changes in
index key order promotes sequential I/O against the nonclustered
index.
A side-effect of all this is that the net changes might include one or more inserts. In order to insert a new row in the
index, SQL Server obviously needs all the columns – the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered
Index Update.
In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs). In order to generate the correct
index key delete operation, it needs the old key value.
The irony is that in this case the Split/Sort/Collapse optimization is anything but. Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it.
Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates.
Beating the Set-Based Update with a Cursor
One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated. Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data:
SET NOCOUNT ON;
SET STATISTICS XML, IO, TIME OFF;
DECLARE @PK INTEGER,
@StartTime DATETIME;
SET @StartTime = GETUTCDATE();
DECLARE curUpdate CURSOR
LOCAL
FORWARD_ONLY
KEYSET
SCROLL_LOCKS
FOR
SELECT L.pk
FROM LOBtest L
ORDER BY
L.pk ASC;
OPEN curUpdate;
WHILE (1 = 1)
BEGIN
FETCH NEXT
FROM curUpdate
INTO @PK;
IF @@FETCH_STATUS = -1 BREAK;
IF @@FETCH_STATUS = -2 CONTINUE;
UPDATE dbo.LOBtest
SET some_value = some_value - 1
WHERE CURRENT OF curUpdate;
END;
CLOSE curUpdate; DEALLOCATE curUpdate;
SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE());
That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!)
I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it. One could just as well use a WHERE clause that specified the primary key value instead.
Clustered Indexes
A clustered
index is the ultimate
index with included columns: all non-key columns are included columns in a clustered
index. Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes:
IF OBJECT_ID(N'dbo.LOBtest', N'U')
IS NOT NULL
DROP TABLE dbo.LOBtest;
GO
CREATE TABLE dbo.LOBtest
(
pk INTEGER NOT NULL,
some_value INTEGER NULL,
lob_data VARCHAR(MAX) NULL,
another_column CHAR(5) NULL,
CONSTRAINT [PK dbo.LOBtest pk]
PRIMARY KEY CLUSTERED (pk ASC)
);
GO
DECLARE @Data VARCHAR(MAX);
SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540);
WITH Numbers (n)
AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2
)
INSERT LOBtest WITH (TABLOCKX)
(
pk,
some_value,
lob_data
)
SELECT TOP (1000)
N.n,
N.n,
@Data
FROM Numbers N
WHERE N.n <= 1000;
Now here’s a query to modify the cluster keys:
UPDATE dbo.LOBtest
SET pk = pk + 1;
The query plan is:
As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection. In addition, SQL Server now has no choice but to read the LOB data in the Clustered
Index Scan:
The performance is not great, as you might expect (even though there is no non-clustered
index to maintain):
Table 'LOBtest'.
Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0,
lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992.
Table 'Worktable'.
Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0,
lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000.
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 17884 ms.
Notice how the LOB data is read twice: once from the Clustered
Index Scan, and again from the work table in tempdb used by the Eager Spool.
If you try the same test with a non-unique clustered
index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns):
A unique non-clustered
index (on a heap) works well too:
Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. (In fact the heap is rather more efficient).
There are lots more fun combinations to try that I don’t have space for here.
Final Thoughts
The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique
index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve
covering queries, perhaps.
Paul White
Email:
[email protected]
Twitter: @PaulWhiteNZ