Is it possible to see LOB (large object) logical reads from STATISTICS IO output on a table with no LOB columns? I was asked this question today by someone who had spent a good fraction of their afternoon trying to work out why this was occurring – even going so far as to re-run DBCC CHECKDB to see if any corruption had taken place.
The table in question wasn’t particularly pretty – it had grown somewhat organically over time, with new columns being added every so often as
the need arose. Nevertheless, it remained a simple structure with no LOB columns – no TEXT or IMAGE, no XML, no MAX types – nothing aside from ordinary INT, MONEY, VARCHAR, and DATETIME types. To add to
the air of mystery, not every query that ran against
the table would report LOB logical reads – just sometimes – but when it did,
the query often took much longer to execute. Ok, enough of
the pre-amble. I can’t reproduce
the exact structure here, but
the following script creates a table that will serve to demonstrate
the effect: IF OBJECT_ID(N'dbo.Test', N'U')
IS NOT NULL
DROP TABLE dbo.Test
GO
CREATE TABLE
dbo.Test
(
row_id NUMERIC IDENTITY NOT NULL,
col01 NVARCHAR(450) NOT NULL,
col02 NVARCHAR(450) NOT NULL,
col03 NVARCHAR(450) NOT NULL,
col04 NVARCHAR(450) NOT NULL,
col05 NVARCHAR(450) NOT NULL,
col06 NVARCHAR(450) NOT NULL,
col07 NVARCHAR(450) NOT NULL,
col08 NVARCHAR(450) NOT NULL,
col09 NVARCHAR(450) NOT NULL,
col10 NVARCHAR(450) NOT NULL,
CONSTRAINT [PK dbo.Test row_id]
PRIMARY KEY CLUSTERED (row_id)
)
;
The next script loads
the ten variable-length character columns with one-character strings in
the first row, two-character strings in
the second row, and so on down to
the 450th row:
WITH Numbers
AS (
-- Generates numbers 1 - 450 inclusive
SELECT TOP (450)
n = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3
ORDER BY
n ASC
)
INSERT dbo.Test
WITH (TABLOCKX)
SELECT REPLICATE(N'A', N.n), REPLICATE(N'B', N.n),
REPLICATE(N'C', N.n), REPLICATE(N'D', N.n),
REPLICATE(N'E', N.n), REPLICATE(N'F', N.n),
REPLICATE(N'G', N.n), REPLICATE(N'H', N.n),
REPLICATE(N'I', N.n), REPLICATE(N'J', N.n)
FROM Numbers AS N
ORDER BY
N.n ASC
;
Once those two scripts have run,
the table contains 450 rows and 10 columns of data like this:
Most of
the time, when we query data from this table, we don’t see any LOB logical reads, for example:
-- Find
the maximum length of
the data in
-- column 5 for a range of rows
SELECT result = MAX(DATALENGTH(T.col05))
FROM dbo.Test AS T
WHERE row_id BETWEEN 50 AND 100
;
But with a different query…
-- Read all
the data in column 1
SELECT result = MAX(DATALENGTH(T.col01))
FROM dbo.Test AS T
;
…suddenly we have 49 LOB logical reads, as well as
the ‘normal’ logical reads we would expect.
The Explanation
If we had tried to create this table in SQL Server 2000, we would have received a warning message to say that future INSERT or UPDATE operations on
the table might fail if
the resulting row exceeded
the in-row storage limit of 8060 bytes. If we needed to store more data than would fit in an 8060 byte row (including internal overhead) we had to use a LOB column – TEXT, NTEXT, or IMAGE. These special data types store
the large data values in a separate structure, with just a small pointer left in
the original row.
Row Overflow
SQL Server 2005 introduced a feature called row overflow, which allows one or more variable-length columns in a row to move to off-row storage if
the data in a particular row would otherwise exceed 8060 bytes. You no longer receive a warning when creating (or altering) a table that might need more than 8060 bytes of in-row storage; if SQL Server finds that it can no longer fit a variable-length column in a particular row, it will silently move one or more of these columns off
the row into a separate allocation unit.
Only variable-length columns can be moved in this way (for example
the (N)VARCHAR, VARBINARY, and SQL_VARIANT types). Fixed-length columns (like INTEGER and DATETIME for example) never move into ‘row overflow’ storage.
The decision to move a column off-row is done on a row-by-row basis – so data in a particular column might be stored in-row for some table records, and off-row for others.
In general, if SQL Server finds that it needs to move a column into row-overflow storage, it moves
the largest variable-length column record for that row. Note that in
the case of an UPDATE statement that results in
the 8060 byte limit being exceeded, it might not be
the column that grew that is moved!
Sneaky LOBs
Anyway, that’s all very interesting but I don’t want to get too carried away with
the intricacies of row-overflow storage internals.
The point is that it is now possible to define a table with non-LOB columns that will silently exceed
the old row-size limit and result in ordinary variable-length columns being moved to off-row storage. Adding new columns to a table, expanding an existing column definition, or simply storing more data in a column than you used to – all these things can result in one or more variable-length columns being moved off
the row.
Note that row-overflow storage is logically quite different from old-style LOB and new-style MAX data type storage – individual variable-length columns are still limited to 8000 bytes each – you can just have more of them now. Having said that,
the physical mechanisms involved are very similar to full LOB storage – a column moved to row-overflow leaves a 24-byte pointer record in
the row, and
the ‘separate storage’ I have been talking about is structured very similarly to both old-style LOBs and new-style MAX types.
The disadvantages are also
the same: when SQL Server needs a row-overflow column value it needs to follow
the in-row pointer a navigate another chain of pages, just like retrieving a traditional LOB.
And Finally…
In
the example script presented above,
the rows with row_id values from 402 to 450 inclusive all exceed
the total in-row storage limit of 8060 bytes. A SELECT that references a column in one of those rows that has moved to off-row storage will incur one or more lob logical reads as
the storage engine locates
the data.
The results on your system might vary slightly depending on your settings, of course; but in my tests only column 1 in rows 402-450 moved off-row. You might like to play around with
the script – updating columns, changing data type lengths, and so on – to see
the effect on lob logical reads and which columns get moved when. You might even see row-overflow columns moving back in-row if they are updated to be smaller (hint: reduce
the size of a column entry by at least 1000 bytes if you hope to see this).
Be aware that SQL Server will not warn you when it moves ‘ordinary’ variable-length columns into overflow storage, and it can have dramatic effects on performance. It makes more sense than ever to choose column data types sensibly. If you make every column a VARCHAR(8000) or NVARCHAR(4000), and someone stores data that results in a row needing more than 8060 bytes, SQL Server might turn some of your column data into pseudo-LOBs – all without saying a word.
Finally, some people make a distinction between ordinary LOBs (those that can hold up to 2GB of data) and
the LOB-like structures created by row-overflow (where columns are still limited to 8000 bytes) by referring to row-overflow LOBs as SLOBs. I find that quite appealing, but
the ‘S’ stands for ‘small’, which makes expanding
the whole acronym a little daft-sounding…small large objects anyone?
© Paul White 2011
email:
[email protected]
twitter: @SQL_Kiwi