Is it possible to see LOB (large object) logical reads from STATISTICS IO output on
a table with no LOB columns? I was asked this question today by someone who had spent
a good fraction of their afternoon trying to work out why this was occurring – even going so far as to re-run DBCC CHECKDB to see if any corruption had taken place. The table in question wasn’t particularly pretty – it had grown somewhat organically over time, with new columns being added
every so often as the need arose. Nevertheless, it remained
a simple structure with no LOB columns – no TEXT or IMAGE, no XML, no MAX types – nothing aside from ordinary INT, MONEY, VARCHAR, and DATETIME types. To add to the air of mystery, not
every query that ran against the table would report LOB logical reads – just sometimes – but when it did, the query often took much longer to execute. Ok, enough of the pre-amble. I can’t reproduce the exact structure here, but the following script creates
a table that will serve to demonstrate the effect: IF OBJECT_ID(N'dbo.Test', N'U')
IS NOT NULL
DROP TABLE dbo.Test
GO
CREATE TABLE
dbo.Test
(
row_id NUMERIC IDENTITY NOT NULL,
col01 NVARCHAR(450) NOT NULL,
col02 NVARCHAR(450) NOT NULL,
col03 NVARCHAR(450) NOT NULL,
col04 NVARCHAR(450) NOT NULL,
col05 NVARCHAR(450) NOT NULL,
col06 NVARCHAR(450) NOT NULL,
col07 NVARCHAR(450) NOT NULL,
col08 NVARCHAR(450) NOT NULL,
col09 NVARCHAR(450) NOT NULL,
col10 NVARCHAR(450) NOT NULL,
CONSTRAINT [PK dbo.Test row_id]
PRIMARY KEY CLUSTERED (row_id)
)
;
The next script loads the ten variable-length character columns with one-character strings in the first row, two-character strings in the second row, and so on down to the 450th row:
WITH Numbers
AS (
-- Generates numbers 1 - 450 inclusive
SELECT TOP (450)
n = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3
ORDER BY
n ASC
)
INSERT dbo.Test
WITH (TABLOCKX)
SELECT REPLICATE(N'
A', N.n), REPLICATE(N'B', N.n),
REPLICATE(N'C', N.n), REPLICATE(N'D', N.n),
REPLICATE(N'E', N.n), REPLICATE(N'F', N.n),
REPLICATE(N'G', N.n), REPLICATE(N'H', N.n),
REPLICATE(N'I', N.n), REPLICATE(N'J', N.n)
FROM Numbers AS N
ORDER BY
N.n ASC
;
Once those two scripts have run, the table contains 450 rows and 10 columns of data like this:
Most of the time, when we query data from this table, we don’t see any LOB logical reads, for example:
-- Find the maximum length of the data in
-- column 5 for
a range of rows
SELECT result = MAX(DATALENGTH(T.col05))
FROM dbo.Test AS T
WHERE row_id BETWEEN 50 AND 100
;
But with
a different query…
-- Read all the data in column 1
SELECT result = MAX(DATALENGTH(T.col01))
FROM dbo.Test AS T
;
…suddenly we have 49 LOB logical reads, as well as the ‘normal’ logical reads we would expect.
The Explanation
If we had tried to create this table in SQL Server 2000, we would have received
a warning message to say that future INSERT or UPDATE operations on the table might fail if the resulting row exceeded the in-row storage limit of 8060 bytes. If we needed to store more data than would fit in an 8060 byte row (including internal overhead) we had to use
a LOB column – TEXT, NTEXT, or IMAGE. These special data types store the large data values in
a separate structure, with just
a small pointer left in the original row.
Row Overflow
SQL Server 2005 introduced
a feature called row overflow, which allows one or more variable-length columns in
a row to move to off-row storage if the data in
a particular row would otherwise exceed 8060 bytes. You no longer receive
a warning when creating (or altering)
a table that might need more than 8060 bytes of in-row storage; if SQL Server finds that it can no longer fit
a variable-length column in
a particular row, it will silently move one or more of these columns off the row into
a separate allocation unit.
Only variable-length columns can be moved in this way (for example the (N)VARCHAR, VARBINARY, and SQL_VARIANT types). Fixed-length columns (like INTEGER and DATETIME for example) never move into ‘row overflow’ storage. The decision to move
a column off-row is done on
a row-by-row basis – so data in
a particular column might be stored in-row for some table records, and off-row for others.
In general, if SQL Server finds that it needs to move
a column into row-overflow storage, it moves the largest variable-length column record for that row. Note that in the case of an UPDATE statement that results in the 8060 byte limit being exceeded, it might not be the column that grew that is moved!
Sneaky LOBs
Anyway, that’s all very interesting but I don’t want to get too carried away with the intricacies of row-overflow storage internals. The
point is that it is now possible to define
a table with non-LOB columns that will silently exceed the old row-size limit and result in ordinary variable-length columns being moved to off-row storage. Adding new columns to
a table, expanding an existing column definition, or simply storing more data in
a column than you used to – all these things can result in one or more variable-length columns being moved off the row.
Note that row-overflow storage is logically quite different from old-style LOB and new-style MAX data type storage – individual variable-length columns are still limited to 8000 bytes each – you can just have more of them now. Having said that, the physical mechanisms involved are very similar to full LOB storage –
a column moved to row-overflow leaves
a 24-byte pointer record in the row, and the ‘separate storage’ I have been talking about is structured very similarly to both old-style LOBs and new-style MAX types. The disadvantages are also the same: when SQL Server needs
a row-overflow column value it needs to follow the in-row pointer
a navigate another chain of pages, just like retrieving
a traditional LOB.
And Finally…
In the example script presented above, the rows with row_id values from 402 to 450 inclusive all exceed the total in-row storage limit of 8060 bytes.
A SELECT that references
a column in one of those rows that has moved to off-row storage will incur one or more lob logical reads as the storage engine locates the data. The results on your system might vary slightly depending on your settings, of course; but in my tests only column 1 in rows 402-450 moved off-row. You might like to play around with the script – updating columns, changing data type lengths, and so on – to see the effect on lob logical reads and which columns get moved when. You might even see row-overflow columns moving back in-row if they are updated to be smaller (hint: reduce the size of
a column entry by at least 1000 bytes if you hope to see this).
Be aware that SQL Server will not warn you when it moves ‘ordinary’ variable-length columns into overflow storage, and it can have dramatic effects on performance. It makes more sense than ever to choose column data types sensibly. If you make
every column
a VARCHAR(8000) or NVARCHAR(4000), and someone stores data that results in
a row needing more than 8060 bytes, SQL Server might turn some of your column data into pseudo-LOBs – all without saying
a word.
Finally, some people make
a distinction between ordinary LOBs (those that can hold up to 2GB of data) and the LOB-like structures created by row-overflow (where columns are still limited to 8000 bytes) by referring to row-overflow LOBs as SLOBs. I find that quite appealing, but the ‘S’ stands for ‘small’, which makes expanding the whole acronym
a little daft-sounding…small large objects anyone?
© Paul White 2011
email:
[email protected]
twitter: @SQL_Kiwi