B-trees, databases, sequential inputs, and speed.

Posted by IanC on Stack Overflow See other posts from Stack Overflow or by IanC
Published on 2011-01-04T20:25:30Z Indexed on 2011/01/04 20:54 UTC
Read the original article Hit count: 238

Filed under:

clustered-index

I know from experience that b-trees have awful performance when data is added to them sequentially (regardless of the direction). However, when data is added randomly, best performance is obtained.

This is easy to demonstrate with the likes of an RB-Tree. Sequential writes cause a maximum number of tree balances to be performed.

I know very few databases use binary trees, but rather used n-order balanced trees. I logically assume they suffer a similar fate to binary trees when it comes to sequential inputs.

This sparked my curiosity.

If this is so, then one could deduce that writing sequential IDs (such as in IDENTITY(1,1)) would cause multiple re-balances of the tree to occur. I have seen many posts argue against GUIDs as "these will cause random writes". I never use GUIDs, but it struck me that this "bad" point was in fact a good point.

So I decided to test it. Here is my code:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[T1](
    [ID] [int] NOT NULL
 CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC) 
)
GO

CREATE TABLE [dbo].[T2](
    [ID] [uniqueidentifier] NOT NULL
 CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)

GO

declare @i int, @t1 datetime, @t2 datetime, @t3 datetime, @c char(300)

set @t1 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T2 values (NEWID(), @c)
    set @i = @i + 1
end

set @t2 = GETDATE()
WAITFOR delay '0:0:10'
set @t3 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T1 values (@i, @c)
    set @i = @i + 1
end

select DATEDIFF(ms, @t1, @t2) AS [Int], DATEDIFF(ms, @t3, getdate()) AS [GUID]

drop table T1
drop table T2

Note that I am not subtracting any time for the creation of the GUID nor for the considerably extra size of the row. The results on my machine were as follows:

Int: 17,340 ms GUID: 6,746 ms

This means that in this test, random inserts of 16 bytes was almost 3 times faster than sequential inserts of 4 bytes.

Would anyone like to comment on this?

Ps. I get that this isn't a question. It's an invite to discussion, and that is relevant to learning optimum programming.

Developer IT

B-trees, databases, sequential inputs, and speed. - Developer IT

B-trees, databases, sequential inputs, and speed.

sql

sql-server

discussion

benchmarking

clustered-index

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

SQL SERVER – Concat Function in SQL Server – SQL Concatenation

Error with SQL Server Setup 2012 on Windows 2012

How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

Related posts about sql-server

SQL SERVER – Beginning SQL Server: One Step at a Time – SQL Server Magazine

How to install SQL Server 2005 Configuration Manager without installing SQL Server Management Studio

[MAJ] SQL Server 2005 Express Edition - La version gratuite de SQL Server 2005

How to create SQL Server Express DB from SQL Server DB

sql server 2000 error, error trying to connect to sql server 2005

Categories cloud