Adding a column to a table is common task for
DBAs. You can add a column to a table which is a nullable column or
which has default values. But are these two operations are similar internally and which method is optimal?
Let us start this with an example.
I created a database and a table using following script:
USE master
Go
--Drop Database if exists
IF EXISTS (SELECT 1 FROM SYS.databases
WHERE name = 'AddColumn') DROP DATABASE AddColumn --Create the database CREATE DATABASE AddColumn GO
USE AddColumn
GO
--Drop the table if exists
IF EXISTS (
SELECT 1 FROM sys.tables WHERE Name = 'ExistingTable')
DROP TABLE ExistingTable
GO
--Create the table
CREATE TABLE ExistingTable
(ID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
DateTime1 DATETIME DEFAULT GETDATE(),
DateTime2 DATETIME DEFAULT GETDATE(),
DateTime3 DATETIME DEFAULT GETDATE(),
DateTime4 DATETIME DEFAULT GETDATE(),
Gendar CHAR(1) DEFAULT 'M',
STATUS1 CHAR(1) DEFAULT 'Y'
)
GO
-- Insert 100,000 records with defaults records
INSERT INTO ExistingTable
DEFAULT VALUES
GO 100000
Before adding a Column
Before adding a column let us look at some of the details of
the database.
DBCC
IND (AddColumn,ExistingTable,1)
By running the above query, you will see 637 pages
for the created table.
Adding a Column
You can add a column to the table with following
statement.
ALTER
TABLE ExistingTable Add NewColumn INT NULL
Above will add a column with a null value for
the existing records.
Alternatively you could add a column with default values.
ALTER TABLE ExistingTable Add
NewColumn INT NOT NULL DEFAULT 1
The above statement will add a column with a 1 value to the
existing records.
In the below table I measured the performance difference between
above two statements.
Parameter
Nullable Column
Default Value
CPU
31
702
Duration
129 ms
6653 ms
Reads
38
116,397
Writes
6
1329
Row Count
0
100000
If you look at the RowCount parameter, you can
clearly see the difference. Though column is added in the first case, none of
the rows are affected while in the second case all the rows are updated. That
is the reason, why it has taken more duration and CPU to add column with
Default value.
We can verify this by several methods.
Number of Pages
The number of data pages can be obtained by using DBCC
IND command. Though, this an undocumented dbcc command, many experts are ok to use this command in production. However, since there is no
official word from Microsoft, use this “at your own risk”.
DBCC IND (AddColumn,ExistingTable,1)
Before Adding the Columns
637
Adding a Column with NULL
637
Adding a column with DEFAULT value
1270
This clearly shows that pages are
physically modified. Please note, a high value indicated in the Adding a column with DEFAULT value
column is also a result of page splits.
Continues…