Often you have the need to archive data from a
table.
This leads to a number of challenges
1. How can you do it without impacting users
2. How can I make it transactionally consistent, i.e. the data I put in the
archive is the data I remove from the main table
3. How can I get it to perform well
Points 1 is very much tied to point 3. If it doesn't perform well then
the delete of data is going to cause lots of locks and thus potentially
blocking.
For points 1 and 3 refer to my previous posts DELETE-TOP-x-rows-avoiding-a-table-scan and
UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2.
In essence you need to be removing small chunks of data from your table and you
want to do that avoiding a table scan.
So that deals with the delete approach but archiving is about inserting that
data somewhere else.
Well in SQL 2008 they introduced a new feature INSERT over DML (Data
Manipulation Language, i.e. SQL statements that change data), or composable DML.
The ability to nest DML statements within themselves, so you can past the
results of an insert to an update to a merge. I've mentioned this before here SQL-Server-2008---MERGE-and-optimistic-concurrency.
This feature is currently limited to being able to consume the results of a DML
statement in an INSERT statement. There are many restrictions which you can find
here http://msdn.microsoft.com/en-us/library/ms177564.aspx look
for the section "Inserting Data Returned From an OUTPUT Clause Into a Table"
Even with the restrictions what we can do is consume the OUTPUT from a DELETE
and INSERT the results into a table in another database. Note that in BOL it
refers to not being able to use a remote table, remote means a table on another
SQL instance.
To show this working use this SQL to setup two databases foo and
fooArchive
create database foo
go
--create
the source table fred in database foo
select * into foo..fred from sys.objects
go
create database fooArchive
go
if object_id('fredarchive',DB_ID('fooArchive')) is null
begin
select getdate()
ArchiveDate,* into fooArchive..FredArchive from
sys.objects where
1=2
end
go
And then we can use this simple statement to archive the data
insert into fooArchive..FredArchive
select getdate(),d.*
from
(delete top (1)
from foo..Fred
output deleted.*) d
go
In this statement the delete can be any delete statement you wish so if you
are deleting by ids or a range of values then you can do that. Refer to the DELETE-TOP-x-rows-avoiding-a-table-scan post
to ensure that your delete is going to perform. The last thing you want to do is
to perform 100 deletes each with 5000 records for each of those deletes to do a
table scan.
For a solution that works for SQL2005 or if you want to archive to a
different server then you can use linked servers or SSIS. This example
shows how to do it with linked servers. [ONARC-LAP03] is the source server.
begin transaction
insert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from
foo..Fred
output deleted.*')
d
commit
transaction
and to prove the
transactions work try, you should get the same number of records before and
after.
select
(select count(1) from foo..Fred)
fred
,(select COUNT(1) from
fooArchive..FredArchive ) fredarchive
begin transaction
insert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from foo..Fred
output deleted.*')
d
rollback transaction
select
(select count(1) from foo..Fred)
fred
,(select COUNT(1) from
fooArchive..FredArchive ) fredarchive
The transactions are very important with this solution. Look what happens
when you don't have transactions and an error occurs
select
(select count(1) from foo..Fred)
fred
,(select COUNT(1) from
fooArchive..FredArchive ) fredarchive
insert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from foo..Fred
output deleted.*
raiserror (''Oh doo doo'',15,15)')
d
select
(select count(1) from foo..Fred)
fred
,(select COUNT(1) from
fooArchive..FredArchive ) fredarchive
Before running this think what the result would be. I got it wrong.
What seems to happen is that the remote query is executed as a transaction,
the error causes that to rollback. However the results have already been sent to
the client and so get inserted into the