How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008
Posted
by simonsabin
on SQL Blogcasts
See other posts from SQL Blogcasts
or by simonsabin
Published on Sat, 06 Mar 2010 09:12:44 GMT
Indexed on
2010/03/11
17:20 UTC
Read the original article
Hit count: 500
Often you have the need to archive data from a table.
This leads to a number of challenges
1. How can you do it without impacting users
2. How can I make it transactionally consistent, i.e. the data I put in the archive is the data I remove from the main table
3. How can I get it to perform well
Points 1 is very much tied to point 3. If it doesn't perform well then the delete of data is going to cause lots of locks and thus potentially blocking.
For points 1 and 3 refer to my previous posts DELETE-TOP-x-rows-avoiding-a-table-scan and UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2. In essence you need to be removing small chunks of data from your table and you want to do that avoiding a table scan.
So that deals with the delete approach but archiving is about inserting that data somewhere else.
Well in SQL 2008 they introduced a new feature INSERT over DML (Data Manipulation Language, i.e. SQL statements that change data), or composable DML. The ability to nest DML statements within themselves, so you can past the results of an insert to an update to a merge. I've mentioned this before here SQL-Server-2008---MERGE-and-optimistic-concurrency. This feature is currently limited to being able to consume the results of a DML statement in an INSERT statement. There are many restrictions which you can find here http://msdn.microsoft.com/en-us/library/ms177564.aspx look for the section "Inserting Data Returned From an OUTPUT Clause Into a Table"
Even with the restrictions what we can do is consume the OUTPUT from a DELETE and INSERT the results into a table in another database. Note that in BOL it refers to not being able to use a remote table, remote means a table on another SQL instance.
To show this working use this SQL to setup two databases foo and fooArchive
create database foo
go
--create the source table fred in database foo
select * into foo..fred from sys.objects
go
create database fooArchive
go
if object_id('fredarchive',DB_ID('fooArchive')) is null
begin
select getdate() ArchiveDate,* into fooArchive..FredArchive from sys.objects where 1=2
end
go
And then we can use this simple statement to archive the data
insert into fooArchive..FredArchive
select getdate(),d.*
from (delete top (1)
from foo..Fred
output deleted.*) d
go
In this statement the delete can be any delete statement you wish so if you are deleting by ids or a range of values then you can do that. Refer to the DELETE-TOP-x-rows-avoiding-a-table-scan post to ensure that your delete is going to perform. The last thing you want to do is to perform 100 deletes each with 5000 records for each of those deletes to do a table scan.
For a solution that works for SQL2005 or if you want to archive to a different server then you can use linked servers or SSIS. This example shows how to do it with linked servers. [ONARC-LAP03] is the source server.
begin transaction
insert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from foo..Fred
output deleted.*') d
commit transaction
and to prove the transactions work try, you should get the same number of records before and after.
select (select count(1) from foo..Fred) fred
,(select COUNT(1) from fooArchive..FredArchive ) fredarchive
begin
transactioninsert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from foo..Fred
output deleted.*') d
rollback transaction
select (select count(1) from foo..Fred) fred
,(select COUNT(1) from fooArchive..FredArchive ) fredarchive
The transactions are very important with this solution. Look what happens when you don't have transactions and an error occurs
select (select count(1) from foo..Fred) fred
,(select COUNT(1) from fooArchive..FredArchive ) fredarchive
insert into fooArchive..FredArchive
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1)
from foo..Fred
output deleted.*
raiserror (''Oh doo doo'',15,15)') d
select (select count(1) from foo..Fred) fred
,(select COUNT(1) from fooArchive..FredArchive ) fredarchive
Before running this think what the result would be. I got it wrong.
What seems to happen is that the remote query is executed as a transaction, the error causes that to rollback. However the results have already been sent to the client and so get inserted into the
© SQL Blogcasts or respective owner