I have a table storing millions of rows. It looks something like this:
Table_Docs
ID, Bigint (Identity col)
OutputFileID, int
Sequence, int
…(many other fields)
We find ourselves in a situation where the developer who designed it made the OutputFileID the clustered index. It is not unique. There can be thousands of records with this ID. It has no benefit to any processes using this table, so we plan to remove it.
The question, is what to change it to… I have two candidates, the ID identity column is a natural choice. However, we have a process which does a lot of update commands on this table, and it uses the Sequence to do so. The Sequence is non-unique. Most records only contain one, but about 20% can have two or more records with the same Sequence.
The INSERT app is a VB6 piece of crud throwing thousands insert commands at the table. The Inserted values are never in any particular order. So the Sequence of one insert may be 12345, and the next could be 12245. I know that this could cause SQL to move a lot of data to keep the clustered index in order. However, the Sequence of the inserts are generally close to being in order. All inserts would take place at the end of the clustered table. Eg: I have 5 million records with Sequence spanning 1 to 5 million. The INSERT app will be inserting sequence’s at the end of that range at any given time. Reordering of the data should be minimal (tens of thousands of records at most).
Now, the UPDATE app is our .NET star. It does all UPDATES on the Sequence column. “Update Table_Docs Set Feild1=This, Field2=That…WHERE Sequence =12345” – hundreds of thousands of these a day. The UPDATES are completely and totally, random, touching all points of the table.
All other processes are simply doing SELECT’s on this (Web pages). Regular indexes cover those.
So my question is, what’s better….a unique clustered index on the ID column, benefiting the INSERT app, or a non-unique clustered index on the Sequence, benefiting the UPDATE app?