A MERGE statement can fail, and incorrectly report a unique key violation when:
The target table uses a unique filtered index; and No key column of
the filtered index is updated; and A column from
the filtering condition is updated; and Transient key violations are possible Example Tables Say we have two tables, one that is
the target of a MERGE statement, and another that contains updates to be applied to
the target.
The target table contains three columns, an integer primary key, a single character alternate key, and a status code column. A filtered unique index exists on
the alternate key, but is only enforced where
the status code is ‘a’: CREATE TABLE #Target
(
pk integer NOT NULL,
ak character(1) NOT NULL,
status_code character(1) NOT NULL,
PRIMARY KEY (pk)
);
CREATE UNIQUE INDEX uq1
ON #Target (ak)
INCLUDE (status_code)
WHERE status_code = 'a';
The changes table contains just an integer primary key (to identify
the target row to change) and
the new status code:
CREATE TABLE #Changes
(
pk integer NOT NULL,
status_code character(1) NOT NULL,
PRIMARY KEY (pk)
);
Sample Data
The sample data for
the example is:
INSERT #Target
(pk, ak, status_code)
VALUES
(1, 'A', 'a'),
(2, 'B', 'a'),
(3, 'C', 'a'),
(4, 'A', 'd');
INSERT #Changes
(pk, status_code)
VALUES
(1, 'd'),
(4, 'a');
Target Changes
+-----------------------+ +------------------+
¦ pk ¦ ak ¦ status_code ¦ ¦ pk ¦ status_code ¦
¦----+----+-------------¦ ¦----+-------------¦
¦ 1 ¦ A ¦ a ¦ ¦ 1 ¦ d ¦
¦ 2 ¦ B ¦ a ¦ ¦ 4 ¦ a ¦
¦ 3 ¦ C ¦ a ¦ +------------------+
¦ 4 ¦ A ¦ d ¦
+-----------------------+
The target table’s alternate key (ak) column is unique, for rows where status_code = ‘a’. Applying
the changes to
the target will change row 1 from status ‘a’ to status ‘d’, and row 4 from status ‘d’ to status ‘a’.
The result of applying all
the changes will still satisfy
the filtered unique index, because
the ‘A’ in row 1 will be deleted from
the index and
the ‘A’ in row 4 will be added.
Merge Test One
Let’s now execute a MERGE statement to apply
the changes:
MERGE #Target AS t
USING #Changes AS c ON
c.pk = t.pk
WHEN MATCHED
AND c.status_code <> t.status_code
THEN UPDATE SET status_code = c.status_code;
The MERGE changes
the two target rows as expected.
The updated target table now contains:
+-----------------------+
¦ pk ¦ ak ¦ status_code ¦
¦----+----+-------------¦
¦ 1 ¦ A ¦ d ¦ <—changed from ‘a’
¦ 2 ¦ B ¦ a ¦
¦ 3 ¦ C ¦ a ¦
¦ 4 ¦ A ¦ a ¦ <—changed from ‘d’
+-----------------------+
Merge Test Two
Now let’s repopulate
the changes table to reverse
the updates we just performed:
TRUNCATE TABLE #Changes;
INSERT #Changes
(pk, status_code)
VALUES
(1, 'a'),
(4, 'd');
This will change row 1 back to status ‘a’ and row 4 back to status ‘d’. As a reminder,
the current state of
the tables is:
Target Changes
+-----------------------+ +------------------+
¦ pk ¦ ak ¦ status_code ¦ ¦ pk ¦ status_code ¦
¦----+----+-------------¦ ¦----+-------------¦
¦ 1 ¦ A ¦ d ¦ ¦ 1 ¦ a ¦
¦ 2 ¦ B ¦ a ¦ ¦ 4 ¦ d ¦
¦ 3 ¦ C ¦ a ¦ +------------------+
¦ 4 ¦ A ¦ a ¦
+-----------------------+
We execute
the same MERGE statement:
MERGE #Target AS t
USING #Changes AS c ON
c.pk = t.pk
WHEN MATCHED
AND c.status_code <> t.status_code
THEN UPDATE SET status_code = c.status_code;
However this time we receive
the following message:
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.#Target' with unique index 'uq1'.
The duplicate key value is (A).
The statement has been terminated.
Applying
the changes using UPDATE
Let’s now rewrite
the MERGE to use UPDATE instead:
UPDATE t
SET status_code = c.status_code
FROM #Target AS t
JOIN #Changes AS c ON
t.pk = c.pk
WHERE
c.status_code <> t.status_code;
This query succeeds where
the MERGE failed.
The two rows are updated as expected:
+-----------------------+
¦ pk ¦ ak ¦ status_code ¦
¦----+----+-------------¦
¦ 1 ¦ A ¦ a ¦ <—changed back to ‘a’
¦ 2 ¦ B ¦ a ¦
¦ 3 ¦ C ¦ a ¦
¦ 4 ¦ A ¦ d ¦ <—changed back to ‘d’
+-----------------------+
What went wrong with
the MERGE?
In this test,
the MERGE query execution happens to apply
the changes in
the order of
the ‘pk’ column.
In test one, this was not a problem: row 1 is removed from
the unique filtered index by changing status_code from ‘a’ to ‘d’ before row 4 is added. At no point does
the table contain two rows where ak = ‘A’ and status_code = ‘a’.
In test two, however,
the first change was to change row 1 from status ‘d’ to status ‘a’. This change means there would be two rows in
the filtered unique index where ak = ‘A’ (both row 1 and row 4 meet
the index filtering criteria ‘status_code = a’).
The storage engine does not allow
the query processor to violate a unique key (unless IGNORE_DUP_KEY is ON, but that is a different story, and doesn’t apply to MERGE in any case). This strict rule applies regardless of
the fact that if all changes were applied, there would be no unique key violation (row 4 would eventually be changed from ‘a’ to ‘d’, removing it from
the filtered unique index, and resolving
the key violation).
Why it went wrong
The query optimizer usually detects when this sort of temporary uniqueness violation could occur, and builds a plan that avoids
the issue. I wrote about this a couple of years ago in my post Beware Sneaky Reads with Unique Indexes (you can read more about
the details on pages 495-497 of Microsoft SQL Server 2008 Internals or in Craig Freedman’s blog post on maintaining unique indexes). To summarize though,
the optimizer introduces Split, Filter, Sort, and Collapse operators into
the query plan to:
Split each row update into delete followed by an inserts
Filter out rows that would not change
the index (due to
the filter on
the index, or a non-updating update)
Sort
the resulting stream by index key, with deletes before inserts
Collapse delete/insert pairs on
the same index key back into an update
The effect of all this is that only net changes are applied to an index (as one or more insert, update, and/or delete operations). In this case,
the net effect is a single update of
the filtered unique index: changing
the row for ak = ‘A’ from pk = 4 to pk = 1. In case that is less than 100% clear, let’s look at
the operation in test two again:
Target Changes Result
+-----------------------+ +------------------+ +-----------------------+
¦ pk ¦ ak ¦ status_code ¦ ¦ pk ¦ status_code ¦ ¦ pk ¦ ak ¦ status_code ¦
¦----+----+-------------¦ ¦----+-------------¦ ¦----+----+-------------¦
¦ 1 ¦ A ¦ d ¦ ¦ 1 ¦ d ¦ ¦ 1 ¦ A ¦ a ¦
¦ 2 ¦ B ¦ a ¦ ¦ 4 ¦ a ¦ ¦ 2 ¦ B ¦ a ¦
¦ 3 ¦ C ¦ a ¦ +------------------+ ¦ 3 ¦ C ¦ a ¦
¦ 4 ¦ A ¦ a ¦ ¦ 4 ¦ A ¦ d ¦
+-----------------------+ +-----------------------+
From
the filtered index’s point of view (filtered for status_code = ‘a’ and shown in nonclustered index key order)
the overall effect of
the query is:
Before After
+---------+ +---------+
¦ pk ¦ ak ¦ ¦ pk ¦ ak ¦
¦----+----¦ ¦----+----¦
¦ 4 ¦ A ¦ ¦ 1 ¦ A ¦
¦ 2 ¦ B ¦ ¦ 2 ¦ B ¦
¦ 3 ¦ C ¦ ¦ 3 ¦ C ¦
+---------+ +---------+
The single net change there is a change of pk from 4 to 1 for
the nonclustered index entry ak = ‘A’. This is
the magic performed by
the split, sort, and collapse. Notice in particular how
the original changes to
the index key (on
the ‘ak’ column) have been transformed into an update of a non-key column (pk is included in
the nonclustered index). By not updating any nonclustered index keys, we are guaranteed to avoid transient key violations.
The Execution Plans
The estimated MERGE execution plan that produces
the incorrect key-violation error looks like this (click to enlarge in a new window):
The successful UPDATE execution plan is (click to enlarge in a new window):
The MERGE execution plan is a narrow (per-row) update.
The single Clustered Index Merge operator maintains both
the clustered index and
the filtered nonclustered index.
The UPDATE plan is a wide (per-index) update.
The clustered index is maintained first, then
the Split, Filter, Sort, Collapse sequence is applied before
the nonclustered index is separately maintained.
There is always a wide update plan for any query that modifies
the database.
The narrow form is a performance optimization where
the number of rows is expected to be relatively small, and is not available for all operations. One of
the operations that should disallow a narrow plan is maintaining a unique index where intermediate key violations could occur.
Workarounds
The MERGE can be made to work (producing a wide update plan with split, sort, and collapse) by:
Adding all columns referenced in
the filtered index’s WHERE clause to
the index key (INCLUDE is not sufficient); or
Executing
the query with trace flag 8790 set e.g. OPTION (QUERYTRACEON 8790).
Undocumented trace flag 8790 forces a wide update plan for any data-changing query (remember that a wide update plan is always possible). Either change will produce a successfully-executing wide update plan for
the MERGE that failed previously.
Conclusion
The optimizer fails to spot
the possibility of transient unique key violations with MERGE under
the conditions listed at
the start of this post. It incorrectly chooses a narrow plan for
the MERGE, which cannot provide
the protection of a split/sort/collapse sequence for
the nonclustered index maintenance.
The MERGE plan may fail at execution time depending on
the order in which rows are processed, and
the distribution of data in
the database. Worse, a previously solid MERGE query may suddenly start to fail unpredictably if a filtered unique index is added to
the merge target table at any point.
Connect
bug filed here
Tests performed on SQL Server 2012 SP1 CUI (build 11.0.3321) x64 Developer Edition
© 2012 Paul White – All Rights Reserved
Twitter: @SQL_Kiwi
Email:
[email protected]