Recently I came across with an article on DB2 about using
Union instead of OR. So I thought of carrying out a research on SQL Server on what scenarios UNION is optimal in and which scenarios OR would be best.
I will analyze this with a few scenarios using samples taken from the AdventureWorks database Sales.SalesOrderDetail table.
Scenario 1: Selecting all columns
So we are going to select all columns and you have a
non-clustered index on the ProductID column.
--Query 1 : OR
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 714 OR ProductID =709
OR ProductID =998 OR ProductID =875
OR ProductID =976 OR ProductID =874
--Query 2 : UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 709
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 998
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 875
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 976
UNION
SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 874
So query 1 is using OR and the later is using UNION.
Let us analyze the execution plans for these queries.
Query 1
Query 2
As expected Query 1 will use Clustered Index Scan but Query
2, uses all sorts of things. In this case, since it is using multiple
CPUs you might have CX_PACKET waits as well.
Let’s look at the profiler results for these two queries:
CPU
Reads
Duration
Row Counts
OR
78
1252
389
3854
UNION
250
7495
660
3854
You can see from the above table the UNION query is not
performing well as the OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used.
Scenario 2: Non-Clustered and Clustered Index Columns
only
--Query 1 : OR
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 714 OR ProductID =709
OR ProductID =998 OR ProductID =875
OR ProductID =976 OR ProductID =874
GO
--Query 2 : UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 709
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 998
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 875
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 976
UNION
SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail
WHERE ProductID = 874
GO
So this time, we will be selecting only index columns, which
means these queries will avoid a data page lookup.
As in the previous case we will analyze the execution plans:
Query 1
Query 2
Again, Query 2 is more complex than Query 1. Let us look at the profile analysis:
CPU
Reads
Duration
Row
Counts
OR
0
24
208
3854
UNION
0
38
193
3854
In this analyzis, there is only slight difference between OR and UNION.
Scenario 3: Selecting all columns for different fields
Up to now, we were using only one column (ProductID)
in the where clause. What if we have two columns for where clauses and let us
assume both are covered by non-clustered indexes?
--Query 1 : OR
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
OR CarrierTrackingNumber LIKE 'D0B8%'
--Query 2 : UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber
LIKE 'D0B8%'
Query 1
Query 2:
As we can see, the query plan for the second query has improved. Let us see the profiler results.
CPU
Reads
Duration
Row
Counts
OR
47
1278
443
1228
UNION
31
1334
400
1228
So in this case too, there is little difference between OR and UNION.
Scenario 4: Selecting Clustered index columns for
different fields
Now let us go only with clustered indexes:
--Query 1 : OR
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
OR CarrierTrackingNumber LIKE 'D0B8%'
--Query 2 : UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 714
UNION
SELECT *
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber
LIKE 'D0B8%'
Query 1
Query 2
Now both execution plans are almost identical except
is an additional Stream Aggregate is used in the first query. This means UNION
has advantage over OR in this scenario. Let us see profiler results for these
queries again.
CPU
Reads
Duration
Row
Counts
OR
0
319
366
1228
UNION
0
50
193
1228
Now see the differences, in this scenario UNION has somewhat of an
advantage over OR.
Conclusion
Using UNION or OR
depends on the scenario you are faced with. So you need to do your analyzing
before selecting the appropriate method. Also, above the four scenarios are not all
an exhaustive list of scenarios, I selected those for the broad description purposes
only.