How to list all duplicated rows which may include NULL columns?
- by Yousui
Hi guys,
I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first.
USE [tempdb];
GO
IF OBJECT_ID(N'dbo.t') IS NOT NULL
BEGIN
DROP TABLE dbo.t
END
GO
CREATE TABLE dbo.t
(
a NVARCHAR(8),
b NVARCHAR(8)
);
GO
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('e', NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
GO
Now I want to show all rows that have other rows duplicated with them, I use the following query.
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
which will give us the result:
a b
-------- --------
NULL NULL
a b
c d
Now if I want to list all rows that make contribution to duplication, I use this query:
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
Which will give me the result:
a b
-------- --------
a b
a b
a b
c d
c d
c d
c d
As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b = duplicate.b)
OR
(dbo.t.b IS NULL
AND duplicate.b IS NULL
AND dbo.t.a = duplicate.a)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b IS NULL
AND duplicate.b IS NULL)
And this query will give me the answer as I wanted:
a b
-------- --------
NULL NULL
NULL NULL
NULL NULL
NULL NULL
a b
a b
a b
c d
c d
c d
c d
Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem?
Great thanks.