SQL: Speed Improvement - Cluttered union query
Posted
by
vol7ron
on Stack Overflow
See other posts from Stack Overflow
or by vol7ron
Published on 2011-02-16T23:17:25Z
Indexed on
2011/02/16
23:25 UTC
Read the original article
Hit count: 387
SELECT * FROM (
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( a.user_id = b.user_id )
UNION
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( lower(a.f_name)=lower(b.f_name)
AND lower(a.l_name)=lower(b.l_name) )
) foo
--
UNION
--
SELECT a.user_id , a.f_name , a.l_name , '' , '' , ''
FROM current_tbl a
WHERE a.user_id NOT IN (
select user_id from(
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( a.user_id = b.user_id )
UNION
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( lower(a.f_name)=lower(b.f_name)
AND lower(a.l_name)=lower(b.l_name) )
) bar
)
ORDER BY user_id
Example of table population:
current_tbl:
-------------------------------
user_id | f_name | l_name
---------+----------+----------
A1 | Adam | Acorn
A2 | Beth | Berry
A3 | Calv | Chard
| |
import_tbl:
-------------------------------
user_id | f_name | l_name
---------+----------+----------
A1 | Adam | Acorn
A2 | Beth | Butcher <- last_name different
| |
Expected Output:
-----------------------------------------------------------------------
user_id1 | f_name1 | l_name1 | user_id2 | f_name2 | l_name2
----------+-----------+-----------+------------+-----------+-----------
A1 | Adam | Acorn | A1 | Adam | Acorn
A2 | Beth | Berry | A2 | Beth | Butcher
A3 | Calv | Chard | | |
Doing this method gets rid of conditions where the row would be:
A2 | Beth | Berry | A2 | Beth | Butcher
But it keeps the A3 row
I hope this makes sense and I haven't overly simplified it. This is a continuation question from my other question. The succession of these improvements has dropped the query down from ~32000ms to where it's at now ~1200ms - quite an improvement.
I supect I can optimize by using UNION ALL
in the subquery and of course the usual index optimizations, but I'm looking for the best SQL optimization. FYI this particular case is for PostgreSQL.
© Stack Overflow or respective owner