SQL: Speed Improvement - Cluttered union query
- by vol7ron
SELECT * FROM (
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( a.user_id = b.user_id )
UNION
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( lower(a.f_name)=lower(b.f_name)
AND lower(a.l_name)=lower(b.l_name) )
) foo
--
UNION
--
SELECT a.user_id , a.f_name , a.l_name , '' , '' , ''
FROM current_tbl a
WHERE a.user_id NOT IN (
select user_id from(
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( a.user_id = b.user_id )
UNION
SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
FROM current_tbl a
INNER JOIN import_tbl b
ON ( lower(a.f_name)=lower(b.f_name)
AND lower(a.l_name)=lower(b.l_name) )
) bar
)
ORDER BY user_id
Example of table population:
current_tbl:
-------------------------------
user_id | f_name | l_name
---------+----------+----------
A1 | Adam | Acorn
A2 | Beth | Berry
A3 | Calv | Chard
| |
import_tbl:
-------------------------------
user_id | f_name | l_name
---------+----------+----------
A1 | Adam | Acorn
A2 | Beth | Butcher <- last_name different
| |
Expected Output:
-----------------------------------------------------------------------
user_id1 | f_name1 | l_name1 | user_id2 | f_name2 | l_name2
----------+-----------+-----------+------------+-----------+-----------
A1 | Adam | Acorn | A1 | Adam | Acorn
A2 | Beth | Berry | A2 | Beth | Butcher
A3 | Calv | Chard | | |
Doing this method gets rid of conditions where the row would be:
A2 | Beth | Berry | A2 | Beth | Butcher
But it keeps the A3 row
I hope this makes sense and I haven't overly simplified it. This is a continuation question from my other question. The succession of these improvements has dropped the query down from ~32000ms to where it's at now ~1200ms - quite an improvement.
I supect I can optimize by using UNION ALL in the subquery and of course the usual index optimizations, but I'm looking for the best SQL optimization. FYI this particular case is for PostgreSQL.