SQL: Speed Improvement - Cluttered union query

Posted by vol7ron on Stack Overflow See other posts from Stack Overflow or by vol7ron
Published on 2011-02-16T23:17:25Z Indexed on 2011/02/16 23:25 UTC
Read the original article Hit count: 380

SELECT * FROM (
    SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
    FROM         current_tbl a
    INNER JOIN   import_tbl  b 
                 ON ( a.user_id = b.user_id )
    UNION
    SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
    FROM         current_tbl a
    INNER JOIN   import_tbl  b 
                 ON (   lower(a.f_name)=lower(b.f_name) 
                    AND lower(a.l_name)=lower(b.l_name) ) 
) foo
--
UNION
--
SELECT a.user_id , a.f_name , a.l_name , '' , '' , '' 
FROM   current_tbl a
WHERE  a.user_id NOT IN (
   select user_id from(
      SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
      FROM         current_tbl a
      INNER JOIN   import_tbl  b 
                   ON ( a.user_id = b.user_id )
      UNION
      SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
      FROM         current_tbl a
      INNER JOIN   import_tbl  b 
                   ON (   lower(a.f_name)=lower(b.f_name) 
                      AND lower(a.l_name)=lower(b.l_name) ) 
   ) bar
)
ORDER BY user_id

Example of table population:

current_tbl:

-------------------------------
user_id  |  f_name  |  l_name
---------+----------+----------
  A1     |  Adam    |  Acorn
  A2     |  Beth    |  Berry
  A3     |  Calv    |  Chard
         |          |

import_tbl:

-------------------------------
user_id  |  f_name  |  l_name
---------+----------+----------
  A1     |  Adam    |  Acorn
  A2     |  Beth    |  Butcher  <- last_name different
         |          |

Expected Output:

-----------------------------------------------------------------------
user_id1  |  f_name1  |  l_name1  |  user_id2  |  f_name2  |  l_name2
----------+-----------+-----------+------------+-----------+-----------
   A1     |  Adam     |  Acorn    |     A1     |  Adam     |  Acorn       
   A2     |  Beth     |  Berry    |     A2     |  Beth     |  Butcher
   A3     |  Calv     |  Chard    |            |           |           

Doing this method gets rid of conditions where the row would be:

   A2     |  Beth     |  Berry    |     A2     |  Beth     |  Butcher

But it keeps the A3 row


I hope this makes sense and I haven't overly simplified it. This is a continuation question from my other question. The succession of these improvements has dropped the query down from ~32000ms to where it's at now ~1200ms - quite an improvement.

I supect I can optimize by using UNION ALL in the subquery and of course the usual index optimizations, but I'm looking for the best SQL optimization. FYI this particular case is for PostgreSQL.

© Stack Overflow or respective owner

Related posts about sql

Related posts about postgresql