Mulitple full joins in Postgres is slow
- by blast83
I have a program to use the IMDB database and am having very slow performance on my query. It appears that it doesn't use my where condition until after it materializes everything. I looked around for hints to use but nothing seems to work. Here is my query:
SELECT *
FROM name as n1
FULL JOIN aka_name
ON n1.id = aka_name.person_id
FULL JOIN cast_info as t2
ON n1.id = t2.person_id
FULL JOIN person_info as t3
ON n1.id = t3.person_id
FULL JOIN char_name as t4
ON t2.person_role_id = t4.id
FULL JOIN role_type as t5
ON t2.role_id = t5.id
FULL JOIN title as t6
ON t2.movie_id = t6.id
FULL JOIN aka_title as t7
ON t6.id = t7.movie_id
FULL JOIN complete_cast as t8
ON t6.id = t8.movie_id
FULL JOIN kind_type as t9
ON t6.kind_id = t9.id
FULL JOIN movie_companies as t10
ON t6.id = t10.movie_id
FULL JOIN movie_info as t11
ON t6.id = t11.movie_id
FULL JOIN movie_info_idx as t19
ON t6.id = t19.movie_id
FULL JOIN movie_keyword as t12
ON t6.id = t12.movie_id
FULL JOIN movie_link as t13
ON t6.id = t13.linked_movie_id
FULL JOIN link_type as t14
ON t13.link_type_id = t14.id
FULL JOIN keyword as t15
ON t12.keyword_id = t15.id
FULL JOIN company_name as t16
ON t10.company_id = t16.id
FULL JOIN company_type as t17
ON t10.company_type_id = t17.id
FULL JOIN comp_cast_type as t18
ON t8.status_id = t18.id
WHERE n1.id = 2003
Very table is related to each other on the join via foreign-key constraints and have indexes for all the mentioned columns.
The query plan details:
"Hash Left Join (cost=5838187.01..13756845.07 rows=15579622 width=835) (actual time=146879.213..146891.861 rows=20 loops=1)"
" Hash Cond: (t8.status_id = t18.id)"
" -> Hash Left Join (cost=5838185.92..13542624.18 rows=15579622 width=822) (actual time=146879.199..146891.833 rows=20 loops=1)"
" Hash Cond: (t10.company_type_id = t17.id)"
" -> Hash Left Join (cost=5838184.83..13328403.29 rows=15579622 width=797) (actual time=146879.165..146891.781 rows=20 loops=1)"
" Hash Cond: (t10.company_id = t16.id)"
" -> Hash Left Join (cost=5828372.95..10061752.03 rows=15579622 width=755) (actual time=146426.483..146429.756 rows=20 loops=1)"
" Hash Cond: (t12.keyword_id = t15.id)"
" -> Hash Left Join (cost=5825164.23..6914088.45 rows=15579622 width=731) (actual time=146372.411..146372.529 rows=20 loops=1)"
" Hash Cond: (t13.link_type_id = t14.id)"
" -> Merge Left Join (cost=5825162.82..6699867.24 rows=15579622 width=715) (actual time=146372.366..146372.472 rows=20 loops=1)"
" Merge Cond: (t6.id = t13.linked_movie_id)"
" -> Merge Left Join (cost=5684009.29..6378956.77 rows=15579622 width=699) (actual time=144019.620..144019.711 rows=20 loops=1)"
" Merge Cond: (t6.id = t12.movie_id)"
" -> Merge Left Join (cost=5182403.90..5622400.75 rows=8502523 width=687) (actual time=136849.731..136849.809 rows=20 loops=1)"
" Merge Cond: (t6.id = t19.movie_id)"
" -> Merge Left Join (cost=4974472.00..5315778.48 rows=8502523 width=637) (actual time=134972.032..134972.099 rows=20 loops=1)"
" Merge Cond: (t6.id = t11.movie_id)"
" -> Merge Left Join (cost=1830064.81..2033131.89 rows=1341632 width=561) (actual time=63784.035..63784.062 rows=2 loops=1)"
" Merge Cond: (t6.id = t10.movie_id)"
" -> Nested Loop Left Join (cost=1417360.29..1594294.02 rows=1044480 width=521) (actual time=59279.246..59279.264 rows=1 loops=1)"
" Join Filter: (t6.kind_id = t9.id)"
" -> Merge Left Join (cost=1417359.22..1429787.34 rows=1044480 width=507) (actual time=59279.222..59279.224 rows=1 loops=1)"
" Merge Cond: (t6.id = t8.movie_id)"
" -> Merge Left Join (cost=1405731.84..1414378.65 rows=1044480 width=491) (actual time=59121.773..59121.775 rows=1 loops=1)"
" Merge Cond: (t6.id = t7.movie_id)"
" -> Sort (cost=1346206.04..1348817.24 rows=1044480 width=416) (actual time=58095.230..58095.231 rows=1 loops=1)"
" Sort Key: t6.id"
" Sort Method: quicksort Memory: 17kB"
" -> Hash Left Join (cost=172406.29..456387.53 rows=1044480 width=416) (actual time=57969.371..58095.208 rows=1 loops=1)"
" Hash Cond: (t2.movie_id = t6.id)"
" -> Hash Left Join (cost=104700.38..256885.82 rows=1044480 width=358) (actual time=49981.493..50006.303 rows=1 loops=1)"
" Hash Cond: (t2.role_id = t5.id)"
" -> Hash Left Join (cost=104699.11..242522.95 rows=1044480 width=343) (actual time=49981.441..50006.250 rows=1 loops=1)"
" Hash Cond: (t2.person_role_id = t4.id)"
" -> Hash Left Join (cost=464.96..12283.95 rows=1044480 width=269) (actual time=0.071..0.087 rows=1 loops=1)"
" Hash Cond: (n1.id = t3.person_id)"
" -> Nested Loop Left Join (cost=0.00..49.39 rows=7680 width=160) (actual time=0.051..0.066 rows=1 loops=1)"
" -> Nested Loop Left Join (cost=0.00..17.04 rows=3 width=119) (actual time=0.038..0.041 rows=1 loops=1)"
" -> Index Scan using name_pkey on name n1 (cost=0.00..8.68 rows=1 width=39) (actual time=0.022..0.024 rows=1 loops=1)"
" Index Cond: (id = 2003)"
" -> Index Scan using aka_name_idx_person on aka_name (cost=0.00..8.34 rows=1 width=80) (actual time=0.010..0.010 rows=0 loops=1)"
" Index Cond: ((aka_name.person_id = 2003) AND (n1.id = aka_name.person_id))"
" -> Index Scan using cast_info_idx_pid on cast_info t2 (cost=0.00..10.77 rows=1 width=41) (actual time=0.011..0.020 rows=1 loops=1)"
" Index Cond: ((t2.person_id = 2003) AND (n1.id = t2.person_id))"
" -> Hash (cost=463.26..463.26 rows=136 width=109) (actual time=0.010..0.010 rows=0 loops=1)"
" -> Index Scan using person_info_idx_pid on person_info t3 (cost=0.00..463.26 rows=136 width=109) (actual time=0.009..0.009 rows=0 loops=1)"
" Index Cond: (person_id = 2003)"
" -> Hash (cost=42697.62..42697.62 rows=2442362 width=74) (actual time=49305.872..49305.872 rows=2442362 loops=1)"
" -> Seq Scan on char_name t4 (cost=0.00..42697.62 rows=2442362 width=74) (actual time=14.066..22775.087 rows=2442362 loops=1)"
" -> Hash (cost=1.12..1.12 rows=12 width=15) (actual time=0.024..0.024 rows=12 loops=1)"
" -> Seq Scan on role_type t5 (cost=0.00..1.12 rows=12 width=15) (actual time=0.012..0.014 rows=12 loops=1)"
" -> Hash (cost=31134.07..31134.07 rows=1573507 width=58) (actual time=7841.225..7841.225 rows=1573507 loops=1)"
" -> Seq Scan on title t6 (cost=0.00..31134.07 rows=1573507 width=58) (actual time=21.507..2799.443 rows=1573507 loops=1)"
" -> Materialize (cost=59525.80..63203.88 rows=294246 width=75) (actual time=812.376..984.958 rows=192075 loops=1)"
" -> Sort (cost=59525.80..60261.42 rows=294246 width=75) (actual time=812.363..922.452 rows=192075 loops=1)"
" Sort Key: t7.movie_id"
" Sort Method: external merge Disk: 24880kB"
" -> Seq Scan on aka_title t7 (cost=0.00..6646.46 rows=294246 width=75) (actual time=24.652..164.822 rows=294246 loops=1)"
" -> Materialize (cost=11627.38..12884.43 rows=100564 width=16) (actual time=123.819..149.086 rows=41907 loops=1)"
" -> Sort (cost=11627.38..11878.79 rows=100564 width=16) (actual time=123.807..138.530 rows=41907 loops=1)"
" Sort Key: t8.movie_id"
" Sort Method: external merge Disk: 3136kB"
" -> Seq Scan on complete_cast t8 (cost=0.00..1549.64 rows=100564 width=16) (actual time=0.013..10.744 rows=100564 loops=1)"
" -> Materialize (cost=1.08..1.15 rows=7 width=14) (actual time=0.016..0.029 rows=7 loops=1)"
" -> Seq Scan on kind_type t9 (cost=0.00..1.07 rows=7 width=14) (actual time=0.011..0.013 rows=7 loops=1)"
" -> Materialize (cost=412704.52..437969.09 rows=2021166 width=40) (actual time=3420.356..4278.545 rows=1028995 loops=1)"
" -> Sort (cost=412704.52..417757.43 rows=2021166 width=40) (actual time=3420.349..3953.483 rows=1028995 loops=1)"
" Sort Key: t10.movie_id"
" Sort Method: external merge Disk: 90960kB"
" -> Seq Scan on movie_companies t10 (cost=0.00..35214.66 rows=2021166 width=40) (actual time=13.271..566.893 rows=2021166 loops=1)"
" -> Materialize (cost=3144407.19..3269057.42 rows=9972019 width=76) (actual time=65485.672..70083.219 rows=5039009 loops=1)"
" -> Sort (cost=3144407.19..3169337.23 rows=9972019 width=76) (actual time=65485.667..68385.550 rows=5038999 loops=1)"
" Sort Key: t11.movie_id"
" Sort Method: external merge Disk: 735512kB"
" -> Seq Scan on movie_info t11 (cost=0.00..212815.19 rows=9972019 width=76) (actual time=15.750..15715.608 rows=9972019 loops=1)"
" -> Materialize (cost=207925.01..219867.92 rows=955433 width=50) (actual time=1483.989..1785.636 rows=429401 loops=1)"
" -> Sort (cost=207925.01..210313.59 rows=955433 width=50) (actual time=1483.983..1654.165 rows=429401 loops=1)"
" Sort Key: t19.movie_id"
" Sort Method: external merge Disk: 31720kB"
" -> Seq Scan on movie_info_idx t19 (cost=0.00..15047.33 rows=955433 width=50) (actual time=7.284..221.597 rows=955433 loops=1)"
" -> Materialize (cost=501605.39..537645.64 rows=2883220 width=12) (actual time=5823.040..6868.242 rows=1597396 loops=1)"
" -> Sort (cost=501605.39..508813.44 rows=2883220 width=12) (actual time=5823.026..6477.517 rows=1597396 loops=1)"
" Sort Key: t12.movie_id"
" Sort Method: external merge Disk: 78888kB"
" -> Seq Scan on movie_keyword t12 (cost=0.00..44417.20 rows=2883220 width=12) (actual time=11.672..839.498 rows=2883220 loops=1)"
" -> Materialize (cost=141143.93..152995.81 rows=948150 width=16) (actual time=1916.356..2253.004 rows=478358 loops=1)"
" -> Sort (cost=141143.93..143514.31 rows=948150 width=16) (actual time=1916.344..2125.698 rows=478358 loops=1)"
" Sort Key: t13.linked_movie_id"
" Sort Method: external merge Disk: 29632kB"
" -> Seq Scan on movie_link t13 (cost=0.00..14607.50 rows=948150 width=16) (actual time=27.610..297.962 rows=948150 loops=1)"
" -> Hash (cost=1.18..1.18 rows=18 width=16) (actual time=0.020..0.020 rows=18 loops=1)"
" -> Seq Scan on link_type t14 (cost=0.00..1.18 rows=18 width=16) (actual time=0.010..0.012 rows=18 loops=1)"
" -> Hash (cost=1537.10..1537.10 rows=91010 width=24) (actual time=54.055..54.055 rows=91010 loops=1)"
" -> Seq Scan on keyword t15 (cost=0.00..1537.10 rows=91010 width=24) (actual time=0.006..14.703 rows=91010 loops=1)"
" -> Hash (cost=4585.61..4585.61 rows=245461 width=42) (actual time=445.269..445.269 rows=245461 loops=1)"
" -> Seq Scan on company_name t16 (cost=0.00..4585.61 rows=245461 width=42) (actual time=12.037..309.961 rows=245461 loops=1)"
" -> Hash (cost=1.04..1.04 rows=4 width=25) (actual time=0.013..0.013 rows=4 loops=1)"
" -> Seq Scan on company_type t17 (cost=0.00..1.04 rows=4 width=25) (actual time=0.009..0.010 rows=4 loops=1)"
" -> Hash (cost=1.04..1.04 rows=4 width=13) (actual time=0.006..0.006 rows=4 loops=1)"
" -> Seq Scan on comp_cast_type t18 (cost=0.00..1.04 rows=4 width=13) (actual time=0.002..0.003 rows=4 loops=1)"
"Total runtime: 147055.016 ms"
Is there anyway to force the name.id = 2003 before it tries to join all the tables together? As you can see, the end result is 4 tuples but it seems like it should be a fast join by using the available index after it limited it down with the name clause, although very complex.