Optimize MySQL query (ngrams, COUNT(), GROUP BY, ORDER BY)
- by Gerardo
I have a database with thousands of companies and their locations. I have implemented n-grams to optimize search. I am making one query to retrieve all the companies that match with the search query and another one to get a list with their locations and the number of companies in each location.
The query I am trying to optimize is the latter.
Maybe the problem is this:
Every company ('anunciante') has a field ('estado') to make logical deletes. So, if 'estado' equals 1, the company should be retrieved.
When I run the EXPLAIN command, it shows that it goes through almost 40k rows, when the actual result (the reality matching companies) are 80.
How can I optimize this?
This is my query (XXX represent the n-grams for the search query):
SELECT provincias.provincia AS provincia, provincias.id, COUNT(*) AS cantidad
FROM anunciantes
JOIN anunciante_invertido AS a_i0 ON anunciantes.id = a_i0.id_anunciante
JOIN indice_invertido AS indice0 ON a_i0.id_invertido = indice0.id
LEFT OUTER JOIN domicilios ON anunciantes.id = domicilios.id_anunciante
LEFT OUTER JOIN localidades ON domicilios.id_localidad = localidades.id
LEFT OUTER JOIN provincias ON provincias.id = localidades.id_provincia
WHERE anunciantes.estado = 1
AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX')
AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX')
AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX')
AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX')
AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX')
GROUP BY provincias.id
ORDER BY cantidad DESC
And this is the query explained (hope it can be read in this format):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY anunciantes ref PRIMARY,estado estado 1 const 36669 Using index; Using temporary; Using filesort
1 PRIMARY domicilios ref id_anunciante id_anunciante 4 db84771_viaempresas.anunciantes.id 1
1 PRIMARY localidades eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.domicilios.id_localidad 1
1 PRIMARY provincias eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.localidades.id_provincia 1
1 PRIMARY a_i0 ref PRIMARY,id_anunciante,id_invertido PRIMARY 4 db84771_viaempresas.anunciantes.id 1 Using where; Using index
1 PRIMARY indice0 eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.a_i0.id_invertido 1 Using index
6 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index
6 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index
5 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index
5 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index
4 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index
4 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index
3 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index
3 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index
2 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index
2 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index