mysql - filtering a list against keywords, both list and keywords > 20 million records
- by threecheeseopera
I have two tables, both having more than 20 million records; table1 is a list of terms, and table2 is a list of keywords that may or may not appear in those terms. I need to identify the terms that contain a keyword.
My current strategy is:
SELECT table1.term, table2.keyword FROM table1 INNER JOIN table2 ON table1.term
LIKE CONCAT('%', table2.keyword, '%');
This is not working, it takes f o r e v e r.
It's not the server (see notes).
How might I rewrite this so that it runs in under a day?
Notes:
As for server optimization: both tables are myisam and have unique indexes on the matching fields; the myisam key buffer is greater than the sum of both index file sizes, and it is not even being fully taxed (key_blocks_unused is ... large); the server is a dual-xeon 2U beast with fast sas drives and 8G of ram, fine-tuned for the mysql workload.