I have two tables, Table A with 700,000 entries and Table B with 600,000 entries. The structure is as follows:
Table A:
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| number | bigint(20) unsigned | YES | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
Table B:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| number_s | bigint(20) unsigned | YES | MUL | NULL | |
| number_e | bigint(20) unsigned | YES | MUL | NULL | |
| source | varchar(50) | YES | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
I am trying to find if any of the values in Table A are present in Table B using the following code:
$sql = "SELECT number from TableA";
$result = mysql_query($sql) or die(mysql_error());
while($row = mysql_fetch_assoc($result)) {
$number = $row['number'];
$sql = "SELECT source, count(source) FROM TableB WHERE number_s < $number AND number_e > $number GROUP BY source";
$re = mysql_query($sql) or die(mysql_error);
while($ro = mysql_fetch_array($re)) {
echo $number."\t".$ro[0]."\t".$ro[1]."\n";
}
}
I was hoping that the query would go fast but then for some reason, it isn't terrible fast. My explain on the select (with a particular value of "number") gives me the following:
mysql> explain SELECT source, count(source) FROM TableB WHERE number_s < 1812194440 AND number_e > 1812194440 GROUP BY source;
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | TableB | ALL | number_s,number_e | NULL | NULL | NULL | 696325 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
Is there any optimization that I can squeeze out of this?
I tried writing a stored procedure for the same task but it doesn't even seem to work in the first place... It doesn't give any syntax errors... I tried running it for a day and it was still running which felt odd.
CREATE PROCEDURE Filter()
Begin
DECLARE number BIGINT UNSIGNED;
DECLARE x INT;
DECLARE done INT DEFAULT 0;
DECLARE cur1 CURSOR FOR SELECT number FROM TableA;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
CREATE TEMPORARY TABLE IF NOT EXISTS Flags(number bigint unsigned, count int(11));
OPEN cur1;
hist_loop: LOOP
FETCH cur1 INTO number;
SELECT count(*) from TableB WHERE number_s < number AND number_e > number INTO x;
IF done = 1 THEN
LEAVE hist_loop;
END IF;
IF x IS NOT NULL AND x>0 THEN
INSERT INTO Flags(number, count) VALUES(number, x);
END IF;
END LOOP hist_loop;
CLOSE cur1;
END