My problem is that I have a large table of about 17,000,000 products to which I need to apply a batch of updates as quickly as possible.
The table has 30 columns with the id set as int(10) AUTO_INCREMENT.
I have another table in which all of the updates for this table are stored; these updates have to be pre-calculated, as they take a couple of days to compute. The updates table is in the format [ product_id int(10), update_value int(10) ].
The strategy I'm taking to issue these 17 million updates quickly is to load all of the updates into memory in a Ruby script and group them into a hash of arrays, so that each update_value is a key and each value is a sorted list of product_ids:
{
  150 => [1, 2, 3, 4, 5, 6],
  160 => [7, 8, 9, 10]
}
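For reference, here is a minimal sketch of how I build that grouping, assuming the mysql2 gem; the connection details and the updates table name (product_updates) are placeholders:

require 'mysql2'

# Placeholder connection details.
client = Mysql2::Client.new(host: 'localhost', username: 'app', database: 'catalog')

# Group product_ids under each update_value; ORDER BY keeps each id list
# sorted so the batches walk the primary key in ascending order.
updates = Hash.new { |h, k| h[k] = [] }
client.query('SELECT product_id, update_value FROM product_updates ORDER BY product_id').each do |row|
  updates[row['update_value']] << row['product_id']
end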
Updates are then issued in the following format:
UPDATE product SET update_value = 150 WHERE product_id IN (1,2,3,4,5,6);
UPDATE product SET update_value = 160 WHERE product_id IN (7,8,9,10);
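The issuing side looks roughly like this, reusing the client and hash from the sketch above (the chunk size is an arbitrary assumption, just to keep a single IN list from getting huge when one update_value covers many products):

updates.each do |value, ids|
  ids.each_slice(50_000) do |chunk|  # chunk size is an assumption, not tuned
    client.query("UPDATE product SET update_value = #{value.to_i} WHERE product_id IN (#{chunk.join(',')})")
  end
end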
I'm fairly sure this approach is sound, in the sense that issuing the updates in sorted batches of product_ids should be close to optimal for MySQL / InnoDB.
I'm hitting a strange issue, though: when I tested with ~13 million records, the updates only took around 45 minutes. Now I'm testing with more data, ~17 million records, and the updates are taking closer to 120 minutes. I would have expected some slowdown, but not to the degree I'm seeing.
Any advice on how I can speed this up or what could be slowing me down with this larger record set?
As far as server specs go, they're pretty good: plenty of memory and CPU, and the whole DB should fit into memory with plenty of room to grow.