Hi,
I've been doing some research and testing on how to do fast random selection in MySQL. In the process I've faced some unexpected results and now I am not fully sure I know how ORDER BY RAND() really works.
I always thought that when you do ORDER BY RAND() on the table, MySQL adds a new column to the table which is filled with random values, then it sorts data by that column and then e.g. you take the above value which got there randomly. I've done lots of googling and testing and finally found that the query Jay offers in his blog is indeed the fastest solution:
SELECT * FROM Table T JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;
While common ORDER BY RAND() takes 30-40 seconds on my test table, his query does the work in 0.1 seconds. He explains how this functions in the blog so I'll just skip this and finally move to the odd thing.
My table is a common table with a PRIMARY KEY id and other non-indexed stuff like username, age, etc. Here's the thing I am struggling to explain
SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/
I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you any ideas about this. I have a project where I need to do fast ORDER BY RAND() and personally I would prefer to use
SELECT id FROM table ORDER BY RAND() LIMIT 1;
SELECT * FROM table WHERE id=ID_FROM_PREVIOUS_QUERY LIMIT 1;
which, yes, is slower than Jay's method, however it is smaller and easier to understand. My queries are rather big ones with several JOINs and with WHERE clause and while Jay's method still works, the query grows really big and complex because I need to use all the JOINs and WHERE in the JOINed (called x in his query) sub request.
Thanks for your time!