SQL -- How is DISTINCT so fast without an index?
- by Jonathan
Hi,
I have a database with a table called 'links' with 600 million rows in it in SQLite. There are 2 columns in the database - a "src" column and a "dest" column. At present there are no indices.
There are a fair number of common values between src and dest, but also a fair number of duplicated rows.
The first thing I'm trying to do is remove all the duplicate rows, and then perform some additional processing on the results, however I've been encountering some weird issues.
Firstly, SELECT * FROM links WHERE src=434923 AND dest=5010182. Now this returns one result fairly quickly and then takes quite a long time to run as I assume it's performing a tablescan on the rest of the 600m rows.
However, if I do SELECT DISTINCT * FROM links, then it immediately starts returning rows really quickly. The question is: how is this possible?? Surely for each row, the row must be compared against all of the other rows in the table, but this would require a tablescan of the remaining rows in the table which SHOULD takes ages!
Any ideas why SELECT DISTINCT is so much quicker than a standard SELECT?