SQL -- How is DISTINCT so fast without an index?

Posted by Jonathan on Stack Overflow See other posts from Stack Overflow or by Jonathan
Published on 2010-12-27T23:43:48Z Indexed on 2010/12/27 23:54 UTC
Read the original article Hit count: 220

Filed under:

sql

|

database

|

sqlite

|

optimization

|

sqlite3

Hi,

I have a database with a table called 'links' with 600 million rows in it in SQLite. There are 2 columns in the database - a "src" column and a "dest" column. At present there are no indices.

There are a fair number of common values between src and dest, but also a fair number of duplicated rows.

The first thing I'm trying to do is remove all the duplicate rows, and then perform some additional processing on the results, however I've been encountering some weird issues.

Firstly, SELECT * FROM links WHERE src=434923 AND dest=5010182. Now this returns one result fairly quickly and then takes quite a long time to run as I assume it's performing a tablescan on the rest of the 600m rows.

However, if I do SELECT DISTINCT * FROM links, then it immediately starts returning rows really quickly. The question is: how is this possible?? Surely for each row, the row must be compared against all of the other rows in the table, but this would require a tablescan of the remaining rows in the table which SHOULD takes ages!

Any ideas why SELECT DISTINCT is so much quicker than a standard SELECT?

© Stack Overflow or respective owner

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

as seen on SQL Authority - Search for 'SQL Authority'
Concatenating string is one of the most common tasks in SQL Server and every developer has to come across it. We have to concat the string when we have to see the display full name of the person by first name and last name. In this video we will see various methods to concatenate the strings. SQL… >>> More
SQL SERVER – Concat Function in SQL Server – SQL Concatenation

as seen on SQL Authority - Search for 'SQL Authority'
Earlier this week, I was delivering Advanced BI training on the subject of “SQL Server 2008 R2″. I had great time delivering the session. During the session, we talked about SQL Server 2010 Denali. Suddenly one of the attendees suggested his displeasure for the product. He said, even though… >>> More
Error with SQL Server Setup 2012 on Windows 2012

as seen on Server Fault - Search for 'Server Fault'
I am trying to install SQL Server on Windows 2012. I was able to finally get the wizard up and running after making some changes on the server, but now it fails no matter what I do with the following error: TITLE: SQL Server Setup failure. SQL Server Setup has encountered the following error: … >>> More
How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

as seen on Stack Overflow - Search for 'Stack Overflow'
I need to detect which version of SQL I am dealing with to perorm various tasks, I need specifically detect if I am on SQL 2008 or SQL Azure. How can I do this with detection code written in SQL? >>> More
Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

as seen on Stack Overflow - Search for 'Stack Overflow'
Here is the query: INSERT INTO @TempTable SELECT UserID, Name, Address1 = (SELECT TOP 1 [Address] FROM (SELECT TOP 1 [Address] FROM [UserAddress] ua INNER JOIN UserAddressOrder uo ON ua.UserID = uo.UserID WHERE ua.UserID = u.UserID ORDER BY uo.AddressOrder ASC) q ORDER BY AddressOrder… >>> More

Related posts about database

SQL SERVER Retrieve and Explore Database Backup without Restoring Database Idera virtual database

as seen on Dot net Slackers - Search for 'Dot net Slackers'
I recently downloaded Ideras SQL virtual database, and tested it. There are a few things about this tool which caught my attention.My ScenarioIt is quite common in real life that sometimes observing or retrieving older data is necessary; however, it had changed as time passed by. The full database… >>> More
Cloning A Database On The Same Server Using Rman Duplicate From Active Database

as seen on Oracle Blogs - Search for 'Oracle Blogs'
To clone a database using Rman we used to require an existing Rman backup, on 11g we can clone databases using the "from active" database option. In this case we do not require an existing backup, the active datafiles will be used as the source for the clone. In order to clone with the source database… >>> More
cPickle ImportError: No module named multiarray

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello, I'm using cPickle to save my Database into file. The code looks like that: def Save_DataBase(): import cPickle from scipy import * from numpy import * a=Results.VersionName #filename='D:/results/'+a[a.find('/')+1:-a.find('/')-2]+Results.AssType[:3]+str(random.randint(0,100))+Results.Distribution+"… >>> More
SQL SERVER – 2008 – Introduction to Snapshot Database – Restore From Snapshot

as seen on SQL Authority - Search for 'SQL Authority'
Snapshot database is one of the most interesting concepts that I have used at some places recently. Here is a quick definition of the subject from Book On Line: A Database Snapshot is a read-only, static view of a database (the source database). Multiple snapshots can exist on a source database and… >>> More
OTN ???? ?????? ???????

as seen on Oracle Blogs - Search for 'Oracle Blogs'
Database ?? Database ??????? Database ?????????? Java WebLogic Server/????????·???? SOA/BPM/????? ???????/???? ID??/?????? ?????EPM/BI EPM/BI ??????? EPM/BI ???? OS/??? ???? ????? MySQL Database ?? ???? ?? ????????? ??? ?? ORACLE MASTER… >>> More