Why is SQLite3 using covering indices instead of the indices I created?

Posted by Geoff on Stack Overflow See other posts from Stack Overflow or by Geoff
Published on 2012-10-05T03:34:40Z Indexed on 2012/10/05 3:37 UTC
Read the original article Hit count: 195

Filed under:
|
|

I have an extremely large database (contacts has ~3 billion entries, people has ~280 million entries, and the other tables have a negligible number of entries). Most other queries I've run are really fast. However, I've encountered a more complicated query that's really slow. I'm wondering if there's any way to speed this up.

First of all, here is my schema:

CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        person1_id INTEGER NOT NULL,
        person2_id INTEGER NOT NULL,
        duration REAL NOT NULL, -- hours
        activity_id INTEGER NOT NULL
    --  FOREIGN_KEY(person1_id) REFERENCES people(id),
    --  FOREIGN_KEY(person2_id) REFERENCES people(id)
    );
CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        state_id INTEGER NOT NULL,
        county_id INTEGER NOT NULL,
        age INTEGER NOT NULL,
        gender TEXT NOT NULL, -- M or F
        income INTEGER NOT NULL
    --  FOREIGN_KEY(state_id) REFERENCES states(id)
    );
CREATE TABLE states (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        abbreviation TEXT NOT NULL
    );
CREATE INDEX activities_name_index on activities(name);
CREATE INDEX contacts_activity_id_index on contacts(activity_id);
CREATE INDEX contacts_duration_index on contacts(duration);
CREATE INDEX contacts_person1_id_index on contacts(person1_id);
CREATE INDEX contacts_person2_id_index on contacts(person2_id);
CREATE INDEX people_age_index on people(age);
CREATE INDEX people_county_id_index on people(county_id);
CREATE INDEX people_gender_index on people(gender);
CREATE INDEX people_income_index on people(income);
CREATE INDEX people_state_id_index on people(state_id);
CREATE INDEX states_abbreviation_index on states(abbreviation);
CREATE INDEX states_name_index on states(name);

Note that I've created an index on every column in the database. I don't care about the size of the database; speed is all I care about.

Here's an example of a query that, as expected, runs almost instantly:

SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA';

Here's the troublesome query:

SELECT * FROM contacts WHERE rowid IN
    (SELECT contacts.rowid FROM contacts, people, states
        WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas'
            INTERSECT
    SELECT contacts.rowid FROM contacts, people, states
        WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri');

Now, what I think would happen is that each subquery would use each relevant index I've created to speed this up. However, when I show the query plan, I see this:

sqlite> EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri');
0|0|0|SEARCH TABLE contacts USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
0|0|0|EXECUTE LIST SUBQUERY 1
2|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows)
2|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
2|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person1_id_index (person1_id=?) (~12 rows)
3|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows)
3|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
3|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person2_id_index (person2_id=?) (~12 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (INTERSECT)

In fact, if I show the query plan for the first query I posted, I get this:

sqlite> EXPLAIN QUERY PLAN SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA';
0|0|1|SEARCH TABLE states USING COVERING INDEX states_abbreviation_index (abbreviation=?) (~1 rows)
0|1|0|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)

Why is SQLite using covering indices instead of the indices I created? Shouldn't the search in the people table be able to happen in log(n) time given state_id which in turn is found in log(n) time?

© Stack Overflow or respective owner

Related posts about sql

Related posts about sqlite