Speeding up a group by date query on a big table in postgres

Posted by zaius on Stack Overflow See other posts from Stack Overflow or by zaius
Published on 2011-01-13T00:33:14Z Indexed on 2011/01/13 0:54 UTC
Read the original article Hit count: 245

Filed under:

I've got a table with around 20 million rows. For arguments sake, lets say there are two columns in the table - an id and a timestamp. I'm trying to get a count of the number of items per day. Here's what I have at the moment.

  SELECT DATE(timestamp) AS day, COUNT(*)
    FROM actions
   WHERE DATE(timestamp) >= '20100101'
     AND DATE(timestamp) <  '20110101'
GROUP BY day;

Without any indices, this takes about a 30s to run on my machine. Here's the explain analyze output:

 GroupAggregate  (cost=675462.78..676813.42 rows=46532 width=8) (actual time=24467.404..32417.643 rows=346 loops=1)
   ->  Sort  (cost=675462.78..675680.34 rows=87021 width=8) (actual time=24466.730..29071.438 rows=17321121 loops=1)
         Sort Key: (date("timestamp"))
         Sort Method:  external merge  Disk: 372496kB
         ->  Seq Scan on actions  (cost=0.00..667133.11 rows=87021 width=8) (actual time=1.981..12368.186 rows=17321121 loops=1)
               Filter: ((date("timestamp") >= '2010-01-01'::date) AND (date("timestamp") < '2011-01-01'::date))
 Total runtime: 32447.762 ms

Since I'm seeing a sequential scan, I tried to index on the date aggregate

CREATE INDEX ON actions (DATE(timestamp));

Which cuts the speed by about 50%.

 HashAggregate  (cost=796710.64..796716.19 rows=370 width=8) (actual time=17038.503..17038.590 rows=346 loops=1)
   ->  Seq Scan on actions  (cost=0.00..710202.27 rows=17301674 width=8) (actual time=1.745..12080.877 rows=17321121 loops=1)
         Filter: ((date("timestamp") >= '2010-01-01'::date) AND (date("timestamp") < '2011-01-01'::date))
 Total runtime: 17038.663 ms

I'm new to this whole query-optimization business, and I have no idea what to do next. Any clues how I could get this query running faster?

Developer IT

Speeding up a group by date query on a big table in postgres - Developer IT

Speeding up a group by date query on a big table in postgres

sql

database

postgresql

database-queries

indexes

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

SQL SERVER – Concat Function in SQL Server – SQL Concatenation

Error with SQL Server Setup 2012 on Windows 2012

How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

Related posts about database

SQL SERVER Retrieve and Explore Database Backup without Restoring Database Idera virtual database

Cloning A Database On The Same Server Using Rman Duplicate From Active Database

cPickle ImportError: No module named multiarray

SQL SERVER – 2008 – Introduction to Snapshot Database – Restore From Snapshot

OTN ???? ?????? ???????

Categories cloud