Aggregating a list of dates to start and end date
- by Joe Mako
I have a list of dates and IDs, and I would like to roll them up into periods of consucitutive dates, within each ID.
For a table with the columns "testid" and "pulldate" in a table called "data":
| A79 | 2010-06-02 |
| A79 | 2010-06-03 |
| A79 | 2010-06-04 |
| B72 | 2010-04-22 |
| B72 | 2010-06-03 |
| B72 | 2010-06-04 |
| C94 | 2010-04-09 |
| C94 | 2010-04-10 |
| C94 | 2010-04-11 |
| C94 | 2010-04-12 |
| C94 | 2010-04-13 |
| C94 | 2010-04-14 |
| C94 | 2010-06-02 |
| C94 | 2010-06-03 |
| C94 | 2010-06-04 |
I want to generate a table with the columns "testid", "group", "start_date", "end_date":
| A79 | 1 | 2010-06-02 | 2010-06-04 |
| B72 | 2 | 2010-04-22 | 2010-04-22 |
| B72 | 3 | 2010-06-03 | 2010-06-04 |
| C94 | 4 | 2010-04-09 | 2010-04-14 |
| C94 | 5 | 2010-06-02 | 2010-06-04 |
This is the the code I came up with:
SELECT t2.testid,
t2.group,
MIN(t2.pulldate) AS start_date,
MAX(t2.pulldate) AS end_date
FROM(SELECT t1.pulldate,
t1.testid,
SUM(t1.check) OVER (ORDER BY t1.testid,t1.pulldate) AS group
FROM(SELECT data.pulldate,
data.testid,
CASE
WHEN data.testid=LAG(data.testid,1)
OVER (ORDER BY data.testid,data.pulldate)
AND data.pulldate=date (LAG(data.pulldate,1)
OVER (PARTITION BY data.testid
ORDER BY data.pulldate)) + integer '1'
THEN 0
ELSE 1
END AS check
FROM data
ORDER BY data.testid, data.pulldate) AS t1) AS t2
GROUP BY t2.testid,t2.group
ORDER BY t2.group;
I use the use the LAG windowing function to compare each row to the previous, putting a 1 if I need to increment to start a new group, I then do a running sum of that column, and then aggregate to the combinations of "group" and "testid".
Is there a better way to accomplish my goal, or does this operation have a name?
I am using PostgreSQL 8.4