Efficiently select top row for each category in the set
- by VladV
I need to select a top row for each category from a known set (somewhat similar to this question). The problem is, how to make this query efficient on the large number of rows.
For example, let's create a table that stores temperature recording in several places.
CREATE TABLE #t (
placeId int,
ts datetime,
temp int,
PRIMARY KEY (ts, placeId)
)
-- insert some sample data
SET NOCOUNT ON
DECLARE @n int, @ts datetime
SELECT @n = 1000, @ts = '2000-01-01'
WHILE (@n>0) BEGIN
INSERT INTO #t VALUES (@n % 10, @ts, @n % 37)
IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts)
SET @n = @n - 1
END
Now I need to get the latest recording for each of the places 1, 2, 3.
This way is efficient, but doesn't scale well (and looks dirty).
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
The following looks better but works much less efficiently (30% vs 70% according to the optimizer).
SELECT placeId, ts, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
The problem is, during the latter query execution plan a clustered index scan is performed on #t and 300 rows are retrieved, sorted, numbered, and then filtered, leaving only 3 rows. For the former query three times one row is fetched.
Is there a way to perform the query efficiently without lots of unions?