Semi-complex aggregate select statement confusion
- by Ian Henry
Alright, this problem is a little complicated, so bear with me.
I have a table full of data. One of the table columns is an EntryDate. There can be multiple entries per day. However, I want to select all rows that are the latest entry on their respective days, and I want to select all the columns of said table.
One of the columns is a unique identifier column, but it is not the primary key (I have no idea why it's there; this is a pretty old system). For purposes of demonstration, say the table looks like this:
create table ExampleTable (
ID int identity(1,1) not null,
PersonID int not null,
StoreID int not null,
Data1 int not null,
Data2 int not null,
EntryDate datetime not null
)
The primary key is on PersonID and StoreID, which logically defines uniqueness.
Now, like I said, I want to select all the rows that are the latest entries on that particular day (for each Person-Store combination). This is pretty easy:
--Figure 1
select PersonID, StoreID, max(EntryDate)
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)
Where dbo.dayof() is a simple function that strips the time component from a datetime. However, doing this loses the rest of the columns! I can't simply include the other columns, because then I'd have to group by them, which would produce the wrong results (especially since ID is unique).
I have found a dirty hack that will do what I want, but there must be a better way -- here's my current solution:
select
cast(null as int) as ID,
PersonID,
StoreID,
cast(null as int) as Data1,
cast(null as int) as Data2,
max(EntryDate) as EntryDate
into #StagingTable
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)
update Target set
ID = Source.ID,
Data1 = Source.Data1,
Data2 = Source.Data2,
from #StagingTable as Target
inner join ExampleTable as Source
on Source.PersonID = Target.PersonID
and Source.StoreID = Target.StoreID
and Source.EntryDate = Target.EntryDate
This gets me the correct data in #StagingTable but, well, look at it! Creating a table with null values, then doing an update to get the values back -- surely there's a better way to do this? A single statement that will get me all the values the first time?
It is my belief that the correct join on that original select (Figure 1) would do the trick, like a self-join or something... but how do you do that with the group by clause? I cannot find the right syntax to make the query execute.
I am pretty new with SQL, so it's likely that I'm missing something obvious. Any suggestions?
(Working in T-SQL, if it makes any difference)