Semi-complex aggregate select statement confusion

Posted by Ian Henry on Stack Overflow See other posts from Stack Overflow or by Ian Henry
Published on 2010-03-08T18:10:59Z Indexed on 2010/03/08 18:36 UTC
Read the original article Hit count: 579

Filed under:
|
|

Alright, this problem is a little complicated, so bear with me.

I have a table full of data. One of the table columns is an EntryDate. There can be multiple entries per day. However, I want to select all rows that are the latest entry on their respective days, and I want to select all the columns of said table.

One of the columns is a unique identifier column, but it is not the primary key (I have no idea why it's there; this is a pretty old system). For purposes of demonstration, say the table looks like this:

create table ExampleTable (
    ID int identity(1,1) not null,
    PersonID int not null,
    StoreID int not null,
    Data1 int not null,
    Data2 int not null,
    EntryDate datetime not null
)

The primary key is on PersonID and StoreID, which logically defines uniqueness.

Now, like I said, I want to select all the rows that are the latest entries on that particular day (for each Person-Store combination). This is pretty easy:

--Figure 1
select PersonID, StoreID, max(EntryDate)
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

Where dbo.dayof() is a simple function that strips the time component from a datetime. However, doing this loses the rest of the columns! I can't simply include the other columns, because then I'd have to group by them, which would produce the wrong results (especially since ID is unique).

I have found a dirty hack that will do what I want, but there must be a better way -- here's my current solution:

select
    cast(null as int) as ID,
    PersonID,
    StoreID,
    cast(null as int) as Data1,
    cast(null as int) as Data2,
    max(EntryDate) as EntryDate
into #StagingTable
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

update Target set
    ID = Source.ID,
    Data1 = Source.Data1,
    Data2 = Source.Data2,
from #StagingTable as Target
inner join ExampleTable as Source
    on Source.PersonID = Target.PersonID
   and Source.StoreID = Target.StoreID
   and Source.EntryDate = Target.EntryDate

This gets me the correct data in #StagingTable but, well, look at it! Creating a table with null values, then doing an update to get the values back -- surely there's a better way to do this? A single statement that will get me all the values the first time?

It is my belief that the correct join on that original select (Figure 1) would do the trick, like a self-join or something... but how do you do that with the group by clause? I cannot find the right syntax to make the query execute.

I am pretty new with SQL, so it's likely that I'm missing something obvious. Any suggestions?

(Working in T-SQL, if it makes any difference)

© Stack Overflow or respective owner

Related posts about sql

Related posts about tsql