merging two tables, while applying aggregates on the duplicates (max,min and sum)
- by cloudraven
I have a table (let's call it log) with a few millions of records. Among the fields I have Id, Count, FirstHit, LastHit.
Id - The record id
Count - number of times this Id has been reported
FirstHit - earliest timestamp with which this Id was reported
LastHit - latest timestamp with which this Id was reported
This table only has one record for any given Id
Everyday I get into another table (let's call it feed) with around half a million records with these fields among many others:
Id
Timestamp - Entry date and time.
This table can have many records for the same id
What I want to do is to update log in the following way.
Count - log count value, plus the count() of records for that id found in feed
FirstHit - the earliest of the current value in log or the minimum value in feed for that id
LastHit - the latest of the current value in log or the maximum value in feed for that id.
It should be noticed that many of the ids in feed are already in log.
The simple thing that worked is to create a temporary table and insert into it the union of both as in
Select Id, Min(Timestamp) As FirstHit, MAX(Timestamp) as LastHit, Count(*) as Count FROM feed GROUP BY Id
UNION ALL
Select Id, FirstHit,LastHit,Count FROM log;
From that temporary table I do a select that aggregates Min(firsthit), max(lasthit) and sum(Count)
Select Id, Min(FirstHit),Max(LastHit),Sum(Count) FROM @temp GROUP BY Id;
and that gives me the end result. I could then delete everything from log and replace it with everything with temp, or craft an update for the common records and insert the new ones. However, I think both are highly inefficient.
Is there a more efficient way of doing this. Perhaps doing the update in place in the log table?