merging two tables, while applying aggregates on the duplicates (max,min and sum)
        Posted  
        
            by 
                cloudraven
            
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by cloudraven
        
        
        
        Published on 2014-06-05T02:10:48Z
        Indexed on 
            2014/06/05
            3:26 UTC
        
        
        Read the original article
        Hit count: 256
        
I have a table (let's call it log) with a few millions of records. Among the fields I have Id, Count, FirstHit, LastHit.
- Id - The record id
- Count - number of times this Id has been reported
- FirstHit - earliest timestamp with which this Id was reported
- LastHit - latest timestamp with which this Id was reported
This table only has one record for any given Id
Everyday I get into another table (let's call it feed) with around half a million records with these fields among many others:
- Id
- Timestamp - Entry date and time.
This table can have many records for the same id
What I want to do is to update log in the following way. Count - log count value, plus the count() of records for that id found in feed FirstHit - the earliest of the current value in log or the minimum value in feed for that id LastHit - the latest of the current value in log or the maximum value in feed for that id.
It should be noticed that many of the ids in feed are already in log.
The simple thing that worked is to create a temporary table and insert into it the union of both as in
Select Id, Min(Timestamp) As FirstHit, MAX(Timestamp) as LastHit, Count(*) as Count FROM feed GROUP BY Id
UNION ALL
Select Id, FirstHit,LastHit,Count FROM log;
From that temporary table I do a select that aggregates Min(firsthit), max(lasthit) and sum(Count)
Select Id, Min(FirstHit),Max(LastHit),Sum(Count) FROM @temp GROUP BY Id;
and that gives me the end result. I could then delete everything from log and replace it with everything with temp, or craft an update for the common records and insert the new ones. However, I think both are highly inefficient.
Is there a more efficient way of doing this. Perhaps doing the update in place in the log table?
© Stack Overflow or respective owner