SQL query mixing aggregated results and single values

Posted by Paul Flowerdew on Stack Overflow
Published on 2012-10-09T15:27:39Z Indexed on 2012/10/10 15:37 UTC

I have a table of transactions. Each transaction has a transaction ID, an accounting period (AP), and a posting value (PV), as well as other fields. Some of the IDs are duplicated, usually because the transaction was done in error. To give an example, part of the table might look like:

ID    PV    AP  
123   100   2  
123   -100  5  

In this case the transaction was added in AP2 then removed in AP5.

Another example would be:

ID    PV    AP  
456   100   2  
456   -100  5  
456   100   8

In the first example, the problem is that if I am analyzing what was spent in AP2, there is a transaction in there which actually shouldn't be taken into account because it was taken out again in AP5. In the second example, the second two transactions shouldn't be taken into account because they cancel each other out.

I want to label as erroneous as many as possible of the transactions that shouldn't be taken into account. To identify these transactions, I want to find the ones with duplicate IDs whose PVs sum to zero (like ID 123 above), or the ones where the PV of the earliest transaction is equal to sum(PV), as in the second example. This second condition is what is causing me grief.

So far I have

SELECT *
FROM table
WHERE table.ID IN (SELECT table.ID
                    FROM table
                    GROUP BY table.ID
                    HAVING COUNT(*) > 1
                    AND (SUM(table.PV) = 0
                    OR SUM(table.PV) = <PV of first transaction in each group>))
ORDER BY table.ID;

The bit in chevrons is what I'm trying to do and I'm stuck. Can I do it like this or is there some other method I can use in SQL to do this?
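
One way the chevron placeholder might be filled in is to join each ID's aggregates back to the row that holds its minimum AP. The sketch below runs the idea through Python's sqlite3 module, not SQL Compact 3.5, so whether the SQL works unchanged there is an assumption. The table name `t`, the aliases `g` and `f`, and the helper `candidate_rows` are made up for illustration, and it assumes each ID has only one row in its earliest AP.

```python
import sqlite3

# Sample rows from the two examples above: (ID, PV, AP).
ROWS = [
    (123, 100, 2), (123, -100, 5),
    (456, 100, 2), (456, -100, 5), (456, 100, 8),
]

# Hypothetical table name "t" (TABLE itself is a reserved word).
# g holds the per-ID aggregates; f is the row in the earliest AP,
# which supplies "PV of first transaction in each group".
QUERY = """
SELECT t.*
FROM t
WHERE t.id IN (
    SELECT g.id
    FROM (SELECT id, COUNT(*) AS n, SUM(pv) AS sum_pv, MIN(ap) AS first_ap
          FROM t
          GROUP BY id) AS g
    JOIN t AS f ON f.id = g.id AND f.ap = g.first_ap
    WHERE g.n > 1
      AND (g.sum_pv = 0 OR g.sum_pv = f.pv)
)
ORDER BY t.id, t.ap
"""

def candidate_rows(rows):
    """Load rows into an in-memory table and return every row of a matching ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, pv INTEGER, ap INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(candidate_rows(ROWS))
```

On the sample data this returns all five rows, since ID 123 sums to zero and the earliest PV of ID 456 equals its sum.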

Edit 1: Btw I forgot to say that I'm using SQL Compact 3.5, in case it matters.

Edit 2: I think the code snippet above is a bit misleading. I still want to mark out transactions with duplicate IDs where sum(PV) = 0, as in the first example. But where the PV of the earliest transaction = sum(PV), as in the second example, what I actually want is to keep the earliest transaction and mark out all the others with the same ID. Sorry if that caused confusion.

Edit 3: I've been playing with Clodoaldo's solution and have made some progress, but still can't get quite what I want. I'm trying to get the transactions I know for certain to be erroneous. Suppose the following transactions are also in the table:

ID     PV    AP  
789    100   2  
789    200   5  
789   -100   8

In this example sum(PV) <> 0 and the earliest PV <> sum(PV) so I don't want to mark any of these out.

If I modify Clodoaldo's query as follows:

    select t.*
    from t
    left join (
        select id, min(ap) as ap, sum(pv) as sum_pv
        from t
        group by id
        having sum(pv) <> 0
    ) s on t.id = s.id and t.ap = s.ap and t.pv = s.sum_pv
    where s.id is null

I get this result:

ID     PV    AP
123    100   2
123   -100   5
456   -100   5
456    100   8
789    100   2
789    200   5
789   -100   8

Whilst the first 4 transactions are ok (they would be marked out), the 789 transactions are also there, and I don't want them. But I can't figure out how to modify the query so that they're not included. Any ideas?
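
A possible way to leave the 789 rows out is to select the rows to mark directly, instead of anti-joining away the rows to keep: take every row of an ID whose PVs sum to zero, plus the non-earliest rows of an ID whose earliest PV equals sum(PV). This is only a sketch, run here through Python's sqlite3 module and not tested on SQL Compact 3.5. The aliases `g` and `f` and the helper `erroneous_rows` are hypothetical, and it assumes each ID has a single row in its earliest AP.

```python
import sqlite3

# All three example groups from the question: (ID, PV, AP).
ROWS = [
    (123, 100, 2), (123, -100, 5),
    (456, 100, 2), (456, -100, 5), (456, 100, 8),
    (789, 100, 2), (789, 200, 5), (789, -100, 8),
]

# g: per-ID count, sum and earliest AP; f: the row in that earliest AP.
QUERY = """
SELECT t.*
FROM t
JOIN (SELECT id, COUNT(*) AS n, SUM(pv) AS sum_pv, MIN(ap) AS first_ap
      FROM t
      GROUP BY id) AS g ON g.id = t.id
JOIN t AS f ON f.id = g.id AND f.ap = g.first_ap
WHERE g.n > 1
  AND (g.sum_pv = 0                                     -- whole group cancels out
       OR (f.pv = g.sum_pv AND t.ap <> g.first_ap))     -- keep only the earliest row
ORDER BY t.id, t.ap
"""

def erroneous_rows(rows):
    """Return the rows that would be marked as erroneous under the two rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, pv INTEGER, ap INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in erroneous_rows(ROWS):
    print(row)
```

With the sample data this prints both 123 rows and the 456 rows in AP5 and AP8, and none of the 789 rows, because for 789 sum(PV) is 200 while the earliest PV is 100.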
