More efficient SQL than using "A UNION (B in A)"?
- by machinatus
Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.
Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null, unless the order has been modified, in which case there is another record with identical order_num and tstamp then contains the timestamp of when the change occurred.
Example...
id order_num tstamp item_id
__ _________ ___________________ _______
0 1 100
1 2 101
2 2 2010-01-05 12:34:56 102
3 3 113
4 4 124
5 5 135
6 5 2010-01-07 01:23:45 136
7 5 2010-01-07 02:46:00 137
8 6 100
9 6 2010-01-13 08:33:55 105
What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with.
The result set should look like this:
id order_num tstamp item_id
__ _________ ___________________ _______
1 2 101
2 2 2010-01-05 12:34:56 102
5 5 135
6 5 2010-01-07 01:23:45 136
7 5 2010-01-07 02:46:00 137
The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
(SELECT orders.order_id, orders.tstamp, orders.item_id
FROM orders
WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
(SELECT orders.order_id, orders.tstamp, orders.item_id
FROM orders
WHERE orders.tstamp IS NULL)
AS current_orders
WHERE current_orders.order_id IN
(SELECT orders.order_id
FROM orders
WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');