Select a number of rows for each group where two column values make one group
- by Fábio Antunes
I have two SELECT statements joined by UNION ALL. In the first statement, a WHERE clause gathers only rows that have previously been shown to the user. The second statement gathers all rows that haven't been shown to the user, so I end up with the viewed results first and the non-viewed results after.
Of course, that ordering alone could be achieved with a single SELECT statement and a simple ORDER BY; the reason for two separate selects becomes clear once you see what I hope to accomplish.
Consider the following structure and data.
+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
|  1 |    1 |  10 | true   | .... |
|  2 |   10 |   1 | true   | .... |
|  3 |    1 |  10 | true   | .... |
|  4 |    6 |   8 | true   | .... |
|  5 |    1 |  10 | true   | .... |
|  6 |   10 |   1 | true   | .... |
|  7 |    8 |   6 | true   | .... |
|  8 |   10 |   1 | true   | .... |
|  9 |    6 |   8 | true   | .... |
| 10 |    2 |   3 | true   | .... |
| 11 |    1 |  10 | true   | .... |
| 12 |    8 |   6 | true   | .... |
| 13 |   10 |   1 | false  | .... |
| 14 |    1 |  10 | false  | .... |
| 15 |    6 |   8 | false  | .... |
| 16 |   10 |   1 | false  | .... |
| 17 |    8 |   6 | false  | .... |
| 18 |    3 |   2 | false  | .... |
+----+------+-----+--------+------+
Basically, I want all non-viewed rows to be selected by the statement. That is accomplished by checking whether the viewed column is true or false, which is simple and straightforward; nothing to worry about here.
However, when it comes to the rows already viewed (the viewed column is TRUE), I only want 3 rows to be returned for each group.
The appropriate result in this instance should be the 3 most recent rows of each group:
+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
|  6 |   10 |   1 | true   | .... |
|  7 |    8 |   6 | true   | .... |
|  8 |   10 |   1 | true   | .... |
|  9 |    6 |   8 | true   | .... |
| 10 |    2 |   3 | true   | .... |
| 11 |    1 |  10 | true   | .... |
| 12 |    8 |   6 | true   | .... |
+----+------+-----+--------+------+
As you can see from the ideal result set, we have three groups. The desired query for the viewed results should therefore show a maximum of 3 rows for each grouping it finds. In this case the groupings were 10 with 1 and 8 with 6, both of which had three rows to be shown, while the other group, 2 with 3, only had one row to be shown.
Please note that from = x and to = y makes the same grouping as from = y and to = x. Taking the first grouping (10 with 1) as an example, a row with from = 10 and to = 1 belongs to the same group as a row with from = 1 and to = 10.
However, there are plenty of groups in the whole table, and I only want the 3 most recent rows of each to be returned by the select statement. That's my problem: I'm not sure how this can be accomplished in the most efficient way possible, considering the table will have hundreds if not thousands of records at some point.
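To make the grouping rule concrete, here is a minimal Python sketch of what I mean, not a solution to the SQL problem itself. It assumes that "most recent" means the highest id, and the names pair_key and latest_per_pair are made up for illustration. Each row is reduced to an order-independent key, so (10, 1) and (1, 10) land in the same group.

```python
from collections import defaultdict

# Viewed rows from the sample table above: (id, from, to).
VIEWED = [
    (1, 1, 10), (2, 10, 1), (3, 1, 10), (4, 6, 8),
    (5, 1, 10), (6, 10, 1), (7, 8, 6), (8, 10, 1),
    (9, 6, 8), (10, 2, 3), (11, 1, 10), (12, 8, 6),
]


def pair_key(sender, recipient):
    """Return the same key for (x, y) and (y, x)."""
    return (min(sender, recipient), max(sender, recipient))


def latest_per_pair(rows, limit=3):
    """Keep the `limit` highest ids of every unordered (from, to) pair."""
    groups = defaultdict(list)
    for row_id, sender, recipient in rows:
        groups[pair_key(sender, recipient)].append(row_id)

    kept = []
    for ids in groups.values():
        # Assumption: a higher id means a more recent row.
        kept.extend(sorted(ids, reverse=True)[:limit])
    return sorted(kept)


print(latest_per_pair(VIEWED))  # [6, 7, 8, 9, 10, 11, 12]
```

The printed ids match the ideal result set shown above.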
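One direction that might work is sketched below under several assumptions: the database supports window functions (for example MySQL 8.0+ or SQLite 3.25+), "most recent" means the highest id, and the table is called messages, which is a made-up name. It ranks the viewed rows inside each unordered pair with ROW_NUMBER() and keeps ranks 1 to 3, then appends the non-viewed rows with UNION ALL. The sketch runs against an in-memory SQLite database from Python only so that it can be tried out; in SQLite the two-argument MIN and MAX are scalar functions, and in MySQL the equivalents would be LEAST and GREATEST.

```python
import sqlite3

# Sample rows from the question: (id, from, to, viewed), viewed as 1/0.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]

# "from" and "to" are quoted because they are reserved words.
QUERY = """
SELECT id, "from", "to", viewed
FROM (
    SELECT id, "from", "to", viewed,
           ROW_NUMBER() OVER (
               -- (x, y) and (y, x) fall into the same partition
               PARTITION BY MIN("from", "to"), MAX("from", "to")
               ORDER BY id DESC
           ) AS rn
    FROM messages
    WHERE viewed = 1
) AS ranked
WHERE rn <= 3
UNION ALL
SELECT id, "from", "to", viewed
FROM messages
WHERE viewed = 0
ORDER BY viewed DESC, id
"""


def fetch_ids():
    """Load the sample data and return the ids selected by QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE messages ('
            'id INTEGER PRIMARY KEY, "from" INTEGER, "to" INTEGER, '
            'viewed INTEGER)'
        )
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(fetch_ids())
```

With the sample data this should return the viewed ids 6 to 12 followed by the non-viewed ids 13 to 18. Whether this is the most efficient option on a large table is something I have not measured.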
Thanks for your help.
Note: The columns id, from, to and viewed are indexed, which should help with performance.
PS: I'm unsure how to name this question exactly. If you have a better idea, be my guest and edit the title.