SQL query - choosing 'last updated' record in a group, better db design?

Posted by Jimmy on Stack Overflow See other posts from Stack Overflow or by Jimmy
Published on 2010-05-16T18:32:42Z Indexed on 2010/05/16 18:40 UTC
Read the original article Hit count: 236

Filed under:
|

Hi,

Let's say I have a MySQL database with 3 tables:

table 1: Persons, with 1 column ID (int) table 2: Newsletters, with 1 column ID (int) table 3: Subscriptions, with columns Person_ID (int), Newsletter_ID (int), Subscribed (bool), Updated (Datetime)

Subscriptions.Person_ID points to a Person, and Subscription.Newsletter_ID points to a Newsletter. Thus, each person may have 0 or more subscriptions to 0 or more magazines at once. The table Subscriptions will also store the entire history of each person's subscriptions to each newsletter. If a particular Person_ID-Newsletter_ID pair doesn't have a row in the Subscriptions table, then it's equivalent to that pair having a subscription status of 'false'.

Here is a sample dataset

Persons
ID
1
2
3

Newsletters
ID
4
5
6

Subscriptions
Person_ID  Newsletter_ID  Subscribed  Updated
2                4           true     2010-05-01
3                4           true     2010-05-01
3                5           true     2010-05-10
3                4           false    2010-05-15

Thus, as of 2010-05-16, Person 1 has no subscription, Person 2 has a subscription to Newsletter 4, and Person 3 has a subscription to Newsletter 5. Person 3 had a subscription to Newsletter 4 for a while, but not anymore.

I'm trying to do 2 kinds of query.

  1. A query that shows everyone's active subscriptions as of query time (we can assume that updated will never be in the future -- thus, this means returning the record with the latest 'updated' value for each Person_ID-Newsletter_ID pair, as long as Subscribed is true (if the latest record for a Person_ID-Newsletter_ID pair has a Subscribed status of false, then I don't want that record returned)).

  2. A query that returns all active subscriptions for a specific newsletter - same qualification as in 1. regarding records with 'false' in the Subscribed column.

I don't use SQL/databases often enough to tell if this design is good, or if the SQL queries needed would be slow on a database with, say, 1M records in the Subscriptions table.

I was using the Visual query builder tool in Visual Studio 2010 but I can't even get the query to return the latest updated record for each Person_ID-Newsletter_ID pair.

Is it possible to come up with SQL queries that don't involve using subqueries (presumably because they would become too slow with a larger data set)? If not, would it be a better design to have a separate Subscriptions_History table, and every time a subscription status for a Person_ID-Newsletter-ID pair is added to Subscriptions, any existing record for that pair is moved to Subscriptions_History (that way the Subscriptions table only ever contains the latest status update for any Person_ID-Newsletter_ID pair)?

I'm using .net on Windows, so would it be easier (or the same, or harder) to do this kind of queries using Linq? Entity Framework?

Thanks!

© Stack Overflow or respective owner

Related posts about sql

Related posts about mysql