Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer
- by polygenelubricants
I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question is applicable to other databases.
I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:
Badges
Id
UserId
Name
Date
This works well for badges like [Epic] and [Legendary] which have unique names, but the silver and gold tag-specific badges seems to be mixed in together by having the same exact name.
Here's an example query I wrote for [mysql] tag:
SELECT
UserId as [User Link],
Date
FROM
Badges
Where
Name = 'mysql'
Order By
Date ASC
The (slightly annotated) output is: as seen on stexdex:
User Link Date
--------------- ------------------- // all for silver except where noted
Bill Karwin 2009-02-20 11:00:25
Quassnoi 2009-06-01 10:00:16
Greg 2009-10-22 10:00:25
Quassnoi 2009-10-31 10:00:24 // for gold
Bill Karwin 2009-11-23 11:00:30 // for gold
cletus 2010-01-01 11:00:23
OMG Ponies 2010-01-03 11:00:48
Pascal MARTIN 2010-02-17 11:00:29
Mark Byers 2010-04-07 10:00:35
Daniel Vassallo 2010-05-14 10:00:38
This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql] tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.
So this is the way I understand it:
The first time an Id appears (in chronological order) is for the silver badge
The second time is for the gold
Now, the above result mixes the silver and gold entries together. My questions are:
Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
In the current design, how would you query the silver and gold badges separately?
GROUP BY Id and picking the min/max or first/second by the Date somehow?
How can you write a query that lists all the silver badges first then all the gold badges next?
Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
How would you write it so that it doesn't have too many repetition between the silver and gold subqueries?
Is it perhaps more typical to do two totally separate queries instead?
What is this idiom called? A row "partitioning" query to put them into "buckets" or something?