How do I find the top N batters per year?
- by Drew Stephens
I'm playing around with the Lahman Baseball Database in a MySQL instance. I want to find the players who topped home runs (HR) for each year. The Batting table has the following (relevant parts) of its schema:
+-----------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+----------------------+------+-----+---------+-------+
| playerID | varchar(9) | NO | PRI | | |
| yearID | smallint(4) unsigned | NO | PRI | 0 | |
| HR | smallint(3) unsigned | YES | | NULL | |
+-----------+----------------------+------+-----+---------+-------+
For each year, every player has an entry (between hundreds and 12k per year, going back to 1871). Getting the top N hitters for a single year is easy:
SELECT playerID,yearID,HR
FROM Batting
WHERE yearID=2009
ORDER BY HR DESC LIMIT 3;
+-----------+--------+------+
| playerID | yearID | HR |
+-----------+--------+------+
| pujolal01 | 2009 | 47 |
| fieldpr01 | 2009 | 46 |
| howarry01 | 2009 | 45 |
+-----------+--------+------+
But I'm interested in finding the top 3 from every year. I've found solutions like this, describing how to select the top from a category and I've tried to apply it to my problem, only to end up with a query that never returns:
SELECT
b.yearID, b.playerID, b.HR
FROM
Batting AS b
LEFT JOIN
Batting b2
ON
(b.yearID=b2.yearID AND b.HR <= b2.HR)
GROUP BY b.yearID HAVING COUNT(*) <= 3;
Where have I gone wrong?