Need help optimizing this Django aggregate query
- by Chris Lawlor
I have the following model
class Plugin(models.Model):
name = models.CharField(max_length=50)
# more fields
which represents a plugin that can be downloaded from my site. To track downloads, I have
class Download(models.Model):
plugin = models.ForiegnKey(Plugin)
timestamp = models.DateTimeField(auto_now=True)
So to build a view showing plugins sorted by downloads, I have the following query:
# pbd is plugins by download - commented here to prevent scrolling
pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total')
Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so.
I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object.
Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point?
EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.