Is a many-to-many relationship with extra fields the right tool for my job?
- by whichhand
I previously had a go at asking a more specific version of this question, but had trouble articulating what I was actually asking. On reflection, that made me doubt whether my chosen solution was right for the problem, so this time I will explain the problem and ask a) whether I am on the right track and b) whether there is a way around my current brick wall.
I am currently building a web interface to enable an existing database to be interrogated by (a small number of) users. Sticking with the analogy from the docs, I have models that look something like this:
from django.db import models

class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    dob = models.DateField()

class Album(models.Model):
    artist = models.ForeignKey(Musician)
    name = models.CharField(max_length=100)

class Instrument(models.Model):
    artist = models.ForeignKey(Musician)
    name = models.CharField(max_length=100)
Here I have one central table (Musician) and several tables of associated data related to it by either ForeignKey or OneToOneField. Users interact with the database by creating filtering criteria to select a subset of Musicians based on the data in the main or related tables. Likewise, users can then select which piece of data is used to rank the results presented to them. The results are initially viewed as a two-dimensional table with a single row per Musician and the selected data fields (or aggregates) in each column.
To give you some idea of scale, the database has ~5,000 Musicians with around 20 fields of related data.
Up to here everything is fine and I have a working implementation. However, it is important that a given user can upload their own annotation data sets (more than one) and then filter and order on these in the same way they can with the existing data.
The way I had tried to do this was to add the models:
from django.contrib.auth.models import User

class UserDataSets(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField(max_length=100)
    description = models.CharField(max_length=64)
    results = models.ManyToManyField(Musician, through='UserData')

class UserData(models.Model):
    artist = models.ForeignKey(Musician)
    dataset = models.ForeignKey(UserDataSets)
    score = models.IntegerField()

    class Meta:
        unique_together = (("artist", "dataset"),)
I have a simple upload mechanism enabling users to upload a data set file that consists of a one-to-one mapping between a Musician and their "score". Within a given user dataset each artist will be unique, but different datasets are independent of each other and will often contain entries for the same musician.
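To illustrate the uniqueness rule described above, here is a minimal sketch of how such a file could be validated before it is saved. The two-column "musician id, score" CSV layout and the `parse_dataset` helper are assumptions made for illustration; the actual upload format is not specified here.

```python
import csv
import io


def parse_dataset(text):
    """Parse 'musician_id,score' lines into a dict, rejecting duplicates.

    Hypothetical helper: the real file layout may differ.
    """
    scores = {}
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError("line %d: expected 2 columns" % row_number)
        musician_id, score = int(row[0]), int(row[1])
        # Mirrors unique_together = (("artist", "dataset"),) for one dataset.
        if musician_id in scores:
            raise ValueError(
                "line %d: musician %d appears twice" % (row_number, musician_id)
            )
        scores[musician_id] = score
    return scores


if __name__ == "__main__":
    print(parse_dataset("1,5\n2,1\n3,4\n"))
```

Each entry of the returned dict would then correspond to one UserData row for the dataset being uploaded.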
This worked fine for displaying the data. Starting from a given artist, I can do something like this:
artist = Musician.objects.get(pk=1)
dataset = UserDataSets.objects.get(pk=5)
print(artist.userdata_set.get(dataset=dataset.pk))
However, this approach fell over when I came to implement the filtering and ordering of a queryset of musicians based on the data contained in a single user data set. For example, I could easily order the queryset based on all of the data in the UserData table like this:
artists = Musician.objects.all().order_by('userdata__score')
But that does not help me order by the results of a single, given user dataset. Likewise, I need to be able to filter the queryset based on the "scores" from different user data sets (e.g. find all musicians with a score 5 in dataset1 and < 2 in dataset2).
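To make the requirement concrete, the desired results can be sketched in plain SQL using Python's sqlite3 module rather than the ORM. The table names, column names, and sample rows below are assumptions made up for illustration, and "a score 5" is read here as "equal to 5", which is also an assumption. The point is that each dataset needs its own join onto the UserData table, restricted to that dataset.

```python
import sqlite3

# In-memory stand-in for the Musician and UserData tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE musician (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE userdata (
        artist_id INTEGER REFERENCES musician(id),
        dataset_id INTEGER,
        score INTEGER,
        UNIQUE (artist_id, dataset_id)
    );
    INSERT INTO musician VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- made-up scores for dataset 1
    INSERT INTO userdata VALUES (1, 1, 5), (2, 1, 3), (3, 1, 5);
    -- made-up scores for dataset 2
    INSERT INTO userdata VALUES (1, 2, 1), (2, 2, 0), (3, 2, 4);
""")


def order_by_dataset(dataset_id):
    """Musician ids ordered by their score in ONE dataset (highest first)."""
    rows = conn.execute(
        """
        SELECT m.id FROM musician m
        JOIN userdata d ON d.artist_id = m.id AND d.dataset_id = ?
        ORDER BY d.score DESC, m.id
        """,
        (dataset_id,),
    )
    return [r[0] for r in rows]


def score_5_then_below_2(first_id, second_id):
    """Musician ids scoring 5 in the first dataset and < 2 in the second."""
    rows = conn.execute(
        """
        SELECT m.id FROM musician m
        JOIN userdata a ON a.artist_id = m.id AND a.dataset_id = ?
        JOIN userdata b ON b.artist_id = m.id AND b.dataset_id = ?
        WHERE a.score = 5 AND b.score < 2
        ORDER BY m.id
        """,
        (first_id, second_id),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(order_by_dataset(2))
    print(score_5_then_below_2(1, 2))
```

On the ORM side, Django's documentation on spanning multi-valued relationships explains that conditions given in a single filter() call apply to the same related row, while conditions in chained filter() calls may match different related rows. That distinction looks like the relevant one for the two-dataset case.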
Is there a way of doing this, or am I going about the whole thing wrong?