- by Az
Background: I'm running multiple simuations on a set of data. For each session, I'm allocating projects to students. The difference between each session is that I'm randomising the order of the students such that all the students get a shot at being assigned a project they want. I was writing out some of the allocations in a spreadsheet (i.e. Excel) and it basically looked like this (tiny snapshot, actual table extends to a few thousand sessions, roughly 100 students).
| | Session 1 | Session 2 | Session 3 |
|----------|-----------|-----------|-----------|
|Stu1 |Proj_AA |Proj_AB |Proj_AB |
|----------|-----------|-----------|-----------|
|Stu2 |Proj_AB |Proj_AA |Proj_AC |
|----------|-----------|-----------|-----------|
|Stu3 |Proj_AC |Proj_AC |Proj_AA |
|----------|-----------|-----------|-----------|
Now, the code that deals with the allocation currently stores a session in an object. The next time the allocation is run, the object is over-written.
Thus what I'd really like to do is to store all the allocation results. This is important since I later need to derive from the data, information such as: which project Stu1 got assigned to the most or perhaps how popular Proj_AC was (how many times it was assigned / number of sessions).
Question(s):
What methods can I possibly use to basically store such session information persistently? Basically, each session output needs to add itself to the repository after ending and before beginning the next allocation cycle.
One solution that was suggested by a friend was mapping these results to a relational database using SQLAlchemy. I kind of like the idea since this does give me an opportunity to delve into databases.
Now the database structure I was recommended was:
|----------|-----------|-----------|
|Session |Student |Project |
|----------|-----------|-----------|
|1 |Stu1 |Proj_AA |
|----------|-----------|-----------|
|1 |Stu2 |Proj_AB |
|----------|-----------|-----------|
|1 |Stu3 |Proj_AC |
|----------|-----------|-----------|
|2 |Stu1 |Proj_AB |
|----------|-----------|-----------|
|2 |Stu2 |Proj_AA |
|----------|-----------|-----------|
|2 |Stu3 |Proj_AC |
|----------|-----------|-----------|
|3 |Stu1 |Proj_AB |
|----------|-----------|-----------|
|3 |Stu2 |Proj_AC |
|----------|-----------|-----------|
|3 |Stu3 |Proj_AA |
|----------|-----------|-----------|
Here, it was suggested that I make the Session and Student columns a composite key. That way I can access a specific record for a particular student for a particular session. Or I can merely get the entire allocation run for a particular session.
Questions:
Is the idea a good one?
How does one implement and query a composite key using SQLAlchemy?
What happens to the database if a particular student is not assigned a project (happens if all projects that he wants are taken)? In the code, if a student is not assigned a project, instead of a proj_id he simply gets None for that field/object.
I apologise for asking multiple questions but since these are closely-related, I thought I'd ask them in the same space.