Why is my Django bulk database population so slow and frequently failing?
- by bryn
I decided I'd like to use Django's model system rather than coding raw SQL to interface with my database, but I've run into a problem that is surely avoidable.
My models.py contains:
from django.db import models

class Student(models.Model):
    student_id = models.IntegerField(unique=True)
    form = models.CharField(max_length=10)
    preferred = models.CharField(max_length=70)
    surname = models.CharField(max_length=70)
and I'm populating it by looping through a list as follows:
from models import Student

for id, frm, pref, sname in large_list_of_data:
    s = Student(student_id=id, form=frm, preferred=pref, surname=sname)
    s.save()
I don't really want to save to the database on every iteration, but I don't know another way to keep Django from forgetting the objects (I'd rather add all the rows and then do a single commit).
There are two problems with the code as it stands:
1. It's slow: about 20 students get updated each second.
2. It doesn't even make it through large_list_of_data, instead throwing a DatabaseError saying "unable to open database file". (Possibly because I'm using sqlite3.)
My question is: how can I stop these two things from happening? I'm guessing that the root of both problems is the s.save() call inside the loop, but I don't see an easy way to batch the students up and then save them to the database in one commit.
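To make the "add all the rows, then do a single commit" idea concrete, here is a minimal sketch using Python's standard-library sqlite3 module directly rather than the Django ORM. The table layout is an assumption that only approximates what the Student model would create, the load_students helper is a hypothetical name, and the sample data is made up. If the guess above is right, the point is that every insert runs inside one transaction that is committed once. At the ORM level, Django's transaction.atomic() and Student.objects.bulk_create() (in Django versions that provide them) would presumably play the same role.

```python
import sqlite3


def load_students(conn, rows):
    """Insert every row inside one transaction and commit once at the end.

    Hypothetical helper for illustration; not part of Django.
    """
    # Using the connection as a context manager commits on success
    # and rolls back if any insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO student (student_id, form, preferred, surname) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Assumed schema, loosely mirroring the Student model above.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, "
    "student_id INTEGER UNIQUE, "
    "form VARCHAR(10), "
    "preferred VARCHAR(70), "
    "surname VARCHAR(70))"
)

# Made-up sample data standing in for large_list_of_data.
large_list_of_data = [
    (i, "10A", "Pref%d" % i, "Surname%d" % i) for i in range(1000)
]

load_students(conn, large_list_of_data)
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because nothing is committed until the with block exits, a failure partway through (for example a duplicate student_id) rolls the whole batch back instead of leaving a partial load.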