ndarray field names for both row and column?
- by Graham Mitchell
I'm a computer science teacher trying to create a little gradebook for myself using NumPy. But I think it would make my code easier to write if I could create an ndarray that uses field names for both the rows and columns. Here's what I've got so far:
import numpy as np
num_stud = 23
num_assign = 2
grades = np.zeros(num_stud, dtype=[('assign 1', 'i2'), ('assign 2', 'i2')])  # etc.
gv = grades.view(dtype='i2').reshape(num_stud, num_assign)  # plain 2-D view of the same data
So, if my first student gets a 97 on 'assign 1', I can write either of:
grades[0]['assign 1'] = 97
gv[0][0] = 97
Also, I can do the following:
np.mean( grades['assign 1'] ) # class average for assignment 1
np.sum( gv[0] ) # total points for student 1
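Put together as one runnable script, the working parts look roughly like this. It's only a sketch: the 88 is a made-up score, and the flat view only lines up with the fields because I'm assuming every field is 'i2'.

```python
import numpy as np

num_stud = 23
num_assign = 2

# One record per student, one named int16 field per assignment.
grades = np.zeros(num_stud, dtype=[('assign 1', 'i2'), ('assign 2', 'i2')])

# Plain 2-D view over the same memory; only valid because every field is 'i2'.
gv = grades.view(dtype='i2').reshape(num_stud, num_assign)

grades[0]['assign 1'] = 97   # write through the structured array
gv[0, 1] = 88                # write through the view (made-up score)

first_student_total = int(np.sum(gv[0]))               # total points for the first student
assign1_average = float(np.mean(grades['assign 1']))   # class average for assignment 1
```

Since both names refer to the same memory, a score written through gv shows up in grades and vice versa.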
This all works. What I can't figure out is how to use a student ID number to refer to a particular student (assume two of my students have the IDs shown):
grades['123456']['assign 2'] = 95
grades['314159']['assign 2'] = 83
...or maybe create a second view with different field names?
np.sum( gview2['314159'] ) # total points for the student with the given id
I know that I could create a dict mapping student ids to indices, but that seems fragile and crufty, and I'm hoping there's a better way than:
id2i = { '123456': 0, '314159': 1 }
np.sum( gv[ id2i['314159'] ] )
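To be fair to the dict idea, it doesn't have to be typed out by hand. Here is a minimal sketch, assuming the IDs live in a list in the same order as the rows. The `student_ids` list and the `row` helper are names I made up for illustration, and the 90 is a made-up score.

```python
import numpy as np

student_ids = ['123456', '314159']   # hypothetical roster, in row order
grades = np.zeros(len(student_ids), dtype=[('assign 1', 'i2'), ('assign 2', 'i2')])
gv = grades.view(dtype='i2').reshape(len(student_ids), 2)

# Build the id -> row index mapping from the roster instead of writing it out.
id2i = {sid: i for i, sid in enumerate(student_ids)}

def row(sid):
    """Return the structured record for the student with this id (hypothetical helper)."""
    return grades[id2i[sid]]

row('123456')['assign 2'] = 95
row('314159')['assign 2'] = 83
row('314159')['assign 1'] = 90       # made-up score

total_314159 = int(np.sum(gv[id2i['314159']]))   # total points for that student
```

That still leaves a separate lookup step that has to stay in sync with the array, which is the part I'd like to avoid.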
I'm also willing to re-architect things if there's a cleaner design. I'm new to NumPy, and I haven't written much code yet, so starting over isn't out of the question if I'm Doing It Wrong.
I'll need to sum all the assignment points for over a hundred students once a day, as well as compute standard deviations and other stats. I'll be waiting on the results, so I'd like it to run in a couple of seconds at most.
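For a sense of scale, the daily run would be something along these lines. The class size, assignment count, and random scores are placeholders, not my real data.

```python
import numpy as np

num_stud, num_assign = 150, 20   # placeholder sizes
rng = np.random.default_rng(0)

# Fake scores in the range 0..100, stored as int16 like the real gradebook.
scores = rng.integers(0, 101, size=(num_stud, num_assign)).astype('i2')

totals = scores.sum(axis=1)          # total points per student
assign_means = scores.mean(axis=0)   # class average per assignment
assign_stds = scores.std(axis=0)     # population standard deviation per assignment
```

Here axis=1 collapses the assignments to give one number per student, and axis=0 collapses the students to give one number per assignment.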
Thanks in advance for any suggestions.