heya,
I have an Excel CSV file with employee records in it. Something like this:
mail,first_name,surname,employee_id,manager_id,telephone_number
[email protected],john,smith,503422,503423,+65(2)3423-2433
[email protected],george,brown,503097,503098,+65(2)3423-9782
....
I'm using DictReader to put this into a nested dictionary:
import csv
gd_extract = csv.DictReader(open('filename 20100331 original.csv'), dialect='excel')
# Key each row (itself a dict of column name -> value) by its employee_id
employees = dict([(row['employee_id'], row) for row in gd_extract])
Is the above the proper way to do it? It does work, but is it the Right Way? Is there something more efficient? Also, the funny thing is, in IDLE, if I try to print out "employees" at the shell, it seems to cause IDLE to crash (there are approximately 1051 rows).
2. Remove employee_id from inner dict
The second issue: I'm putting it into a dictionary indexed by employee_id, with the value as a nested dictionary of all the values. However, employee_id is also a key:value pair inside the nested dictionary, which is a bit redundant. Is there any way to exclude it from the inner dictionary?
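To make this concrete, here is a rough sketch of the kind of thing I imagine, using dict.pop to pull employee_id out of each row before storing it. The sample rows and the example.com addresses are made up for illustration, and I read from a string rather than the real file so it runs on its own:

```python
import csv
import io

# Made-up sample rows in the same layout as my extract
# (the email addresses are placeholders, not real data).
sample = io.StringIO(
    "mail,first_name,surname,employee_id,manager_id,telephone_number\n"
    "john.smith@example.com,john,smith,503422,503423,+65(2)3423-2433\n"
    "george.brown@example.com,george,brown,503097,503098,+65(2)3423-9782\n"
)

employees = {}
for row in csv.DictReader(sample):
    row = dict(row)                  # plain dict copy of the row
    emp_id = row.pop('employee_id')  # removes the key and returns its value
    employees[emp_id] = row          # inner dict no longer repeats the ID
```

I'm not sure whether this is any better than leaving the redundant key in place, though.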
3. Manipulate data in comprehension
Thirdly, we need to do some manipulations to the imported data. For example, all the phone numbers are in the wrong format, so we need to do some regex there. Also, we need to convert manager_id to an actual manager's name and their email address. Most managers are in the same file, while others are in an external_contractors CSV, which is similar but not quite the same format. I can import that to a separate dict, though.
Are these two items things that can be done within a single list comprehension, or should I use a for loop? Or do multiple comprehensions work? (Sample code would be really awesome here.) Or is there a smarter way to do it in Python?
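For reference, here is a rough for-loop version of what I have in mind. Everything in it is an assumption for illustration: the two dicts stand in for the already-imported data (made-up values, placeholder addresses), the contractors layout is a guess, clean_phone is a hypothetical helper, and the "right" phone format is assumed to be a leading + followed by digits only:

```python
import re

# Hypothetical stand-ins for the two imported dicts, keyed by ID.
employees = {
    '503422': {'mail': 'john.smith@example.com', 'first_name': 'john',
               'surname': 'smith', 'manager_id': '503097',
               'telephone_number': '+65(2)3423-2433'},
    '503097': {'mail': 'george.brown@example.com', 'first_name': 'george',
               'surname': 'brown', 'manager_id': '900001',
               'telephone_number': '+65(2)3423-9782'},
}
contractors = {
    '900001': {'mail': 'pat.lee@example.com', 'first_name': 'pat',
               'surname': 'lee'},
}

def clean_phone(number):
    """Assumed target format: keep a leading '+', drop all other non-digits."""
    digits = re.sub(r'\D', '', number)
    return '+' + digits if number.startswith('+') else digits

for emp in employees.values():
    emp['telephone_number'] = clean_phone(emp['telephone_number'])
    # Look in the main file first, then fall back to the contractors dict.
    manager = employees.get(emp['manager_id']) or contractors.get(emp['manager_id'])
    if manager is not None:
        emp['manager_name'] = manager['first_name'] + ' ' + manager['surname']
        emp['manager_mail'] = manager['mail']
```

The manager lookup needs the whole employees dict to exist already, which is why I suspect it can't happen in the same comprehension that builds the dict.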
Cheers,
Victor