In a Python program I'm writing I've compared using a for loop and increment variables versus list comprehension with map(itemgetter) and len() when counting entries in dictionaries which are in a list. It takes the same time using a each method. Am I doing something wrong or is there a better approach?
Here is a greatly simplified and shortened data structure:
list = [
{'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
{'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
{'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
{'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
{'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
]
Here is the for loop version:
#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True
key1 = key2 = 0
for dictionary in list:
if dictionary["key1"]:
key1 += 1
if dictionary["key2"]:
key2 += 1
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)
Output for the data above:
Counts: key1: 3, subset key2: 2
Here is the other, perhaps more Pythonic, version:
#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True
from operator import itemgetter
KEY1 = 0
KEY2 = 1
getentries = itemgetter("key1", "key2")
entries = map(getentries, list)
key1 = len([x for x in entries if x[KEY1]])
key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)
Output for the data above (same as before):
Counts: key1: 3, subset key2: 2
I'm a tiny bit surprised these take the same amount of time. I wonder if there's something faster. I'm sure I'm overlooking something simple.
One alternative I've considered is loading the data into a database and doing SQL queries, but the data doesn't need to persist and I'd have to profile the overhead of the data transfer, etc., and a database may not always be available.
I have no control over the original form of the data.
The code above is not going for style points.