Why do dicts of defaultdict(int)'s use so much memory? (and other simple python performance question
Posted
by dukhat
on Stack Overflow
See other posts from Stack Overflow
or by dukhat
Published on 2010-04-30T20:33:57Z
Indexed on
2010/04/30
20:37 UTC
Read the original article
Hit count: 105
import numpy as num
from collections import defaultdict
topKeys = range(16384)
keys = range(8192)
table = dict((k,defaultdict(int)) for k in topKeys)
dat = num.zeros((16384,8192), dtype="int32")
print "looping begins"
#how much memory should this use? I think it shouldn't use more that a few
#times the memory required to hold (16384*8192) int32's (512 mb), but
#it uses 11 GB!
for k in topKeys:
for j in keys:
dat[k,j] = table[k][j]
print "done"
What is going on here? Furthermore, this similar script takes eons to run compared to the first one, and also uses an absurd quantity of memory.
topKeys = range(16384)
keys = range(8192)
table = [(j,0) for k in topKeys for j in keys]
I guess python ints might be 64 bit ints, which would account for some of this, but do these relatively natural and simple constructions really produce such a massive overhead? I guess these scripts show that they do, so my question is: what exactly is causing the high memory usage in the first script and the long runtime and high memory usage of the second script and is there any way to avoid these costs?
© Stack Overflow or respective owner