Why do dicts of defaultdict(int)'s use so much memory? (and other simple Python performance questions)

Posted by dukhat on Stack Overflow, 2010-04-30.
import numpy as num
from collections import defaultdict

topKeys = range(16384)
keys = range(8192)

table = dict((k,defaultdict(int)) for k in topKeys)

dat = num.zeros((16384,8192), dtype="int32")

print "looping begins"
#how much memory should this use? I think it shouldn't use more than a few
#times the memory required to hold (16384*8192) int32's (512 MB), but
#it uses 11 GB!
for k in topKeys:
    for j in keys:
        dat[k,j] = table[k][j]

print "done"

What is going on here? Furthermore, the following similar script takes far longer to run than the first one, and also uses an absurd amount of memory.

topKeys = range(16384)
keys = range(8192)
table = [(j,0) for k in topKeys for j in keys]

I guess Python ints might be 64-bit objects, which would account for some of this, but do these relatively natural and simple constructions really produce such massive overhead? These scripts seem to show that they do. So my questions are: what exactly causes the high memory usage in the first script, and the long runtime and high memory usage in the second? And is there any way to avoid these costs?
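
For the second script, a quick per-object check suggests the scale of the problem (again assuming a 64-bit CPython 2.x build; the exact byte counts are my estimates):

import sys

t = (0, 0)
print sys.getsizeof(t)   # one small tuple: 72 bytes on a 64-bit build

# the comprehension creates 16384 * 8192 = 134,217,728 tuples, plus an
# 8-byte pointer to each in the list, so the list alone needs on the
# order of 134e6 * (72 + 8) bytes, i.e. roughly 10 GB, before counting
# the int objects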
