Python json memory bloat
- by Anoop
import json
import time
from itertools import count

def keygen(size):
    for i in count(1):
        s = str(i)
        yield '0' * (size - len(s)) + str(s)

def jsontest(num):
    keys = keygen(20)
    kvjson = json.dumps(dict((keys.next(), '0' * 200) for i in range(num)))
    kvpairs = json.loads(kvjson)
    del kvpairs # Not required. Just to check if it makes any difference
    print 'load completed'

jsontest(500000)

while 1:
    time.sleep(1)

Linux top shows that the Python process still holds ~450 MB of RAM after the 'jsontest' function completes. If the call to 'json.loads' is omitted, the issue is not observed. Calling gc.collect() after the function returns does release the memory.
So the memory does not appear to be held in any cache or in Python's internal memory allocator, since an explicit call to gc.collect() releases it.
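
A rough way to check the same thing from inside the process instead of watching top. This is only a sketch: current_rss_kb and measure are made-up helper names, reading /proc/self/status is a Linux-specific assumption (the helper returns None elsewhere), and the keys are built with '%020d' rather than the keygen generator to keep it short. It is written to run on both Python 2 and Python 3.

```python
import gc
import json


def current_rss_kb():
    """Return this process's resident set size in kB, or None if unavailable.

    Assumption: Linux, where /proc/self/status has a 'VmRSS:' line.
    """
    try:
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
    except IOError:
        pass
    return None


def measure(num):
    """Load a JSON object with `num` keys and sample RSS at three points."""
    # Same shape of data as jsontest: 20-character keys, 200-character values.
    data = json.dumps(dict(('%020d' % i, '0' * 200) for i in range(num)))

    before = current_rss_kb()
    loaded = json.loads(data)
    del loaded
    after_load = current_rss_kb()

    # gc.collect() returns the number of unreachable objects it found.
    unreachable = gc.collect()
    after_collect = current_rss_kb()

    return before, after_load, after_collect, unreachable


if __name__ == '__main__':
    print(measure(50000))
```

Comparing the second and third values shows whether the explicit collection actually returned memory to the OS, and the last value shows whether the collector found anything unreachable at all.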
Is this happening because the garbage collection threshold (700, 10, 10) was never reached?
I did add some code after jsontest to simulate reaching the threshold, but it didn't help.
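
To see where the collector stands relative to those thresholds, the gc module exposes both the thresholds and the current per-generation counters. A minimal sketch follows; gc_snapshot and collect_and_report are hypothetical helper names, and this only inspects the collector's state. It does not by itself explain the memory behaviour above.

```python
import gc


def gc_snapshot():
    """Return the collector's state: enabled flag, thresholds, current counts."""
    return {
        'enabled': gc.isenabled(),
        'threshold': gc.get_threshold(),  # one threshold per generation
        'count': gc.get_count(),          # current counter per generation
    }


def collect_and_report():
    """Take a snapshot, force a full collection, then take another snapshot."""
    before = gc_snapshot()
    unreachable = gc.collect()  # collects all generations
    after = gc_snapshot()
    return before, unreachable, after


if __name__ == '__main__':
    before, unreachable, after = collect_and_report()
    print(before)
    print(unreachable)
    print(after)
```

Calling gc_snapshot() right after jsontest returns would show how far each generation's counter is from its threshold at that point.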