I've got a function which does a parse of a sentence by building up a big chart. For some reason, Python holds on to whatever memory was allocated during that function call. That is, I do
best = translate(sentence, grammar)
and somehow my memory goes up and stays up.
Here is the function:
from string import join
from heapq import nsmallest, heappush
def translate(f, g):
words = f.split()
chart = {}
for col in range(len(words)):
for row in reversed(range(0,col+1)):
# get rules for this subspan
rules = g[join(words[row:col+1], ' ')]
# ensure there's at least one rule on the diagonal
if not rules and row==col:
rules=[(0.0, join(words[row:col+1]))]
# pick up rules below & to the left
for k in range(row,col):
if (row,k) and (k+1,col) in chart:
for (w1, e1) in chart[row, k]:
for (w2, e2) in chart[k+1,col]:
heappush(rules, (w1+w2, e1+' '+e2))
# add all rules to chart
chart[row,col] = nsmallest(MAX_TRANSLATIONS, rules)
(w, best) = chart[0, len(words)-1][0]
return best
EDIT: Using Python 2.7 on OS X. The grammar g is just a dictionary from strings to heaps, e.g.:
g['et']
[(1.05, 'and'), (6.92, ', and'), (9.95, 'and ,'), (11.17, 'and to')]
EDIT: If you want to run the code, try the sentence "Cela est difficile" with the following grammar:
>>> g['cela']
[(8.28, 'this'), (11.21, 'it'), (11.57, 'that'), (15.26, 'this ,')]
>>> g['est']
[(2.69, 'is'), (10.21, 'is ,'), (11.15, 'has'), (11.28, ', is')]
>>> g['difficile']
[(2.01, 'difficult'), (10.08, 'hard'), (10.19, 'difficult ,'), (10.57, 'a difficult')]