deepcopy and python - tips to avoid using it?
Posted
by blackkettle
on Stack Overflow
See other posts from Stack Overflow
or by blackkettle
Published on 2010-06-15T07:49:14Z
Indexed on
2010/06/15
7:52 UTC
Read the original article
Hit count: 358
Hi,
I have a very simple python routine that involves cycling through a list of roughly 20,000 latitude,longitude coordinates and calculating the distance of each point to a reference point.
def compute_nearest_points( lat, lon, nPoints=5 ):
"""Find the nearest N points, given the input coordinates."""
points = session.query(PointIndex).all()
oldNearest = []
newNearest = []
for n in xrange(nPoints):
oldNearest.append(PointDistance(None,None,None,99999.0,99999.0))
newNearest.append(obj2)
#This is almost certainly an inappropriate use of deepcopy
# but how SHOULD I be doing this?!?!
for point in points:
distance = compute_spherical_law_of_cosines( lat, lon, point.avg_lat, point.avg_lon )
k = 0
for p in oldNearest:
if distance < p.distance:
newNearest[k] = PointDistance(
point.point, point.kana, point.english, point.avg_lat, point.avg_lon, distance=distance
)
break
else:
newNearest[k] = deepcopy(oldNearest[k])
k += 1
for j in range(k,nPoints-1):
newNearest[j+1] = deepcopy(oldNearest[j])
oldNearest = deepcopy(newNearest)
#We're done, now print the result
for point in oldNearest:
print point.station, point.english, point.distance
return
I initially wrote this in C, using the exact same approach, and it works fine there, and is basically instantaneous for nPoints<=100. So I decided to port it to python because I wanted to use SqlAlchemy to do some other stuff.
I first ported it without the deepcopy statements that now pepper the method, and this caused the results to be 'odd', or partially incorrect, because some of the points were just getting copied as references(I guess? I think?) -- but it was still pretty nearly as fast as the C version.
Now with the deepcopy calls added, the routine does it's job correctly, but it has incurred an extreme performance penalty, and now takes several seconds to do the same job.
This seems like a pretty common job, but I'm clearly not doing it the pythonic way. How should I be doing this so that I still get the correct results but don't have to include deepcopy everywhere?
© Stack Overflow or respective owner