deepcopy and python - tips to avoid using it?

Posted by blackkettle on Stack Overflow See other posts from Stack Overflow or by blackkettle
Published on 2010-06-15T07:49:14Z Indexed on 2010/06/15 7:52 UTC
Read the original article Hit count: 358

Filed under:
|

Hi,

I have a very simple python routine that involves cycling through a list of roughly 20,000 latitude,longitude coordinates and calculating the distance of each point to a reference point.

def compute_nearest_points( lat, lon, nPoints=5 ):
    """Find the nearest N points, given the input coordinates."""

    points = session.query(PointIndex).all()
    oldNearest = []
    newNearest = []
    for n in xrange(nPoints):
        oldNearest.append(PointDistance(None,None,None,99999.0,99999.0))
        newNearest.append(obj2)

    #This is almost certainly an inappropriate use of deepcopy
    #  but how SHOULD I be doing this?!?!
    for point in points:
        distance = compute_spherical_law_of_cosines( lat, lon, point.avg_lat, point.avg_lon )
        k = 0
        for p in oldNearest:
            if distance < p.distance:
                newNearest[k] = PointDistance(
                    point.point, point.kana, point.english, point.avg_lat, point.avg_lon, distance=distance
                    )
                break
            else:
                newNearest[k] = deepcopy(oldNearest[k])
            k += 1
        for j in range(k,nPoints-1):
            newNearest[j+1] = deepcopy(oldNearest[j])
        oldNearest = deepcopy(newNearest)

    #We're done, now print the result
    for point in oldNearest:
        print point.station, point.english, point.distance

    return

I initially wrote this in C, using the exact same approach, and it works fine there, and is basically instantaneous for nPoints<=100. So I decided to port it to python because I wanted to use SqlAlchemy to do some other stuff.

I first ported it without the deepcopy statements that now pepper the method, and this caused the results to be 'odd', or partially incorrect, because some of the points were just getting copied as references(I guess? I think?) -- but it was still pretty nearly as fast as the C version.

Now with the deepcopy calls added, the routine does it's job correctly, but it has incurred an extreme performance penalty, and now takes several seconds to do the same job.

This seems like a pretty common job, but I'm clearly not doing it the pythonic way. How should I be doing this so that I still get the correct results but don't have to include deepcopy everywhere?

© Stack Overflow or respective owner

Related posts about python

Related posts about deepcopy