mongodb: insert if not exists

Posted by LeMiz on Stack Overflow
Published on 2010-05-10T07:33:32Z Indexed on 2010/05/27 18:21 UTC

Hello,

Every day, I receive a batch of documents (an update). I want to insert each of them if it does not already exist.

  • I also want to keep track of the first time I inserted them, and the last time I saw them in an update.
  • I don't want to have duplicate documents.
  • I don't want to remove a document which has previously been saved, but is not in my update.
  • 95% (estimated) of the records are unmodified from day to day.

I am using the Python driver (pymongo).

What I currently do is (pseudo-code):

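for document in update:
    # look for a stored document with exactly these fields
    existing_document = my_collection.find_one(document)
    if not existing_document:
        # first time this document is seen
        document['insertion_date'] = now
    else:
        # keep the stored copy, including its insertion_date
        document = existing_document
    document['last_update_date'] = now
    my_collection.save(document)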

My problem is that it is very slow (40 minutes for fewer than 100,000 records, and I have millions of them in the update). I am pretty sure there is something built in for doing this, but the documentation for update() is, hmm, a bit terse (http://www.mongodb.org/display/DOCS/Updating).

Can someone advise on how to do this faster?
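
For reference, one way the find-then-save loop could be collapsed into a single upsert per document is sketched below. It is not from the original post and rests on assumptions: it uses update_one() (pymongo 3.0 or later) and the $setOnInsert operator (MongoDB 2.4 or later), both of which appeared after this question was posted. The helper names build_upsert and upsert_all are hypothetical, and the filter is simply the incoming document, mirroring find_one(document) above.

```python
from datetime import datetime, timezone


def build_upsert(document, now):
    """Return a (filter, update) pair for one incoming document.

    The filter is the document itself, as in find_one(document). When no
    match exists, the server seeds the new document from these equality
    fields and then applies the update operators.
    """
    update = {
        # Applied only when the upsert inserts a new document.
        "$setOnInsert": {"insertion_date": now},
        # Applied on every run, for new and already-stored documents alike.
        "$set": {"last_update_date": now},
    }
    return dict(document), update


def upsert_all(collection, documents, now=None):
    """Upsert each document in one round trip; return how many were new."""
    now = now or datetime.now(timezone.utc)
    inserted = 0
    for document in documents:
        flt, update = build_upsert(document, now)
        result = collection.update_one(flt, update, upsert=True)
        # upserted_id is None when an existing document was matched.
        if result.upserted_id is not None:
            inserted += 1
    return inserted
```

With a real collection this would be called as upsert_all(my_collection, update). Documents missing from the update are never touched, so nothing is removed. Matching on every field can only be fast if the filtered fields are indexed; whether that explains the 40 minutes is a guess, since the post does not say which indexes exist.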

© Stack Overflow or respective owner
