How to schedule hundreds of thousands of tasks?

Posted by wehriam on Stack Overflow See other posts from Stack Overflow or by wehriam
Published on 2010-03-16T21:28:27Z Indexed on 2010/03/16 21:31 UTC
Read the original article Hit count: 144

Filed under:

We have hundreds of thousands of tasks that need to be run at a variety of arbitrary intervals, some every hour, some every day, and so on. The tasks are resource intensive and need to be distributed across many machines.

Right now tasks are stored in a database with an "execute at this time" timestamp. To find tasks that need to be executed, we query the database for jobs that are due to be executed, then update the timestamps when the task is complete. Naturally this leads to a substantial write load on the database.

As far as I can tell, we are looking for something to release tasks into a queue at a set interval. (Workers could then request tasks from that queue.)

What is the best way to schedule recurring tasks at scale?

For what it's worth we're largely using Python, although we have no problems using components (RabbitMQ?) written in other languages.

© Stack Overflow or respective owner

Related posts about python