System architecture: simple approach for setting up background tasks behind a web application -- wil
- by Tim Molendijk
I have a Django web application and I have some tasks that should operate (or actually: be initiated) on the background.
The application is deployed as follows:
apache2-mpm-worker;
mod_wsgi in daemon mode (1 process, 15 threads).
The background tasks have the following characteristics:
they need to operate in a regular interval (every 5 minutes or so);
they require the application context (i.e. the application packages need to be available in memory);
they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.
Now I was thinking that the most simple approach to this problem would be simply to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the task as part of the application and providing an HTTP interface for it, I would prevent the overhead of another process that is holding all of the application into memory. A simple cronjob can be setup that sends a request to this HTTP interface every 5 minutes and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only running every 5 minutes, I figure they would not be hindering the performance of the web application's user-facing operations.
Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cronjob that executes a script which performs the tasks. But that doesn't feel very attractive either, as it results in creating a new process that loads the entire application into memory, performs some tiny task, and destroys the process again. And this is repeated every 5 minutes. Does not sound like an elegant solution.
So, I'm looking for some feedback on my suggested approach as described in the paragraph before the preceeding paragraph. Is my reasoning correct? Am I overlooking (potential) problems? What about my assumption that application's performance will not be impeded?