A rather confusing sequence of events happened, according to my log-file, and I am about to put a lot of the blame on the Python logger, which is a bold claim. I thought I should get some second opinions about whether what I am saying could be true.
I am trying to explain why there is are several large gaps in my log file (around two minutes at a time) during stressful periods for my application when it is missing deadlines.
I am using Python's logging module on a remote server, and have set-up, with a configuration file, for all logs of severity of ERROR or higher to be emailed to me. Typically, only one error will be sent at a time, but during periods of sustained problems, I might get a dozen in a minute - annoying, but nothing that should stress SMTP.
I believe that, after a short spurt of such messages, the Python logging system (or perhaps the SMTP system it is sitting on) is encountering errors or congestion. The call to Python's log is then BLOCKING for two minutes, causing my thread to miss its deadlines. (I was smart enough to move the logging until after the critical path of the application - so I don't care if logging takes me a few seconds, but two minutes is far too long.)
This seems like a rather awkward architecture (for both a logging system that can freeze up, and for an SMTP system (Ubuntu, sendmail) that cannot handle dozens of emails in a minute**), so this surprises me, but it exactly fits the symptoms.
Has anyone had any experience with this? Can anyone describe how to stop it from blocking?
** EDIT: I actually counted. A little under 4000 short emails in two hours. So far more than I suggested, sorry. But enough to over-fill a Sendmail's buffers?