Thread.Interrupt Is Evil
- by Alois Kraus
Recently I found an interesting issue with Thread.Interrupt during application shutdown. An application was crashing about once a week and we had no real clue what the cause was. Since it did not happen very often, it was left as is until we got some memory dumps of the crash. A memory dump usually means WinDbg, which I really like to use (I know I am one of its very few fans). After a quick analysis I found that the main thread had already exited and that the crashing thread was stuck in a Monitor.Wait. Strange indeed.
Running the application a few thousand times under the debugger would probably not have shown me the reason, so I decided to do what I call constructive debugging: I created a simple console application project and tried to simulate the exact circumstances of the crash from the information I had from the memory dump and from reading the source code. The crashing thread was actually running MS code from an old version of the Microsoft Caching Application Block. From reading the code I could conclude that the main thread called the Dispose method of the CacheManager class, which called Thread.Interrupt on the cache scavenger thread, which was just waiting for work to do. My first version of the repro looked like this:
static void Main(string[] args)
{
    Thread t = new Thread(ThreadFunc)
    {
        IsBackground = true,
        Name = "Test Thread"
    };
    t.Start();
    Console.WriteLine("Interrupt Thread");
    t.Interrupt();
}
static void ThreadFunc()
{
    while (true)
    {
        object value = Dequeue(); // block until work arrives or until woken via ThreadInterruptedException
    }
}
static object WaitObject = new object();
static object Dequeue()
{
    object lret = "got value";
    try
    {
        lock (WaitObject)
        {
            // empty on purpose: only Monitor.Enter/Exit are exercised to simulate work
        }
    }
    catch (ThreadInterruptedException)
    {
        Console.WriteLine("Got ThreadInterruptedException");
        lret = null;
    }
    return lret;
}
I start a background thread, call Thread.Interrupt on it and then let the application terminate right away. In the meantime the thread does plenty of Monitor.Enter/Exit calls to simulate work. This first version did not crash, so I needed to dig deeper. From the memory dump I knew that the finalizer thread was just running some critical finalizers which were closing file handles. OK, let's add a long-running finalizer to the sample.
using System;
using System.Runtime.ConstrainedExecution; // CriticalFinalizerObject
using System.Threading;
class FinalizableObject : CriticalFinalizerObject
{
    ~FinalizableObject()
    {
        Console.WriteLine("Hi we are waiting to finalize now and block the finalizer thread for 5s.");
        Thread.Sleep(5000);
    }
}
class Program
{
    static void Main(string[] args)
    {
        FinalizableObject fin = new FinalizableObject();
        Thread t = new Thread(ThreadFunc)
        {
            IsBackground = true,
            Name = "Test Thread"
        };
        t.Start();
        Console.WriteLine("Interrupt Thread");
        t.Interrupt();
        GC.KeepAlive(fin); // prevent finalizing it too early
        // After leaving main the other thread is woken up via Thread.Abort
        // while we are finalizing. This causes a stack overflow in the CLR ThreadAbortException handling at this time.
    }
    // ThreadFunc, WaitObject and Dequeue are the same as in the first version.
}
With this changed Main method and a blocking critical finalizer I got my crash, just like in the real application. The funny thing is that this is actually a CLR bug. When the Main method is left, the CLR suspends all threads except the finalizer thread and declares all objects as garbage. After the normal finalizers have been called, the critical finalizers are executed, usually to free OS handles. Remember that I called Thread.Interrupt as one of the last methods in Main. The Interrupt method is actually asynchronous: it wakes the thread up and throws a ThreadInterruptedException only once, unlike Thread.Abort, whose ThreadAbortException is rethrown whenever an exception handling clause is left.
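To see the "asynchronous, only once" behavior of Thread.Interrupt in isolation, here is a minimal sketch that is separate from the crash repro. The names InterruptOnceDemo, CountInterrupts and s_go are made up for this illustration. The worker spins without blocking while Interrupt is called, so the request should stay pending until the first Thread.Sleep, and the two following Sleep calls should complete normally.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not part of the original repro.
static class InterruptOnceDemo
{
    static bool s_go; // set once Interrupt() has been requested

    // Runs the experiment and returns how many interrupts the worker saw.
    public static int CountInterrupts()
    {
        int seen = 0;
        Volatile.Write(ref s_go, false);

        var worker = new Thread(() =>
        {
            // Spin without blocking, so the interrupt request stays pending.
            while (!Volatile.Read(ref s_go))
            {
                Thread.SpinWait(20);
            }

            for (int i = 0; i < 3; i++)
            {
                try
                {
                    Thread.Sleep(50); // blocking call: a pending interrupt is delivered here
                }
                catch (ThreadInterruptedException)
                {
                    seen++;
                }
            }
        });
        worker.IsBackground = true;
        worker.Start();

        worker.Interrupt();             // the worker is running, not blocked
        Volatile.Write(ref s_go, true); // now let it reach the first Sleep
        worker.Join();                  // after Join it is safe to read 'seen'
        return seen;
    }

    static void Main()
    {
        Console.WriteLine("Interrupts observed: " + CountInterrupts());
    }
}
```

If the documented semantics hold, the worker observes exactly one ThreadInterruptedException. A ThreadAbortException would instead be raised again at the end of each catch block unless Thread.ResetAbort is called.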
It seems that the CLR does not expect a frozen thread to wake up again while the critical finalizers are executing. While trying to raise a ThreadInterruptedException the CLR goes down with a stack overflow. Oops, not so nice. Why has nobody noticed this for years was my next question. As it turned out, this error happens only on the CLR for .NET 4.0 (x86 and x64). It does not show up in earlier or later versions of the CLR.
I have reported this issue on Connect, but so far it has not been confirmed as a CLR bug. I would be surprised, though, if my console application were to blame for a stack overflow in my test thread inside a Monitor.Wait call.
What is the moral of this story? Thread.Abort is evil, but Thread.Interrupt is too. It is so evil that even the CLR of .NET 4.0 contains a race condition during CLR shutdown. When the CLR gurus can get it wrong, chances are high that you will get it wrong too when you use these constructs. If you do not believe me, see what Patrick Smacchia blogs about Thread.Abort and List.Sort. Not only the CLR creators can get it wrong; the BCL writers sometimes have a hard time with correct exception handling as well. If you tell me that you use Thread.Abort frequently and have never had problems with it, I suspect that you have not looked deep enough into your application to find such sporadic errors.
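One way to avoid both calls is to let the worker check a stop flag that it waits on together with its work queue. The following is only a sketch of that idea under my own assumptions. The Scavenger class and its members are hypothetical and are not the Caching Application Block's actual code. Dispose sets the flag under the lock, pulses the waiting thread and joins it, so no exception is ever injected into the worker.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical worker that shuts down cooperatively instead of via Thread.Interrupt.
sealed class Scavenger : IDisposable
{
    private readonly object _sync = new object();
    private readonly Queue<string> _work = new Queue<string>();
    private readonly Thread _thread;
    private bool _stopRequested; // guarded by _sync

    // Written only by the worker thread; read it after Dispose() has returned.
    public int Processed { get; private set; }

    public Scavenger()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "Scavenger" };
        _thread.Start();
    }

    public void Enqueue(string item)
    {
        lock (_sync)
        {
            _work.Enqueue(item);
            Monitor.Pulse(_sync); // wake the worker if it is waiting
        }
    }

    private void Run()
    {
        while (true)
        {
            string item;
            lock (_sync)
            {
                // Wait until there is work or a stop request.
                while (_work.Count == 0 && !_stopRequested)
                {
                    Monitor.Wait(_sync);
                }
                if (_work.Count == 0)
                {
                    return; // stop requested and queue drained
                }
                item = _work.Dequeue();
            }
            Process(item);
        }
    }

    // Stand-in for real work.
    private void Process(string item)
    {
        Processed++;
    }

    public void Dispose()
    {
        lock (_sync)
        {
            _stopRequested = true;
            Monitor.PulseAll(_sync); // no Thread.Interrupt, no Thread.Abort
        }
        _thread.Join(); // the worker leaves its loop on its own
    }
}

static class Program
{
    static void Main()
    {
        var scavenger = new Scavenger();
        scavenger.Enqueue("a");
        scavenger.Enqueue("b");
        scavenger.Dispose();
        Console.WriteLine("Processed " + scavenger.Processed + " items before exit");
    }
}
```

Because the worker has already finished when Dispose returns, nothing is left that could wake up in the middle of CLR shutdown. Whether draining the queue before exiting is the right policy depends on the application.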