How do I debug a difficult-to-reproduce crash with no useful call stack?

Posted by David M on Stack Overflow
Published on 2011-01-17T23:48:37Z Indexed on 2011/01/17 23:53 UTC

I am encountering an odd crash in our software and am having a lot of trouble debugging it, so I am seeking SO's advice on how to tackle it.

The crash is an access violation reading a NULL pointer:

First chance exception at $00CF0041. Exception class $C0000005 with message 'access violation at 0x00cf0041: read of address 0x00000000'.

It happens only 'sometimes' (I haven't yet found any rhyme or reason for when) and only in the main thread. When it occurs, the call stack contains a single entry, and that entry is incorrect:

[Screenshot: call stack with one line, Classes::TList::Get, at address 0x00cf0041]

For the main thread, which this is, it should show a large stack full of other items.

At this point, all other threads are inactive (mostly sitting in WaitForSingleObject or a similar function). The call stack always has the same single entry, in the same method at the same address. This method may or may not be related - we do use the VCL in our application. My bet, though, is that something (possibly quite a while earlier) is corrupting the stack, and the address where it's crashing is effectively random. Note, though, that it has been the same address across several builds, so it's probably not truly random.

Here is what I've tried:

  • Trying to reproduce it reliably at a certain point. I have found nothing that reproduces it every time, and a couple of things that sometimes trigger it and sometimes don't, for no apparent reason. These actions are not specific enough to narrow it down to a particular section of code. It may be timing related, but at the point the IDE breaks in, other threads are usually doing nothing. I can't rule out a threading problem, but think it's unlikely.
  • Building with extra debugging statements (extra debug info, extra asserts, etc.). After doing so, the crash never occurs.
  • Building with CodeGuard enabled. After doing so, the crash never occurs and CodeGuard shows no errors.
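
A lighter-weight alternative to the extra debugging statements above might be an in-memory breadcrumb trail: a small fixed-size ring buffer that records the last few checkpoints and is read in the debugger once the exception fires. The following is only a minimal sketch under stated assumptions. `TraceEntry` and `TraceRing` are hypothetical names, not part of the VCL or RAD Studio. The sketch assumes single-threaded use (the crash has only been seen in the main thread) and sticks to plain C++03 so that an older compiler has a chance of accepting it. Even a change this small can shift code and data layout, so it may hide the crash just as the other instrumented builds did.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One breadcrumb: a tag plus a caller-chosen value (hypothetical type).
struct TraceEntry {
    const char* tag;    // should point at a string literal, which is never freed
    std::size_t value;  // e.g. a list index or a counter worth remembering
};

// Fixed-size ring holding the most recent N breadcrumbs (hypothetical helper).
// Assumption: record() is only ever called from one thread.
template <std::size_t N>
class TraceRing {
public:
    TraceRing() : count_(0) {
        for (std::size_t i = 0; i < N; ++i) {
            entries_[i].tag = 0;
            entries_[i].value = 0;
        }
    }

    // Store one entry; once the ring is full the oldest entry is overwritten.
    void record(const char* tag, std::size_t value) {
        assert(tag != 0);
        TraceEntry& slot = entries_[count_ % N];
        slot.tag = tag;
        slot.value = value;
        ++count_;
    }

    // Number of record() calls made so far.
    std::size_t total() const { return count_; }

    // Copy out the surviving entries, oldest first.
    std::vector<TraceEntry> snapshot() const {
        const std::size_t begin = count_ > N ? count_ - N : 0;
        std::vector<TraceEntry> out;
        for (std::size_t i = begin; i < count_; ++i) {
            out.push_back(entries_[i % N]);
        }
        return out;
    }

private:
    TraceEntry entries_[N];
    std::size_t count_;
};
```

A call such as `ring.record("OnIdle", index)` (the tag and the variable are made up for illustration) costs one array store and an increment. After the access violation, `snapshot()` or a look at `entries_` in the debugger shows which checkpoints ran last.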

My questions:

1. How do I find what code caused the crash? How do I do the equivalent of walking back up the stack?

2. What general advice do you have for how to trace the cause of this crash?

I am using Embarcadero RAD Studio 2010 (the project mostly contains C++Builder code and small amounts of Delphi).
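
For question 1, a heuristic often used when the frame chain is unusable is to scan the raw stack memory for words that fall inside the executable's code range, since any of them may be a surviving return address. The sketch below shows only that filtering step, in portable C++. It assumes the stack words have already been copied out (for example from the debugger's memory view, starting at the stack pointer) and that the code range is known (for example from the linker's map file). `Candidate` and `findReturnAddressCandidates` are hypothetical names. Addresses are held in `std::size_t`, which assumes it is as wide as a pointer, as it is on 32-bit Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A stack word that looks like a code address (hypothetical type).
struct Candidate {
    std::size_t offset;   // position in words from the start of the captured stack
    std::size_t address;  // the value found there
};

// Return every captured stack word that lies inside [codeBegin, codeEnd).
// Assumption: stackWords[0] is the word at the stack pointer and later
// elements are at increasing addresses, i.e. older frames.
std::vector<Candidate> findReturnAddressCandidates(
        const std::vector<std::size_t>& stackWords,
        std::size_t codeBegin,
        std::size_t codeEnd) {
    assert(codeBegin <= codeEnd);
    std::vector<Candidate> hits;
    for (std::size_t i = 0; i < stackWords.size(); ++i) {
        const std::size_t word = stackWords[i];
        if (word >= codeBegin && word < codeEnd) {
            Candidate c;
            c.offset = i;
            c.address = word;
            hits.push_back(c);
        }
    }
    return hits;
}
```

Every hit is only a candidate. Stale return addresses left behind by earlier calls and function pointers stored in locals look identical, so each one still has to be checked against the map file or the disassembly (a genuine return address normally sits directly after a call instruction).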

© Stack Overflow or respective owner
