What is the procedure for debugging a production-only error?

Posted by Lord Torgamus on Stack Overflow
Published on 2010-06-10T14:39:12Z

Let me say upfront that I'm so ignorant on this topic that I don't even know whether this question has objective answers or not. If it ends up being "not," I'll delete or vote to close the post.

Here's the scenario: I just wrote a little web service. It works on my machine. It works on my team lead's machine. It works, as far as I can tell, on every machine except for the production server. The exception that the production server spits out upon failure originates from a third-party JAR file and is skimpy on information. I searched the web for hours but didn't come up with anything useful.

So what's the procedure for tracking down an issue that occurs only on production machines? Is there a standard methodology, or perhaps category/family of tools, for this?

The error that inspired this question has already been fixed, but that was due more to good fortune than a solid approach to debugging. I'm asking this question for future reference.

Some related questions:

Test accounts and products in a production system
Running test on Production Code/Server

