Is it reasonable to insist on reproducing every defect before diagnosing and fixing it?
- by amphibient
I work for a software product company. We have large enterprise customers who deploy our product, and we support those deployments: when a defect turns up, we ship patches, and so on. In other words, it is a fairly typical setup.
Recently, a ticket was opened and assigned to me regarding an exception a customer found in a log file, related to concurrent database access in a clustered deployment of our product. The customer's specific configuration may therefore be essential to triggering the bug. All we received from the customer was their log file.
The approach I proposed to my team was to reproduce the bug in a configuration similar to the customer's and obtain a comparable log. They disagree: reproducing it would be overly time-consuming (it requires simulating a server cluster on VMs), so instead I should simply "follow the code" to find the thread- and/or transaction-unsafe spot and develop the fix on a plain local setup, which is not clustered like the environment the bug came from.
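For illustration, here is the kind of concurrency defect my colleagues expect me to find by reading alone. This is a hedged, hypothetical sketch: `QuotaService`, `ConnectionFactory`, the `slots` table, and the quota of 10 are all invented and are not from our actual product.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Invented for this sketch: hands out one JDBC connection per call.
@FunctionalInterface
interface ConnectionFactory {
    Connection open() throws SQLException;
}

public class QuotaService {

    private final ConnectionFactory connections;

    public QuotaService(ConnectionFactory connections) {
        this.connections = connections;
    }

    // Looks safe when read on one screen: 'synchronized' serializes callers.
    // But the monitor only covers one JVM -- two cluster nodes can both pass
    // the count check before either inserts, breaking the quota invariant.
    public synchronized void reserveSlot(long userId) throws SQLException {
        try (Connection c = connections.open()) {
            c.setAutoCommit(false);
            try (PreparedStatement check = c.prepareStatement(
                    "SELECT COUNT(*) FROM slots WHERE user_id = ?")) {
                check.setLong(1, userId);
                try (ResultSet rs = check.executeQuery()) {
                    rs.next();
                    if (rs.getInt(1) >= 10) {   // quota of 10, invented
                        c.rollback();
                        throw new IllegalStateException("quota exceeded");
                    }
                }
            }
            // The race window: another node can run the same check here.
            try (PreparedStatement insert = c.prepareStatement(
                    "INSERT INTO slots (user_id) VALUES (?)")) {
                insert.setLong(1, userId);
                insert.executeUpdate();
            }
            c.commit();
        }
    }
}
```

The trap is that this reads as correct on a single node: `synchronized` really does serialize callers, but only within one JVM, so the race only exists between cluster nodes. That is precisely the kind of detail a runtime reproduction surfaces and a desk read can miss.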
To me, working from an abstract blueprint (the program code) rather than a concrete, observable manifestation (a runtime reproduction) seems like a difficult way to work, at least for a person of normal cognitive abilities and attention span. So I wanted to ask a general question:
Is it reasonable to insist on reproducing every defect and debugging
it before diagnosing and fixing it?
Or:
If I am a senior developer, should I be able to read (multithreaded) code and
build a mental picture of what it does in every use-case scenario,
rather than needing to run the application, exercise different scenarios
hands-on, and step through the code line by line? Or am I a poor
developer for wanting that kind of working environment? Is debugging for sissies?
In my opinion, any fix submitted in response to an incident ticket should be tested in an environment simulated to be as close to the original as possible. How else can you know that it will really remedy the issue? It is like releasing a new vehicle model without crash-testing it with a dummy to demonstrate that the airbags actually deploy.
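To make that concrete, reproduction does not always demand a full VM cluster either. Below is a minimal sketch of a local repro harness for the hypothetical `QuotaService` above, where two threads with separate connections stand in for two cluster nodes. The embedded H2 database is an assumption for illustration, and races are probabilistic: the barrier widens the collision window but cannot guarantee a hit on every run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QuotaRaceTest {

    public static void main(String[] args) throws Exception {
        // Embedded H2 database standing in for the shared cluster database.
        String url = "jdbc:h2:mem:quota;DB_CLOSE_DELAY=-1";
        try (Connection setup = DriverManager.getConnection(url);
             Statement st = setup.createStatement()) {
            st.execute("CREATE TABLE slots (user_id BIGINT)");
            // Fill to one below the quota so both "nodes" race for the last slot.
            for (int i = 0; i < 9; i++) {
                st.execute("INSERT INTO slots (user_id) VALUES (1)");
            }
        }

        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int node = 0; node < 2; node++) {
            pool.submit(() -> {
                try {
                    // A separate QuotaService per thread, so the 'synchronized'
                    // monitor does not serialize them -- like separate JVMs.
                    QuotaService service =
                            new QuotaService(() -> DriverManager.getConnection(url));
                    barrier.await();          // release both threads at once
                    service.reserveSlot(1L);
                } catch (Exception e) {
                    // With a correct fix, exactly one thread should land here
                    // with "quota exceeded"; with the buggy code, often neither does.
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        try (Connection check = DriverManager.getConnection(url);
             Statement st = check.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COUNT(*) FROM slots WHERE user_id = 1")) {
            rs.next();
            int count = rs.getInt(1);
            System.out.println(count <= 10 ? "quota held: " + count
                                           : "RACE reproduced, count = " + count);
        }
    }
}
```

With the buggy version this typically ends with 11 rows; once the check and insert are made atomic (say, via a unique constraint or `SELECT ... FOR UPDATE`), one thread fails cleanly and the invariant holds, which is exactly what I would want a regression test to demonstrate.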
Last but not least, if you agree with me:
How should I talk to my team to convince them that my approach is reasonable, conservative, and more robust?