How to analyze a scenario where a bug didn't get caught and adjust development workflow to prevent similar errors
- by durron597
I had a bug that was really difficult to track down, because all the unit tests were green, but the production application didn't work properly.
Here's what happened:
I had a filter class that set my application to ignore data that was not in some specified time windows.
The unit test, which seemed thorough to me, turned green.
My integration tests also produced the expected results.
Production, however, did not work.
Because both the unit tests and the integration tests passed, this problem was very difficult to find.
It turned out that my test dates used my local time zone (America/Chicago), while the production data supplied dates in UTC, which I had not realized. The filter logic was not correct for UTC dates. (I was using Joda-Time DateTime objects.)
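To make the failure mode concrete, here is a minimal sketch of how a filter like this can pass with local-zone test data and fail with UTC data. It is not the actual filter: it uses `java.time` from the standard library rather than Joda-Time so that it runs without dependencies, and the class name, method names, and the 09:00 to 17:00 window are all hypothetical.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class TimeWindowFilterSketch {
    // Hypothetical window: 09:00 (inclusive) to 17:00 (exclusive), Chicago wall-clock time.
    static final ZoneId WINDOW_ZONE = ZoneId.of("America/Chicago");
    static final LocalTime START = LocalTime.of(9, 0);
    static final LocalTime END = LocalTime.of(17, 0);

    // Zone-dependent version: reads the wall-clock time in whatever zone the
    // timestamp happens to carry, so the result changes with the input's zone.
    static boolean naiveAccept(ZonedDateTime t) {
        LocalTime local = t.toLocalTime();
        return !local.isBefore(START) && local.isBefore(END);
    }

    // Zone-aware version: converts to the window's zone before comparing,
    // so the same instant always gives the same answer.
    static boolean zoneAwareAccept(ZonedDateTime t) {
        LocalTime local = t.withZoneSameInstant(WINDOW_ZONE).toLocalTime();
        return !local.isBefore(START) && local.isBefore(END);
    }

    public static void main(String[] args) {
        // 20:00 UTC on a January day is 14:00 in Chicago (CST, UTC-6).
        Instant instant = Instant.parse("2015-01-15T20:00:00Z");
        ZonedDateTime asChicago = instant.atZone(WINDOW_ZONE);
        ZonedDateTime asUtc = instant.atZone(ZoneId.of("UTC"));

        System.out.println(naiveAccept(asChicago));   // true  (what the tests saw)
        System.out.println(naiveAccept(asUtc));       // false (what production saw)
        System.out.println(zoneAwareAccept(asUtc));   // true
    }
}
```

Both `ZonedDateTime` values represent the same instant; only the attached zone differs, which is why test data built in the local zone never exercised the failing path.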
Where did my workflow break down?
Did I fail to produce a spec that specified that the logic needed to handle dates in any time zone?
Did I fail to thoroughly consider all cases at the unit test level?
Did I fail to ensure the integration test was sufficiently similar to production?
Other?
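As an illustration of what "considering all cases at the unit test level" might look like here, the sketch below feeds the same instant to a filter in several time zones and reports any zone where the answer differs. It is an assumption-laden example, not my real test suite: it uses `java.time` instead of Joda-Time and plain Java instead of a test framework, and `ZoneInvarianceCheck`, `accept`, and `disagreeingZones` are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ZoneInvarianceCheck {
    // Hypothetical window: 09:00 (inclusive) to 17:00 (exclusive), Chicago wall-clock time.
    static final ZoneId WINDOW_ZONE = ZoneId.of("America/Chicago");

    // Filter under test: normalizes to the window's zone before comparing.
    static boolean accept(ZonedDateTime t) {
        LocalTime local = t.withZoneSameInstant(WINDOW_ZONE).toLocalTime();
        return !local.isBefore(LocalTime.of(9, 0)) && local.isBefore(LocalTime.of(17, 0));
    }

    // Returns the zone IDs for which the filter's answer differs from the
    // answer it gives when the same instant is presented in UTC.
    static List<String> disagreeingZones(Instant instant, List<String> zoneIds) {
        boolean expected = accept(instant.atZone(ZoneId.of("UTC")));
        List<String> disagreeing = new ArrayList<>();
        for (String id : zoneIds) {
            if (accept(instant.atZone(ZoneId.of(id))) != expected) {
                disagreeing.add(id);
            }
        }
        return disagreeing;
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("UTC", "America/Chicago", "Europe/London", "Asia/Tokyo");
        Instant instant = Instant.parse("2015-01-15T20:00:00Z"); // 14:00 in Chicago
        // An empty list means the filter is insensitive to the input's zone.
        System.out.println(disagreeingZones(instant, zones)); // []
    }
}
```

A check like this would have flagged a filter that reads the wall-clock time directly from its input, because at least one zone in the list would have disagreed with UTC.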
What changes can I make to my workflow to better prevent this sort of mistake in the future?
How can I more effectively debug a problem when there is an issue in production but not in testing?