I've been using FogBugz's Evidence Based Scheduling (for the uninitiated, Joel explains) for a while now and there's an inherent problem I can't seem to work around.
The system is good at telling me the probability that a given project will be delivered at some date, given the detailed list of tasks that comprise the project. However, it does not take into account the fact that during development additional tasks always pop up.
Now, there's the garbage-can approach of creating a generic task/scheduled-item for "last minute hacks" or "integration tasks", or what have you, but that clearly goes against the idea of aggregating the estimates of many small cases.
It's often the case that during the development stage of a project you realize that there's a whole area your planning didn't cover, because, well, that's the nature of developing stuff that hasn't been developed before. So now your ~3 month project may very well turn into a 6 month project, but not because your estimations were off (you could be the best estimator in the world, for those task the comprised your initial work plan); rather because you ended up adding a whole bunch of new tasks that weren't there to begin with.
EBS doesn't help you with that. It could, theoretically (I guess). It could, perhaps, measure the amount of work you add to a project over time and take that into consideration when estimating the time remaining on a given project. Just a thought.
In other words, EBS works on a task basis, but not on a project/release basis - but the latter is what's important. It's what your boss typically cares about - delivery date, not the time it takes to finish each task along the way, and not the time it would have taken, if your planning was perfect.
So the question is (yes, there's a question here, don't close it):
What's your methodology when it comes to using EBS in FogBugz and how do you solve the problem above, which seems to be a main cause of schedule delays and mispredictions?
Edit
Some more thoughts after reading a few answers:
If it comes down to having to choose which delivery date you're comfortable presenting to your higher-ups by squinting at the delivery-probability graph and choosing 80%, or 95%, or 60% (based on what, exactly?) then we've resorted to plain old buffering/factoring of our estimates. In which case, couldn't we have skipped the meticulous case by case hour-sized estimation effort step? By forcing ourselves to break down tasks that take more than a day into smaller chunks of work haven't we just deluded ourselves into thinking our planning is as tight and thorough as it could be?
People may be consistently bad estimators that do not even learn from their past mistakes. In that respect, having an EBS system is certainly better than not having one. But what can we do about the fact that we're not that good in planning as well? I'm not sure it's a problem that can be solved by a similar system. Our estimates are wrong because of tendencies to be overly optimistic/pessimistic about certain tasks, and because of neglect to account for systematic delays (e.g. sick days, major bug crisis) - and usually not because we lack knowledge about the work that needs to be done. Our planning, on the other hand, is often incomplete because we simply don't have enough knowledge in this early stage; and I don't see how an EBS-like system could fill that gap. So we're back to methodology. We need to find a way to accommodate bad or incomplete work plans that's better than voodoo-multiplication.