What Counts for A DBA: Observant
- by drsql
When walking up to the building where I work, I can see CCTV cameras placed here and there for monitoring access to the building. We are required to wear authorization badges which could be checked at any time. Do we have enemies? Of course! No one is 100% safe; even if your life is a fairy tale, there is always a witch with an apple waiting to snack you into a thousand years of slumber (or at least so I recollect from elementary school.) Even Little Bo Peep had to keep a wary lookout.
We nerdy types (or maybe it was just me?) generally learned on the school playground to keep an eye open for unprovoked attack from simpler, but more muscular souls, and take steps to avoid messy confrontations well in advance. After we’d apprehensively negotiated adulthood with varying degrees of success, these skills of watching for danger, and avoiding it, translated quite well to the technical careers so many of us were destined for. And nowhere else is this talent for watching out for irrational malevolence so appropriate as in a career as a production DBA.
It isn’t always active malevolence that the DBA needs to watch out for, but the even scarier quirks of common humanity. A large number of the issues that occur in the enterprise happen just randomly or even just one time ever in a spurious manner, like in the case where a person decided to download the entire MSDN library of software, cross join every non-indexed billion row table together, and simultaneously stream the HD feed of 5 different sporting events, making the network access slow while the corporate online sales just started. The decent DBA team, like the going, gets tough under such circumstances. They spring into action, checking all of the sources of active information, observes the issue is no longer happening now, figures that either it wasn’t the database’s fault and that the reboot of the whatever device on the network fixed the problem. This sort of reactive support is good, and will be the initial reaction of even excellent DBAs, but it is not the end of the story if you really want to know what happened and avoid getting called again when it isn’t even your fault.
When fires start raging within the corporate software forest, the DBA’s instinct is to actively find a way to douse the flames and get back to having no one in the company have any idea who they are. Even better for them is to find a way of killing a potential problem while the fires are small, long before they can be classified as raging. The observant DBA will have already been monitoring the server environment for months in advance. Most troubles, such as disk space and security intrusions, can be predicted and dealt with by alerting systems, whereas other trouble can come out of the blue and requires a skill of observing ongoing conditions and noticing inexplicable changes that could signal an emerging problem. You can’t automate the DBA, because the bankable skill of a DBA is in detecting the early signs of unexpected problems, and working out how to deal with them before anyone else notices them.
To achieve this, the DBA will check the situation as it is currently happening, and in many cases is likely to have been the person who submitted the problem to the level 1 support person in the first place, just to let the support team know of impending issues (always well received, I tell you what!). Database and host computer settings, configurations, and even critical data might be profiled and captured for later comparisons. He’ll use Monitoring tools, built-in, commercial (Not to be too crassly commercial or anything, but there is one such tool is SQL Monitor) and lots of homebrew monitoring tools to monitor for problems and changes in the server environment.
You will know that you have it right when a support call comes in and you can look at your monitoring tools and quickly respond that “response time is well within the normal range, the query that supports the failing interface works perfectly and has actually only been called 67% as often as normal, so I am more than willing to help diagnose the problem, but it isn’t the database server’s fault and is probably a client or networking slowdown causing the interface to be used less frequently than normal.” And that is the best thing for any DBA to observe…