For several years, I was responsible for the smooth running
of a large number of enterprise database servers. We ran a network monitoring
tool that was primitive by today’s standards but which performed the useful
function of polling every system, including all the servers in my charge. For
each service that you needed to monitor, it ran a configurable script that was
merely required to return one of a number of integer values. These integer
values represented the pain level of the service, from 10 (“hurtin’ real bad”)
to 1 (“Things is great”). Not only could you program the visual appearance of
each server on the network diagram according to the value of the integer, but
you could even opt to run a sound file. Very soon, we had a large TFT screen,
high on the wall of the server room, with every server represented by an icon,
and a speaker next to it that would give out a series of grunts, groans,
snores, shrieks and funeral marches, depending on the problem. One glance at
the display, and you could dive in with iSQL/QA/SSMS and check what was going
on with your favourite diagnostic tools. If you saw a server icon burst into
flames on the screen or droop like a jelly, you dropped your mug of coffee to
do it.
It was real fun, but I remember it more for the huge
difference it made to have that real-time visibility into how your servers were
performing. The management soon stopped making jokes about the real reason we
wanted the TFT screen. (It rendered DVDs beautifully, they said, particularly
flesh-tints.) If you were instantly alerted when things started to go wrong, then
there was a good chance you could fix the problem before the users of the system
alerted you to it.
There is a world of difference between this sort of tool,
one that gives whoever is ‘on watch’ in the server room the first warning of a
potential problem on one of any number of servers, and the breed of tool that attempts
to provide some sort of prosthetic DBA brain. I like to get the early warning, and
the right information to help diagnose a problem: no auto-fix, just
the information. I prefer to leave the task of ascertaining the exact cause of
a problem to my own routines, custom code, intuition and forensic instincts. A
simulated aircraft cockpit doesn’t do anything for me, especially before I know
where I should be flying.
Time has moved on, and that TFT screen is now, with SQL Monitor,
an iPad or any other mobile or static device that can support a browser. Rather
than trying to reproduce the conceptual topology of the servers, it lists them
in their groups so as to give a display that scales with the increasing number
of databases you monitor. It shows the history of the major events and trends
for the servers. It has the icons and
colours that you can spot out of the corner of your eye, but goes on to provide
just enough drill-down information to give you a much clearer idea of
where to look with your DBA tools and routines. It doesn't swamp you with
information.
Whereas a few server- and database-level problems are pretty
easily fixed, others depend on judgement and experience to sort out. Although
the idea of an application that
automates the bulk of a DBA’s skills is attractive to many, I can’t see it
happening soon. SQL Server’s complexity increases faster than the panaceas can
be created. In the meantime, I believe that the best way of helping DBAs
is to make the monitoring process as simple and effective as possible, and to
provide the right sort of detail and
‘evidence’ to allow them to decide on the fix. In the end, it is still down to
the skill of the DBA.