SQL Monitor’s data repository: Alerts
Posted
by Chris Lambrou
on Simple Talk
See other posts from Simple Talk
or by Chris Lambrou
Published on Tue, 11 Sep 2012 08:42:09 +0000
Indexed on
2012/09/11
9:46 UTC
Read the original article
Hit count: 798
Uncategorized
In my previous post, I introduced the SQL Monitor data repository, and described how the monitored objects are stored in a hierarchy in the data schema, in a series of tables with a _Keys suffix. In this post I had planned to describe how the actual data for the monitored objects is stored in corresponding tables with _StableSamples and _UnstableSamples suffixes. However, I’m going to postpone that until my next post, as I’ve had a request from a SQL Monitor user to explain how alerts are stored.
In the SQL Monitor data repository, alerts are stored in tables belonging to the alert schema, which contains the following five tables:
- alert.Alert
- alert.Alert_Cleared
- alert.Alert_Comment
- alert.Alert_Severity
- alert.Alert_Type
In this post, I’m only going to cover the alert.Alert and alert.Alert_Type tables. I may cover the other three tables in a later post. The most important table in this schema is alert.Alert, as each row in this table corresponds to a single alert. So let’s have a look at it.
SELECT TOP 100 AlertId,
AlertType,
TargetObject,
[Read],
SubType
FROM alert.Alert
ORDER BY AlertId DESC;
AlertId | AlertType | TargetObject | Read | SubType | |
---|---|---|---|---|---|
1 | 65550 | 39 | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,9:SqlServer,1,4:Name,s0:, | 1 | 0 |
2 | 65549 | 38 | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,7:Machine,1,4:Name,s0:, | 1 | 0 |
3 | 65548 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
4 | 65547 | 15 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
5 | 65546 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
6 | 65545 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
7 | 65544 | 15 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
8 | 65543 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
9 | 65542 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0 |
10 | 65541 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0 |
11 | … |
So what are we seeing here, then? Well, AlertId is an auto-incrementing identity column, so ORDER BY AlertId DESC ensures that we see the most recent alerts first. AlertType indicates the type of each alert, such as Job failed (6), Backup overdue (14) or Long-running query (12). The TargetObject column indicates which monitored object the alert is associated with. The Read column acts as a flag to indicate whether or not the alert has been read. And finally the SubType column is used in the case of a Custom metric (40) alert, to indicate which custom metric the alert pertains to.
Okay, now lets look at some of those columns in more detail. The AlertType column is an easy one to start with, and it brings use nicely to the next table, data.Alert_Type. Let’s have a look at what’s in this table:
SELECT AlertType,
Event,
Monitoring,
Name,
Description
FROM alert.Alert_Type
ORDER BY AlertType;
AlertType | Event | Monitoring | Name | Description | |
---|---|---|---|---|---|
1 | 1 | 0 | 0 | Processor utilization | Processor utilization (CPU) on a host machine stays above a threshold percentage for longer than a specified duration |
2 | 2 | 1 | 0 | SQL Server error log entry | An error is written to the SQL Server error log with a severity level above a specified value. |
3 | 3 | 1 | 0 | Cluster failover | The active cluster node fails, causing the SQL Server instance to switch nodes. |
4 | 4 | 1 | 0 | Deadlock | SQL deadlock occurs. |
5 | 5 | 0 | 0 | Processor under-utilization | Processor utilization (CPU) on a host machine remains below a threshold percentage for longer than a specified duration |
6 | 6 | 1 | 0 | Job failed | A job does not complete successfully (the job returns an error code). |
7 | 7 | 0 | 0 | Machine unreachable | Host machine (Windows server) cannot be contacted on the network. |
8 | 8 | 0 | 0 | SQL Server instance unreachable | The SQL Server instance is not running or cannot be contacted on the network. |
9 | 9 | 0 | 0 | Disk space | Disk space used on a logical disk drive is above a defined threshold for longer than a specified duration. |
10 | 10 | 0 | 0 | Physical memory | Physical memory (RAM) used on the host machine stays above a threshold percentage for longer than a specified duration. |
11 | 11 | 0 | 0 | Blocked process | SQL process is blocked for longer than a specified duration. |
12 | 12 | 0 | 0 | Long-running query | A SQL query runs for longer than a specified duration. |
13 | 14 | 0 | 0 | Backup overdue | No full backup exists, or the last full backup is older than a specified time. |
14 | 15 | 0 | 0 | Log backup overdue | No log backup exists, or the last log backup is older than a specified time. |
15 | 16 | 0 | 0 | Database unavailable | Database changes from Online to any other state. |
16 | 17 | 0 | 0 | Page verification | Torn Page Detection or Page Checksum is not enabled for a database. |
17 | 18 | 0 | 0 | Integrity check overdue | No entry for an integrity check (DBCC DBINFO returns no date for dbi_dbccLastKnownGood field), or the last check is older than a specified time. |
18 | 19 | 0 | 0 | Fragmented indexes | Fragmentation level of one or more indexes is above a threshold percentage. |
19 | 24 | 0 | 0 | Job duration unusual | The duration of a SQL job duration deviates from its baseline duration by more than a threshold percentage. |
20 | 25 | 0 | 1 | Clock skew | System clock time on the Base Monitor computer differs from the system clock time on a monitored SQL Server host machine by a specified number of seconds. |
21 | 27 | 0 | 0 | SQL Server Agent Service status | The SQL Server Agent Service status matches the status specified. |
22 | 28 | 0 | 0 | SQL Server Reporting Service status | The SQL Server Reporting Service status matches the status specified. |
23 | 29 | 0 | 0 | SQL Server Full Text Search Service status | The SQL Server Full Text Search Service status matches the status specified. |
24 | 30 | 0 | 0 | SQL Server Analysis Service status | The SQL Server Analysis Service status matches the status specified. |
25 | 31 | 0 | 0 | SQL Server Integration Service status | The SQL Server Integration Service status matches the status specified. |
26 | 33 | 0 | 0 | SQL Server Browser Service status | The SQL Server Browser Service status matches the status specified. |
27 | 34 | 0 | 0 | SQL Server VSS Writer Service status | The SQL Server VSS Writer status matches the status specified. |
28 | 35 | 0 | 1 | Deadlock trace flag disabled | The monitored SQL Server’s trace flag cannot be enabled. |
29 | 36 | 0 | 0 | Monitoring stopped (host machine credentials) | SQL Monitor cannot contact the host machine because authentication failed. |
30 | 37 | 0 | 0 | Monitoring stopped (SQL Server credentials) | SQL Monitor cannot contact the SQL Server instance because authentication failed. |
31 | 38 | 0 | 0 | Monitoring error (host machine data collection) | SQL Monitor cannot collect data from the host machine. |
32 | 39 | 0 | 0 | Monitoring error (SQL Server data collection) | SQL Monitor cannot collect data from the SQL Server instance. |
33 | 40 | 0 | 0 | Custom metric | The custom metric value has passed an alert threshold. |
34 | 41 | 0 | 0 | Custom metric collection error | SQL Monitor cannot collect custom metric data from the target object. |
Basically, alert.Alert_Type is just a big reference table containing information about the 34 different alert types supported by SQL Monitor (note that the largest id is 41, not 34 – some alert types have been retired since SQL Monitor was first developed). The Name and Description columns are self evident, and I’m going to skip over the Event and Monitoring columns as they’re not very interesting. The AlertId column is the primary key, and is referenced by AlertId in the alert.Alert table. As such, we can rewrite our earlier query to join these two tables, in order to provide a more readable view of the alerts:
SELECT TOP 100 AlertId,
Name,
TargetObject,
[Read],
SubType
FROM alert.Alert a
JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
ORDER BY AlertId DESC;
AlertId | Name | TargetObject | Read | SubType | |
---|---|---|---|---|---|
1 | 65550 | Monitoring error (SQL Server data collection) | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,9:SqlServer,1,4:Name,s0:, | 0 | 0 |
2 | 65549 | Monitoring error (host machine data collection) | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,7:Machine,1,4:Name,s0:, | 0 | 0 |
3 | 65548 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
4 | 65547 | Log backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
5 | 65546 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0 |
6 | 65545 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
7 | 65544 | Log backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
8 | 65543 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0 |
9 | 65542 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0 |
10 | 65541 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0 |
Okay, the next column to discuss in the alert.Alert table is TargetObject. Oh boy, this one’s a bit tricky! The TargetObject of an alert is a serialized string representation of the position in the monitored object hierarchy of the object to which the alert pertains. The serialization format is somewhat convenient for parsing in the C# source code of SQL Monitor, and has some helpful characteristics, but it’s probably very awkward to manipulate in T-SQL.
I could document the serialization format here, but it would be very dry reading, so perhaps it’s best to consider an example from the table above. Have a look at the alert with an AlertID of 65543. It’s a Backup overdue alert for the SqlMonitorData database running on the default instance of granger, my laptop. Each different alert type is associated with a specific type of monitored object in the object hierarchy (I described the hierarchy in my previous post). The Backup overdue alert is associated with databases, whose position in the object hierarchy is root → Cluster → SqlServer → Database. The TargetObject value identifies the target object by specifying the key properties at each level in the hierarchy, thus:
- Cluster: Name = "granger"
- SqlServer: Name = "" (an empty string, denoting the default instance)
- Database: Name = "SqlMonitorData"
Well, look at the actual TargetObject value for this alert: "7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData,". It is indeed composed of three parts, one for each level in the hierarchy:
- Cluster: "7:Cluster,1,4:Name,s7:granger,"
- SqlServer: "9:SqlServer,1,4:Name,s0:,"
- Database: "8:Database,1,4:Name,s14:SqlMonitorData,"
Each part is handled in exactly the same way, so let’s concentrate on the first part, "7:Cluster,1,4:Name,s7:granger,". It comprises the following:
- "7:Cluster," – This identifies the level in the hierarchy.
- "1," – This indicates how many different key properties there are to uniquely identify a cluster (we saw in my last post that each cluster is identified by a single property, its Name).
- "4:Name,s14:SqlMonitorData," – This represents the Name property, and its corresponding value, SqlMonitorData. It’s split up like this:
- "4:Name," – Indicates the name of the key property.
- "s" – Indicates the type of the key property, in this case, it’s a string.
- "14:SqlMonitorData," – Indicates the value of the property.
At this point, you might be wondering about the format of some of these strings. Why is the string "Cluster" stored as "7:Cluster,"? Well an encoding scheme is used, which consists of the following:
- "7" – This is the length of the string "Cluster"
- ":" – This is a delimiter between the length of the string and the actual string’s contents.
- "Cluster" – This is the string itself. 7 characters.
- "," – This is a final terminating character that indicates the end of the encoded string.
You can see that "4:Name,", "8:Database," and "14:SqlMonitorData," also conform to the same encoding scheme.
In the example above, the "s" character is used to indicate that the value of the Name property is a string. If you explore the TargetObject property of alerts in your own SQL Monitor data repository, you might find other characters used for other non-string key property values. The different value types you might possibly encounter are as follows:
- "I" – Denotes a bigint value. For example, "I65432,".
- "g" – Denotes a GUID value. For example, "g32116732-63ae-4ab5-bd34-7dfdfb084c18,".
- "d" – Denotes a datetime value. For example, "d634815384796832438,". The value is stored as a bigint, rather than a native SQL datetime value. I’ll describe how datetime values are handled in the SQL Monitor data repostory in a future post.
I suggest you have a look at the alerts in your own SQL Monitor data repository for further examples, so you can see how the TargetObject values are composed for each of the different types of alert. Let me give one further example, though, that represents a Custom metric alert, as this will help in describing the final column of interest in the alert.Alert table, SubType. Let me show you the alert I’m interested in:
SELECT AlertId,
a.AlertType,
Name,
TargetObject,
[Read],
SubType
FROM alert.Alert a
JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
WHERE AlertId = 65769;
AlertId | AlertType | Name | TargetObject | Read | SubType | |
---|---|---|---|---|---|---|
1 | 65769 | 40 | Custom metric | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2, | 0 | 2 |
An AlertType value of 40 corresponds to the Custom metric alert type. The Name taken from the alert.Alert_Type table is simply Custom metric, but this doesn’t tell us anything about the specific custom metric that this alert pertains to. That’s where the SubType value comes in. For custom metric alerts, this provides us with the Id of the specific custom alert definition that can be found in the settings.CustomAlertDefinitions table. I don’t really want to delve into custom alert definitions yet (maybe in a later post), but an extra join in the previous query shows us that this alert pertains to the CPU pressure (avg runnable task count) custom metric alert.
SELECT AlertId,
a.AlertType,
at.Name,
cad.Name AS CustomAlertName,
TargetObject,
[Read],
SubType
FROM alert.Alert a
JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
JOIN settings.CustomAlertDefinitions cad ON a.SubType = cad.Id
WHERE AlertId = 65769;
AlertId | AlertType | Name | CustomAlertName | TargetObject | Read | SubType | |
---|---|---|---|---|---|---|---|
1 | 65769 | 40 | Custom metric | CPU pressure (avg runnable task count) | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2, | 0 | 2 |
The TargetObject value in this case breaks down like this:
- "7:Cluster,1,4:Name,s7:granger," – Cluster named "granger".
- "9:SqlServer,1,4:Name,s0:," – SqlServer named "" (the default instance).
- "8:Database,1,4:Name,s6:master," – Database named "master".
- "12:CustomMetric,1,8:MetricId,I2," – Custom metric with an Id of 2.
Note that the hierarchy for a custom metric is slightly different compared to the earlier Backup overdue alert. It’s root → Cluster → SqlServer → Database → CustomMetric. Also notice that, unlike Cluster, SqlServer and Database, the key property for CustomMetric is called MetricId (not Name), and the value is a bigint (not a string).
Finally, delving into the custom metric tables is beyond the scope of this post, but for the sake of avoiding any future confusion, I’d like to point out that whilst the SubType references a custom alert definition, the MetricID value embedded in the TargetObject value references a custom metric definition. Although in this case both the custom metric definition and custom alert definition share the same Id value of 2, this is not generally the case.
Okay, that’s enough for now, not least because as I’m typing this, it’s almost 2am, I have to go to work tomorrow, and my alarm is set for 6am – eek! In my next post, I’ll either cover the remaining three tables in the alert schema, or I’ll delve into the way SQL Monitor stores its monitoring data, as I’d originally planned to cover in this post.
© Simple Talk or respective owner