Keeping an Eye on Your Storage
- by Fatherjack
There are plenty of resources that advise you about looking for signs that your storage hardware is having problems. SQL Server Alerts for 823, 824 and 825 are covered here by Paul Randall of SQL Skills: http://www.sqlskills.com/blogs/paul/a-little-known-sign-of-impending-doom-error-825/ and here by me: https://www.simple-talk.com/blogs/2011/06/27/alerts-are-good-arent-they/.
Now until very recently I wasn’t aware that there was a different way to track the 823 + 824 errors. It was by complete chance that I happened to be searching about in the msdb database when I found the suspect_pages table. Running a query against it I got zero rows.
This, as it turns out is a good thing. Highlighting the table name and pressing F1 got me nowhere – Is it just me or does Books Online fail to load properly for no obvious reason sometimes? So I typed the table name into the search bar and got my local version of http://msdn.microsoft.com/en-us/library/ms174425.aspx. From that we get the following description:
Contains one row per page that failed with a minor 823 error or an 824 error. Pages are listed in this table because they are suspected of being bad, but they might actually be fine. When a suspect page is repaired, its status is updated in the event_type column.
So, in the table we would, on healthy hardware, expect to see zero rows but on disks that are having problems the event_type column would show us what is going on. Where there are suspect pages on the disk the rows would have an event_type value of 1, 2 or 3, where those suspect pages have been restored, repaired or deallocated by DBCC then the value would be 4, 5 or 7.
Having this table means that we can set up SQL Monitor to check the status of our hardware as we can create a custom metric based on the query below:
USE [msdb]
go
SELECT COUNT(*)
FROM [dbo].[suspect_pages] AS sp
All we need to do is set the metric to collect this value and set an alert to email when the value is not 1 and we are then able to let SQL Monitor take care of our storage.
Note that the suspect_pages table does not have any updates concerning Error 825 which the links at the top of the page cover in more detail. I would suggest that you set SQL Monitor to alert on the suspect_pages table in addition to other taking other measures to look after your storage hardware and not have it as your only precaution.
Microsoft actually pass ownership and administration of the suspect_pages table over to the database administrator (Manage the suspect_pages Table (SQL Server)) and in a surprising move (to me at least) advise DBAs to actively update and archive data in it. The table will only ever contain a maximum of 1000 rows and once full, new rows will not be added. Keeping an eye on this table is pretty important, although In my opinion, if you get to 1000 rows in this table and are not already waiting for new disks to be added to your server you are doing something wrong but if you have 1000 rows in there then you need to move data out quickly because you may be missing some important events on your server.