It is an awkward feeling. You’ve just delivered a database application that
seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug
that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe
you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected.
You investigate, and find the damage. You are somehow going to have to repair it.
Yes, it still very occasionally happens to me. It is not a nice feeling, and I
do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t
necessarily bad practices, but just show you where to focus your attention when checking an application.
Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the
language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this
is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re
concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even
more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to
explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to
variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit
conversion to work its magic.For details of the problem and the consequences, see here: SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story.
This is why one of the SQL Code Smells that makes me edgy, in my own or other
peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas
quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either
sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80
or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is
excellent for tidying up your code before you check it back into source Control!
1/ Checking Parameters only
If you were just going to check parameters, you might just do this. It simply
groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their
name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the
examples and their origin (the routine)
Even this little check can occasionally be scarily revealing.
;WITH userParameter AS ( SELECT c.NAME AS ParameterName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, t.name + ' ' + CASE --we may have to put in the length WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.name IN ('nchar', 'nvarchar') THEN c.max_length / 2 ELSE c.max_length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType] FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' AND parameter_id>0)SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameterWHERE ParameterName IN (SELECT ParameterName FROM UserParameter GROUP BY ParameterName HAVING MIN(Datatype)<>MAX(DataType))ORDER BY ParameterName
so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems to let in a string of ten characters. Worth investigating. Then we have a
@Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX)
2/ Columns and Parameters
Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in
the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this
by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string
version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed
the same way. After all, we want the datatypes to match
Unfortunately, parameters do not expose all the attributes we are interested
in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have
to leave them out for this check.
Voila! A slight modification of the first routine
;WITH userObject AS ( SELECT Name AS DataName,--the actual name of the parameter or column ('@' removed) --and the qualified object name of the routine OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName, --now the harder bit: the definition of the datatype. TypeName + ' ' + CASE --we may have to put in the length. e.g. CHAR (10) WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN MaxLength = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN TypeName IN ('nchar', 'nvarchar') THEN MaxLength / 2 ELSE MaxLength END) END + ')' WHEN TypeName IN ('decimal', 'numeric')--a BCD number! THEN '(' + CONVERT(VARCHAR(4), Precision) + ',' + CONVERT(VARCHAR(4), Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 --tush tush. XML THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType], DataObjectType FROM (Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID, is_XML_Document,'P' AS DataobjectType FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID AND parameter_id>0 UNION all Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document, 'C' AS DataobjectType FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' )f)SELECT CONVERT(CHAR(80),objectName+'.' + CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType FROM UserObjectWHERE DataName IN (SELECT DataName FROM UserObject GROUP BY DataName HAVING MIN(Datatype)<>MAX(DataType))ORDER BY DataName Hmm. I can tell you I found quite a few minor issues with the various
tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the
start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table.
Hmm. odd. Why is a city fifty characters long in that view? The idea of the description of a colour being 256
characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess.
We started out with a listing to examine parameters, then we mixed parameters
and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve
delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us
false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help
in some circumstances) then uncomment them!
;WITH userColumns AS ( SELECT c.NAME AS columnName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, REPLACE(t.name + ' ' + CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column (SELECT definition FROM sys.computed_columns cc WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID) --we may have to put in the length WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.Max_Length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.Name IN ('nchar', 'nvarchar') THEN c.Max_Length / 2 ELSE c.Max_Length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END + CASE WHEN c.is_rowguidcol = 1 THEN ' ROWGUIDCOL' ELSE '' END + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE((SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID), 'NULL') + ')' ELSE '' END + CASE WHEN is_identity = 1 THEN CASE WHEN OBJECTPROPERTY(object_id, 'IsUserTable') = 1 AND COLUMNPROPERTY(object_id, c.name, 'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id, 'IsMSShipped') = 0 THEN '' ELSE ' NOT FOR REPLICATION ' END ELSE '' END + CASE WHEN c.is_nullable = 0 THEN ' NOT NULL' ELSE ' NULL' END + CASE WHEN c.default_object_id <> 0 THEN ' DEFAULT ' + object_Definition(c.default_object_id) ELSE '' END + CASE WHEN c.collation_name IS NULL THEN '' WHEN c.collation_name <> (SELECT collation_name FROM sys.databases WHERE name = DB_NAME()) COLLATE Latin1_General_CI_AS THEN COALESCE(' COLLATE ' + c.collation_name, '') ELSE '' END,' ',' ') AS [DataType]FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys')SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumnsWHERE columnName IN (SELECT columnName FROM UserColumns
GROUP BY columnName
HAVING MIN(Datatype)<>MAX(DataType))ORDER BY columnName If you take a look down the results against Adventureworks, you'll see once again that there are things to
investigate, mostly, in the illustration, discrepancies between null and non-null datatypes
So I here you ask, what about temporary variables
within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are
not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid,
have to wait!