Sniffing out SQL Code Smells: Inconsistent use of Symbolic names and Datatypes
Posted
by Phil Factor
on Simple Talk
See other posts from Simple Talk
or by Phil Factor
Published on Fri, 10 Dec 2010 01:01:00 GMT
Indexed on
2010/12/16
4:16 UTC
Read the original article
Hit count: 417
It is an awkward feeling. You’ve just delivered a database application that seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected. You investigate, and find the damage. You are somehow going to have to repair it.
Yes, it still very occasionally happens to me. It is not a nice feeling, and I do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t necessarily bad practices, but just show you where to focus your attention when checking an application.
Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit conversion to work its magic.For details of the problem and the consequences, see here: SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story.
This is why one of the SQL Code Smells that makes me edgy, in my own or other peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80 or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is excellent for tidying up your code before you check it back into source Control!
1/ Checking Parameters only
If you were just going to check parameters, you might just do this. It simply groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the examples and their origin (the routine)
Even this little check can occasionally be scarily revealing.
;WITH userParameter AS
( SELECT
c.NAME AS ParameterName,
OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,
t.name + ' '
+ CASE
--we may have to put in the length
WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar')
THEN '('
+ CASE WHEN c.max_length = -1 THEN 'MAX'
ELSE CONVERT(VARCHAR(4),
CASE WHEN t.name IN ('nchar', 'nvarchar')
THEN c.max_length / 2 ELSE c.max_length
END)
END + ')'
WHEN t.name IN ('decimal', 'numeric')
THEN '(' + CONVERT(VARCHAR(4), c.precision)
+ ',' + CONVERT(VARCHAR(4), c.Scale) + ')'
ELSE ''
END --we've done with putting in the length
+ CASE WHEN XML_collection_ID <> 0
THEN --deal with object schema names
'(' + CASE WHEN is_XML_Document = 1
THEN 'DOCUMENT '
ELSE 'CONTENT '
END
+ COALESCE(
(SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)
FROM sys.xml_schema_collections sc
INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID
WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')'
ELSE ''
END
AS [DataType]
FROM sys.parameters c
INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID
WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'
AND parameter_id>0)
SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameter
WHERE ParameterName IN
(SELECT ParameterName FROM UserParameter
GROUP BY ParameterName
HAVING MIN(Datatype)<>MAX(DataType))
ORDER BY ParameterName
so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems to let in a string of ten characters. Worth investigating. Then we have a @Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX)
2/ Columns and Parameters
Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed the same way. After all, we want the datatypes to match
Unfortunately, parameters do not expose all the attributes we are interested in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have to leave them out for this check.
Voila! A slight modification of the first routine
;WITH userObject AS( SELECT
Name AS DataName,--the actual name of the parameter or column ('@' removed)
--and the qualified object name of the routine
OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName,
--now the harder bit: the definition of the datatype.
TypeName + ' '
+ CASE
--we may have to put in the length. e.g. CHAR (10)
WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar')
THEN '('
+ CASE WHEN MaxLength = -1 THEN 'MAX'
ELSE CONVERT(VARCHAR(4),
CASE WHEN TypeName IN ('nchar', 'nvarchar')
THEN MaxLength / 2 ELSE MaxLength
END)
END + ')'
WHEN TypeName IN ('decimal', 'numeric')--a BCD number!
THEN '(' + CONVERT(VARCHAR(4), Precision)
+ ',' + CONVERT(VARCHAR(4), Scale) + ')'
ELSE ''
END --we've done with putting in the length
+ CASE WHEN XML_collection_ID <> 0 --tush tush. XML
THEN --deal with object schema names
'(' + CASE WHEN is_XML_Document = 1
THEN 'DOCUMENT '
ELSE 'CONTENT '
END
+ COALESCE(
(SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name)
FROM sys.xml_schema_collections sc
INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID
WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')'
ELSE ''
END
AS [DataType],
DataObjectType
FROM
(Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name,
c.max_length AS MaxLength, c.precision AS [Precision],
c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,
is_XML_Document,'P' AS DataobjectType
FROM sys.parameters c
INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID
AND parameter_id>0
UNION all
Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength,
c.precision AS [Precision], c.scale AS [Scale],
c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document,
'C' AS DataobjectType
FROM sys.columns c
INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID
WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'
)f)
SELECT CONVERT(CHAR(80),objectName+'.'
+ CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType
FROM UserObject
WHERE DataName IN
(SELECT DataName FROM UserObject
GROUP BY DataName
HAVING MIN(Datatype)<>MAX(DataType))
ORDER BY DataName
Hmm. I can tell you I found quite a few minor issues with the various tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table. Hmm. odd. Why is a city fifty characters long in that view? The idea of the description of a colour being 256 characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess.
We started out with a listing to examine parameters, then we mixed parameters and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help in some circumstances) then uncomment them!
;WITH userColumns AS
( SELECT
c.NAME AS columnName,
OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,
REPLACE(t.name + ' '
+ CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column
(SELECT definition FROM sys.computed_columns cc
WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID)
--we may have to put in the length
WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar')
THEN '('
+ CASE WHEN c.Max_Length = -1 THEN 'MAX'
ELSE CONVERT(VARCHAR(4),
CASE WHEN t.Name IN ('nchar', 'nvarchar')
THEN c.Max_Length / 2 ELSE c.Max_Length
END)
END + ')'
WHEN t.name IN ('decimal', 'numeric')
THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')'
ELSE ''
END + CASE WHEN c.is_rowguidcol = 1
THEN ' ROWGUIDCOL'
ELSE ''
END + CASE WHEN XML_collection_ID <> 0
THEN --deal with object schema names
'(' + CASE WHEN is_XML_Document = 1
THEN 'DOCUMENT '
ELSE 'CONTENT '
END + COALESCE((SELECT
QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)
FROM
sys.xml_schema_collections sc
INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID
WHERE
sc.xml_collection_ID = c.XML_collection_ID),
'NULL') + ')'
ELSE ''
END + CASE WHEN is_identity = 1
THEN CASE WHEN OBJECTPROPERTY(object_id,
'IsUserTable') = 1 AND COLUMNPROPERTY(object_id,
c.name,
'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id,
'IsMSShipped') = 0
THEN ''
ELSE ' NOT FOR REPLICATION '
END
ELSE ''
END + CASE WHEN c.is_nullable = 0
THEN ' NOT NULL'
ELSE ' NULL'
END + CASE
WHEN c.default_object_id <> 0
THEN ' DEFAULT ' + object_Definition(c.default_object_id)
ELSE ''
END + CASE
WHEN c.collation_name IS NULL
THEN ''
WHEN c.collation_name <> (SELECT
collation_name
FROM
sys.databases
WHERE
name = DB_NAME()) COLLATE Latin1_General_CI_AS
THEN COALESCE(' COLLATE ' + c.collation_name,
'')
ELSE ''
END,' ',' ') AS [DataType]
FROM sys.columns c
INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID
WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'
)
SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumns
WHERE columnName IN
(SELECT columnName FROM UserColumns
GROUP BY columnName
HAVING MIN(Datatype)<>MAX(DataType))
ORDER BY columnName
If you take a look down the results against Adventureworks, you'll see once again that there are things to investigate, mostly, in the illustration, discrepancies between null and non-null datatypes
So I here you ask, what about temporary variables within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid, have to wait!
© Simple Talk or respective owner