Normalisation and 'Anima notitia copia' (Soul of the Database)
- by Phil Factor
(A Guest Editorial for Simple-Talk)
The other day, I was staring at the sys.syslanguages table in SQL Server with
slightly-raised eyebrows .
I’d just been reading Chris Date’s interesting book ‘SQL and Relational Theory’. He’d
made the point that you’re not necessarily doing relational database operations by using a SQL Database product.
The same general point was recently made by Dino Esposito about ASP.NET MVC. The use of ASP.NET MVC doesn’t
guarantee you a good application design: It merely makes it possible to test it. The way I’d describe the sentiment in
both cases is ‘you can hit someone over the head with a frying-pan but you can’t call it cooking’.
SQL enables you to create relational databases. However, even if it smells bad, it is no
crime to do hideously un-relational things with a SQL Database just so long as it’s necessary and you can tell the
difference; not only that but also only if you’re aware of the risks and implications. Naturally, I’ve never knowingly
created a database that Codd would have frowned at, but around the edges are interfaces and data feeds I’ve written
that have caused hissy fits amongst the Normalisation fundamentalists. Part of the problem for those who agonise about
such things is the misinterpretation of Atomicity. An atomic value is one for which, in the strange virtual
universe you are creating in your database, you don’t have any interest in any of its component parts. If you
aren’t interested in the electrons, neutrinos, muons, or taus, then an atom is ..er.. atomic. In
the same way, if you are passed a JSON string or XML, and required to store it in a database, then all you need to do is
to ask yourself, in your role as Anima notitia copia
(Soul of the database) ‘have I any interest in the contents of this item of information?’. If the
answer is ‘No!’, or ‘nequequam! Then it is an atomic value, however complex it may be. After all, you would never
have the urge to store the pixels of images individually, under the misguided idea that these are the atomic values
would you? I would, of course, ask the ‘Anima
notitia copia’ rather than the application developers, since there may be more than one application, and the
applications developers may be designing the application in the absence of full domain knowledge, (‘or by the seat of
the pants’ as the technical term used to be). If, on the other hand, the answer is ‘sure, and we want to index the XML
column’, then we may be in for some heavy XML-shredding sessions to get to store the ‘atomic’ values and ensure future
harmony as the application develops.
I went back to looking at the sys.syslanguages table. It has a months column with the
months in a delimited list
January,February,March,April,May,June,July,August,September,October,November,December
This is an ordered list. Wicked? I seem to remember that this value, like shortmonths and
days, is treated as a ‘thing’. It is merely passed off to an external C++ routine in order to format a date in
a particular language, and never accessed directly within the database. As far as the database is concerned, it is an
atomic value. There is more to normalisation than meets the eye.