SQL: empty string vs NULL value
- by Jacek Prucia
I know this subject is a bit controversial and there are a lot of various articles/opinions floating around the internet. Unfortunatelly, most of them assume the person doesn't know what the difference between NULL and empty string is. So they tell stories about surprising results with joins/aggregates and generally do a bit more advanced SQL lessons. By doing this, they absolutely miss the whole point and are therefore useless for me. So hopefully this question and all answers will move subject a bit forward.
Let's suppose I have a table with personal information (name, birth, etc) where one of the columns is an email address with varchar type. We assume that for some reason some people might not want to provide an email address. When inserting such data (without email) into the table, there are two available choices: set cell to NULL or set it to empty string (''). Let's assume that I'm aware of all the technical implications of choosing one solution over another and I can create correct SQL queries for either scenario. The problem is even when both values differ on the technical level, they are exactly the same on logical level. After looking at NULL and '' I came to a single conclusion: I don't know email address of the guy. Also no matter how hard i tried, I was not able to sent an e-mail using either NULL or empty string, so apparently most SMTP servers out there agree with my logic. So i tend to use NULL where i don't know the value and consider empty string a bad thing.
After some intense discussions with colleagues i came with two questions:
am I right in assuming that using empty string for an unknown value is causing a database to "lie" about the facts? To be more precise: using SQL's idea of what is value and what is not, I might come to conclusion: we have e-mail address, just by finding out it is not null. But then later on, when trying to send e-mail I'll come to contradictory conclusion: no, we don't have e-mail address, that @!#$ Database must have been lying!
Is there any logical scenario in which an empty string '' could be such a good carrier of important information (besides value and no value), which would be troublesome/inefficient to store by any other way (like additional column). I've seen many posts claiming that sometimes it's good to use empty string along with real values and NULLs, but so far haven't seen a scenario that would be logical (in terms of SQL/DB design).
P.S. Some people will be tempted to answer, that it is just a matter of personal taste. I don't agree. To me it is a design decision with important consequences. So i'd like to see answers where opion about this is backed by some logical and/or technical reasons.