Authoritative sources about Database vs. Flatfile decision
- by FastAl
<tldr>looking for a reference to a book or other undeniably authoritative source that gives reasons when you should choose a database vs. when you should choose other storage methods. I have provided an un-authoritative list of reasons about 2/3 of the way down this post.</tldr>
I have a situation at my company where a database is being used where it would be better to use another solution (in this case, an auto-generated piece of source code that contains a static lookup table, searched by binary sort).
Normally, a database would be an OK solution even though the problem does not require a database, e.g, none of the elements of ACID are needed, as it is read-only data, updated about every 3-5 years (also requiring other sourcecode changes), and fits in memory, and can be keyed into via binary search (a tad faster than db, but speed is not an issue).
The problem is that this code runs on our enterprise server, but is shared with several PC platforms (some disconnected, some use a central DB, etc.), and parts of it are managed by multiple programming units, parts by the DBAs, parts even by mathematicians in another department, etc. These hit their own platform’s version of their databases (containing their own copy of the static data). What happens is that every implementation, every little change, something different goes wrong. There are many other issues as well. I can’t even use a flatfile, because one mode of running on our enterprise server does not have permission to read files (only databases, and of course, its own literal storage, e.g., in-source table).
Of course, other parts of the system use databases in proper, less obscure manners; there is no problem with those parts.
So why don’t we just change it? I don’t have administrative ability to force a change. But I’m affected because sometimes I have to help fix the problems, but mostly because it causes outages and tons of extra IT time by other programmers and d*mmit that makes me mad!
The reason neither management, nor the designers of the system, can see the problem is that they propose a solution that won’t work: increase communication; implement more safeguards and standards; etc. But every time, in a different part of the already-pared-down but still multi-step processes, a few different diligent, hard-working, top performing IT personnel make a unique subtle error that causes it to fail, sometimes after the last round of testing! And in general these are not single-person failures, but understandable miscommunications. And communication at our company is actually better than most. People just don't think that's the case because they haven't dug into the matter.
However, I have it on very good word from somebody with extensive formal study of sociology and psychology that the relatively small amount of less-than-proper database usage in this gigantic cross-platform multi-source, multi-language project is bureaucratically un-maintainable. Impossible. No chance. At least with Human Beings in the loop, and it can’t be automated.
In addition, the management and developers who could change this, though intelligent and capable, don’t understand the rigidity of this ‘how humans are’ issue, and are not convincible on the matter.
The reason putting the static data in sourcecode will solve the problem is, although the solution is less sexy than a database, it would function with no technical drawbacks; and since the sharing of sourcecode already works very well, you basically erase any database-related effort from this section of the project, along with all the drawbacks of it that are causing problems.
OK, that’s the background, for the curious. I won’t be able to convince management that this is an unfixable sociological problem, and that the real solution is coding around these limits of human nature, just as you would code around a bug in a 3rd party component that you can’t change.
So what I have to do is exploit the unsuitableness of the database solution, and not do it using logic, but rather authority. I am aware of many reasons, and posts on this site giving reasons for one over the other; I’m not looking for lists of reasons like these (although you can add a comment if I've miss a doozy):
WHY USE A DATABASE? instead of flatfile/other DB vs. file: if you need...
Random Read / Transparent search optimization
Advanced / varied / customizable Searching and sorting capabilities
Transaction/rollback
Locks, semaphores
Concurrency control / Shared users
Security
1-many/m-m is easier
Easy modification
Scalability
Load Balancing
Random updates / inserts / deletes
Advanced query
Administrative control of design, etc.
SQL / learning curve
Debugging / Logging
Centralized / Live Backup capabilities
Cached queries / dvlp & cache execution plans
Interleaved update/read
Referential integrity, avoid redundant/missing/corrupt/out-of-sync data
Reporting (from on olap or oltp db) / turnkey generation tools
[Disadvantages:]
Important to get right the first time - professional design - but only b/c it's meant to last
s/w & h/w cost
Usu. over a network, speed issue (best vs. best design vs. local=even then a separate process req's marshalling/netwk layers/inter-p comm)
indicies and query processing can stand in the way of simple processing (vs. flatfile)
WHY USE FLATFILE: If you only need...
Sequential Row processing only
Limited usage append only (no reading, no master key/update)
Only Update the record you're reading (fixed length recs only)
Too big to fit into memory
If Local disk / read-ahead network connection
Portability / small system
Email / cut & Paste / store as document by novice - simple format
Low design learning curve but high cost later
WHY USE IN-MEMORY/TABLE (tables, arrays, etc.): if you need...
Processing a single db/ff record that was imported
Known size of data
Static data if hardcoding the table
Narrow, unchanging use (e.g., one program or proc)
-includes a class that will be shared, but encapsulates its data manipulation
Extreme speed needed / high transaction frequency
Random access - but search is dependent on implementation
Following are some other posts about the topic:
http://stackoverflow.com/questions/1499239/database-vs-flat-text-file-what-are-some-technical-reasons-for-choosing-one-over
http://stackoverflow.com/questions/332825/are-flat-file-databases-any-good
http://stackoverflow.com/questions/2356851/database-vs-flat-files
http://stackoverflow.com/questions/514455/databases-vs-plain-text/514530
What I’d like to know is if anybody could recommend a hard, authoritative source containing these reasons. I’m looking for a paper book I can buy, or a reputable website with whitepapers about the issue (e.g., Microsoft, IBM), not counting the user-generated content on those sites. This will have a greater change to elicit a change that I’m looking for: less wasted programmer time, and more reliable programs.
Thanks very much for your help. You win a prize for reading such a large post!