extensible database design: automatic ALTER TABLE or serialize() field BLOB ?
- by mario
I want an adaptable database scheme. But still use a simple table data gateway in my application, where I just pass an $data[] array for storing.
The basic columns are settled in the initial table scheme. There will however arise a couple of meta fields later (ca 10-20). I want some flexibility there and not adapt the database manually each time, or -worse- change the application logic just because of new fields.
So now there are two options which seem workable yet not overkill. But I'm not sure about the scalability or database drawbacks.
(1) Automatic ALTER TABLE. Whenever the $data array is to be saved, the keys are compared against the current database columns. New columns get defined before the $data is INSERTed into the table. Actually seems simple enough in test code:
function save($data, $table="forum") {
// columns
if ($new_fields = array_diff(array_keys($data), known_fields($table))) {
extend_schema($table, $new_fields, $data);
}
// save
$columns = implode("`, `", array_keys($data));
$qm = str_repeat(",?", count(array_keys($data)) - 1);
echo ("INSERT INTO `$table` (`$columns`) VALUES (?$qm);");
function known_fields($table) {
return unserialize(@file_get_contents("db:$table")) ?: array("id");
function extend_schema($table, $new_fields, $data) {
foreach ($new_fields as $field) {
echo("ALTER TABLE `$table` ADD COLUMN `$field` VARCHAR;");
Since it is mostly meta information fields, adding them just as VARCHAR seems sufficient. Nobody will query by them anyway. So the database is really just used as storage here.
However, while I might want to add a lot of new $data fields on the go, they will not always be populated.
(2) serialize() fields into BLOB. Any new/extraneous meta fields could be opaque to the database. Simply sorting out the virtual fields from the real database columns is simple. And the meta fields can just be serialize()d into a blob/text field then:
function ext_save($data, $table="forum") {
$db_fields = array("id", "content", "flags", "ext");
// disjoin
foreach (array_diff(array_keys($data),$db_fields) as $key) {
$data["ext"][$key] = $data[$key];
unset($data[$key]);
}
$data["ext"] = serialize($data["ext"]);
Unserializing and unpacking this 'ext' column on read queries is a minor overhead. The advantage is that there won't be any sparsely filled columns in the database, so I guess it's compacter and faster than the AUTO ALTER TABLE approach.
Of course, this method prevents ever using one of the new fields in a WHERE or GROUP BY clause. But I think none of the possible meta fields (user_agent, author_ip, author_img, votes, hits, last_modified, ..) would/should ever be used there anyway.
So I currently prefer the 'ext' blob approach, even if it's a one-way ticket.
How are such columns called usually? (looking for examples/doc)
Would you use XML serialization for (very theoretical) in-database queries?
The adapting table scheme seems a "cleaner" interface, even if most columns might remain empty then. How does that impact speed? How many such sparse VARCHAR fields can MySQL/innodb stomach?
But most importantly: Is there any standard implementation for this? A pseudo ORM with automatic ALTER TABLE tricks? Storing a simple column list seems workable, but something like pdo::getColumnMeta would be more robust.