structured vs. unstructured data in db
- by Igor
the question is one of design. i'm gathering a big chunk of performance data with lots of key-value pairs. pretty much everything in /proc/cpuinfo, /proc/meminfo/, /proc/loadavg, plus a bunch of other stuff, from several hundred hosts. right now, i just need to display the latest chunk of data in my UI. i will probably end up doing some analysis of the data gathered to figure out performance problems down the road, but this is a new application so i'm not sure what exactly i'm looking for performance-wise just yet.
i could structure the data in the db -- have a column for each key i'm gathering. the table would end up being O(100) columns wide, it would be a pain to put into the db, i would have to add new columns if i start gathering a new stat. but it would be easy to sort/analyze the data just using SQL.
or i could just dump my unstructured data blob into the table. maybe three columns -- host id, timestamp, and a serialized version of my array, probably using JSON in a TEXT field.
which should I do? am i going to be sorry if i go with the unstructured approach? when doing analysis, should i just convert the fields i'm interested in and create a new, more structured table? what are the trade-offs i'm missing here?