I have just been assigned to renovate an old website, and I get to move it from some old archaic system to drupal. The only problem is that it's a real-estate system and a lot of data is stored. Currently all the information is stored in a single table, an id represents the house and then everything else is key/value pairs. There are a possible 243 keys per estate, there are 23840 estates in the system. As you can imagine the system is slow and difficult to query.
I don't think a table with 243 rows would be a very good idea, and probably worse than the current situation. I've done some investigating and here's what I've found out:
Missing data does not indicate a 0 value, data is merged from two, unique sources/formats. Some guessing is involved. I have no control over the source of the data.
There are 4 keys that are common to
all estates, all values look like
something that is commonly searched
for and could be indexed
There are 10 keys that are in the [90-100)% range
8 of these are information like who's selling it, and it's address.
The other two seem to belong with the below range
There are 80 keys that are in the [80-90)% range
This range seems to mostly just list room types and how many the house has (e.g. bedrooms_possible, bathrooms, family_room_3rd, etc)
This range also includes some minor information like school districts, one or two more pieces of data on the address.
The 179 keys that are in the [0-80)% range include all sorts of miscellaneous information about the estate
My best idea was a hybrid approach, create a table that stores important, common information and keep a smaller key/value table.
How would you store this information?