SQL Design Question regarding schema and if Name value pair is the best solution
- by Aur
I am having a small problem trying to decide on database schema for a current project. I am by no means a DBA.
The application parses through a file based on user input and enters that data in the database. The number of fields that can be parsed is between 1 and 42 at the current moment.
The current design of the database is entirely flat with there being 42 columns; some have repeated columns such as address1, address2, address3, etc...
This says that I should normalize the data. However, data integrity is not needed at this moment and the way the data is shaped I'm looking at several joins. Not a bad thing but the data is still in a 1 to 1 relationship and I still see a lot of empty fields per row.
So my concerns are that this does not allow the database or the application to be very extendable. If they want to add more fields to be parsed (which they do) than I'd need to create another table and add another foreign key to the linking table.
The third option is I have a table where the fields are defined and a table for each record. So what I was thinking is to make a table that stores the value and then links to those two tables. The problem is I can picture the size of that table growing large depending on the input size. If someone gives me a file with 300,000 records than 300,000 x 40 = 12 million so I have some reservations. However I think if I get to that point than I should be happy it is being used. This option also allows for more custom displaying of information albeit a bit more work but little rework even if you add more fields.
So the problem boils down to:
1. Current design is a flat file which makes extending it hard and it is not normalized.
2. Normalize the tables although no real benefits for the moment but requirements change.
3. Normalize it down into the name value pair and hope size doesn't hurt.
There are a large number of inserts, updates, and selects against that table. So performance is a worry but I believe the saying is design now, performance testing later?
I'm probably just missing something practical so any comments would be appreciated even if it’s a quick sanity check.
Thank you for your time.