I'm working on a website that will be based on user contributed data,
submitted using a regular HTML form.
To simplify my question, let's say that there will be two fields in
the form: "User Name" and "Country" (this is just an example, not the
actual site).
There will be two tables in the database : "countries" and "users,"
with "users.country_id" being a foreign key to the "countries" table
(one-to-many).
The initial database will be empty. Users from all over the world will
submit their names and the countries they live in and eventually the
"countries" table will get filled out with all of the country names in
the world.
Since one country can have several alternative names, input like
Chile, Chili, Chilli will generate 3 different records in the
countries table, but in fact there is only one country.
When I search for records from Chile, Chili and Chilli will not be included.
So my question is - what would be the best way to deal with a
situation like this, with conditions such that the initial database is
empty, no other resources are available and everything is based on
user input?
How can I organize it in such way that Chile, Chili and Chilli would
be treated as one country, with minimum manual interference.
What are the best practices when it comes to normalizing user
submitted data and is there a scientific term for this? I'm sure this
is a common problem.
Again, I used country names just to simplify my question, it can be
anything that has possible different spellings.