DB Design Pattern - Many to many classification / categorised tagging.
- by Robin Day
I have an existing database design that stores Job Vacancies.
The "Vacancy" table has a number of fixed fields across all clients, such as "Title", "Description", "Salary range".
There is an EAV design for "Custom" fields that the Clients can setup themselves, such as, "Manager Name", "Working Hours". The field names are stored in a "ClientText" table and the data stored in a "VacancyClientText" table with VacancyId, ClientTextId and Value.
Lastly there is a many to many EAV design for custom tagging / categorising the vacancies with things such as Locations/Offices the vacancy is in, a list of skills required. This is stored as a "ClientCategory" table listing the types of tag, "Locations, Skills", a "ClientCategoryItem" table listing the valid values for each Category, e.g., "London,Paris,New York,Rome", "C#,VB,PHP,Python". Finally there is a "VacancyClientCategoryItem" table with VacancyId and ClientCategoryItemId for each of the selected items for the vacancy.
There are no limits to the number of custom fields or custom categories that the client can add.
I am now designing a new system that is very similar to the existing system, however, I have the ability to restrict the number of custom fields a Client can have and it's being built from scratch so I have no legacy issues to deal with.
For the Custom Fields my solution is simple, I have 5 additional columns on the Vacancy Table called CustomField1-5. This removes one of the EAV designs.
It is with the tagging / categorising design that I am struggling. If I limit a client to having 5 categories / types of tag. Should I create 5 tables listing the possible values "CustomCategoryItems1-5" and then an additional 5 many to many tables "VacancyCustomCategoryItem1-5"
This would result in 10 tables performing the same storage as the three tables in the existing system.
Also, should (heaven forbid) the requirements change in that I need 6 custom categories rather than 5 then this will result in a lot of code change.
Therefore, can anyone suggest any DB Design Patterns that would be more suitable to storing such data. I'm happy to stick with the EAV approach, however, the existing system has come across all the usual performance issues and complex queries associated with such a design.
Any advice / suggestions are much appreciated.
The DBMS system used is SQL Server 2005, however, 2008 is an option if required for any particular pattern.