A format for storing personal contacts in a database
- by Gart
I'm thinking of the best way to store personal contacts in a database for a business application. The traditional and straightforward approach would be to create a table with columns for each element, i.e. Name, Telephone Number, Job title, Address, etc... However, there are known industry standards for this kind of data, like for example vCard, or hCard, or vCard-RDF/XML or even Windows Contacts XML Schema. Utilizing an standard format would offer some benefits, like inter-operablilty with other systems. But how can I decide which method to use?
The requirements are mainly to store the data. Search and ordering queries are highly unlikely but possible. The volume of the data is 100,000 records at maximum.
My database engine supports native XML columns. I have been thinking to use some XML-based format to store the personal contacts. Then it will be possible to utilize XML indexes on this data, if searching and ordering is needed. Is this a good approach? Which contacts format and schema would you recommend for this?
Edited after first answers
Here is why I think the straightforward approach is bad. This is due to the nature of this kind of data - it is not that simple.
The personal contacts it is not well-structured data, it may be called semi-structured. Each contact may have different data fields, maybe even such fields which I cannot anticipate. In my opinion, each piece of this data should be treated as important information, i.e. no piece of data could be discarded just because there was no relevant column in the database.
If we took it further, assuming that no data may be lost, then we could create a big text column named Comment or Description or Other and put there everything which cannot be fitted well into table columns. But then again - the data would lose structure - this might be bad.
If we wanted structured data then - according to the database design principles - the data should be decomposed into entities, and relations should be established between the entities. But this adds complexity - there are just too many entities, and lots of design desicions should be made, like "How do we store Address? Personal Name? Phone number? How do we encode home phone numbers and mobile phone numbers? How about other contact info?.." The relations between entities are complex and multiple, and each relation is a table in the database. Each relation needs to be documented in the design papers. That is a lot of work to do. But it is possible to avoid the complexity entirely - just document that the data is stored according to such and such standard schema, period. Then anybody who would be reading that document should easily understand what it was all about.
Finally, this is all about using an industry standard. The standard is, hopefully, designed by some clever people who anticipated and described the structure of personal contacts information much better than I ever could. Why should we all reinvent the wheel?? It's much easier to use a standard schema. The problem is, there are just too many standards - it's not easy to decide which one to use!