Advice on designing a robust program to handle a large library of meta-information & programs
- by Sam Bryant
So this might be overly vague, but here it is anyway.
I'm not really looking for a specific answer, but rather general design principles or direction towards resources that deal with problems like this. It's one of my first large-scale applications, and I would like to do it right.
Brief Explanation
My basic problem is that I have to write an application that handles a large library of meta-data, can easily modify the meta-data on-the-fly, is robust with respect to crashing, and is very efficient. (Sorta like the design parameters of iTunes, although sometimes iTunes performs more poorly than I would like).
If you don't want to read the details, you can skip the rest.
Long Explanation
Specifically I am writing a program that creates a library of image files and meta-data about these files. There is a list of tags that may or may not apply to each image.
The program needs to be able to add new images, new tags, assign tags to images, and detect duplicate images, all while operating.
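One way the duplicate check could work, as a rough sketch: hash each file's bytes and remember the digests seen so far. This assumes "duplicate" means byte-identical files (resized or re-encoded copies would not be caught), and the names `content_digest` and `DuplicateDetector` are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class DuplicateDetector:
    """Hypothetical helper: remembers the digest of every image added."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, Path] = {}

    def add(self, path: Path) -> Optional[Path]:
        """Register an image; return the earlier path if it is a duplicate."""
        digest = content_digest(path)
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing
        self._by_digest[digest] = path
        return None
```

Because each new image costs one hash plus one dictionary lookup, the check can run while the program is operating, without rescanning the library.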
The program contains an image Viewer which has tagging operations. The idea is that if a given image A is viewed while the library has tags T1, T2, and T3, then that image will have boolean flags for each of those tags (depending on whether the user tagged that image while it was open in the Viewer). However, prior to being viewed in the Viewer, image A would have no value for tags T1, T2, and T3. Instead it would have a "dirty" flag indicating that it is unknown whether A has these tags. The program can introduce new tags at any time (which would automatically set all images to "dirty" with respect to this new tag).
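The tag states described above could be modeled as a three-valued flag, roughly like this. It is only a sketch: the class and method names are invented, and it assumes "dirty" can be represented as a missing entry, so that introducing a new tag needs no per-image work.

```python
from typing import Dict, Optional, Set


class TagLibrary:
    """Tri-state tags: True, False, or None ("dirty", never reviewed)."""

    def __init__(self) -> None:
        self.tags: Set[str] = set()
        # image -> {tag -> bool}; a missing tag entry means "dirty"
        self.values: Dict[str, Dict[str, bool]] = {}

    def add_image(self, image: str) -> None:
        # A new image starts with no entries, so it is dirty for every tag.
        self.values.setdefault(image, {})

    def add_tag(self, tag: str) -> None:
        # No image has an entry for the new tag yet, so all are dirty.
        self.tags.add(tag)

    def mark_viewed(self, image: str, applied: Set[str]) -> None:
        """Record a Viewer session: every current tag becomes True or False."""
        self.values[image] = {tag: tag in applied for tag in self.tags}

    def state(self, image: str, tag: str) -> Optional[bool]:
        """Return True, False, or None if the image is dirty for this tag."""
        return self.values[image].get(tag)
```

For example, after `mark_viewed("A", {"T1"})` with tags T1, T2, T3 in the library, A is True for T1 and False for T2 and T3. Adding T4 afterwards leaves A dirty for T4 only.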
This program must be fast. It must be easily able to pull up a list of images with or without a certain tag as well as images which are "dirty" with respect to a tag.
It has to be crash-safe, in that if it suddenly crashes, the tagging information from that session is not lost (though perhaps it's okay to lose some of it).
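One common way to get this kind of crash safety is an append-only journal: every tagging change is written as one line and flushed to disk right away, and the log is replayed on startup. The sketch below assumes a JSON-lines file and invented names (`TagJournal`, `replay`); at worst a crash loses the line that was being written.

```python
import json
import os
from pathlib import Path
from typing import Dict, Tuple


class TagJournal:
    """Hypothetical append-only log of tagging changes."""

    def __init__(self, path: Path) -> None:
        self._file = open(path, "a", encoding="utf-8")

    def record(self, image: str, tag: str, value: bool) -> None:
        line = json.dumps({"image": image, "tag": tag, "value": value})
        self._file.write(line + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # ask the OS to push it to disk now

    def close(self) -> None:
        self._file.close()


def replay(path: Path) -> Dict[Tuple[str, str], bool]:
    """Rebuild the latest (image, tag) -> value mapping from the log."""
    state: Dict[Tuple[str, str], bool] = {}
    if not path.exists():
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break  # half-written last line from a crash; stop here
            state[(entry["image"], entry["tag"])] = entry["value"]
    return state
```

Calling `fsync` on every change is the slow part. If losing the last few seconds of tagging is acceptable, it could be called every N records instead.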
Finally, it has to work with a lot of images (10,000).
I am a fairly experienced programmer, but I have never tried to write a program with such demanding needs and I have never worked with databases.
With respect to the meta-data storage, there seem to be a few design choices:
Choice 1: Individual meta-data vs centralized meta-data
Individual Meta-Data: have a separate meta-data file for each image. This way, as soon as you change the meta-data for an image, it can be written to the hard disk, without having to rewrite the information for all of the other images.
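A rough sketch of the per-image option, assuming a JSON "sidecar" file stored next to each image (the `.meta.json` suffix and the function names are made up). Writing to a temporary file and then renaming it over the old one means a crash mid-write should leave the previous meta-data intact rather than a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def sidecar_path(image: Path) -> Path:
    """Hypothetical naming scheme: photo.jpg -> photo.jpg.meta.json"""
    return image.with_name(image.name + ".meta.json")


def save_tags(image: Path, tags: Dict[str, bool]) -> None:
    """Write one image's tags without touching any other image's file."""
    target = sidecar_path(image)
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(tags, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)  # swap the new file in as a single step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_tags(image: Path) -> Dict[str, bool]:
    """Return the stored tags, or {} if the image has no sidecar yet."""
    target = sidecar_path(image)
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

The cost of this layout shows up on queries: finding every image with a given tag means opening every sidecar file, unless an index is also kept in memory.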
Centralized Meta-Data: Have a single file to hold the meta-data for every image. This would probably require meta-data writes at intervals as opposed to after every change. The benefit here is that you could keep a centralized list of all images with a given tag, etc., making the task of pulling up all images with a given tag very efficient.
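The centralized option might look something like the sketch below: one in-memory index holding, per tag, the set of images with it and the set without it, saved to a single JSON file at intervals. All names here are invented, and the file layout is just one possibility.

```python
import json
import os
from pathlib import Path
from typing import Dict, Set


class CentralIndex:
    """Hypothetical central index: per-tag sets of images."""

    def __init__(self) -> None:
        self.images: Set[str] = set()
        self.with_tag: Dict[str, Set[str]] = {}
        self.without_tag: Dict[str, Set[str]] = {}

    def add_image(self, image: str) -> None:
        self.images.add(image)

    def add_tag(self, tag: str) -> None:
        self.with_tag.setdefault(tag, set())
        self.without_tag.setdefault(tag, set())

    def set_tag(self, image: str, tag: str, value: bool) -> None:
        self.add_image(image)
        self.add_tag(tag)
        yes, no = self.with_tag[tag], self.without_tag[tag]
        (yes if value else no).add(image)
        (no if value else yes).discard(image)

    def dirty(self, tag: str) -> Set[str]:
        """Images that have neither a True nor a False value for the tag."""
        return self.images - self.with_tag[tag] - self.without_tag[tag]

    def save(self, path: Path) -> None:
        """Rewrite the whole file; meant to be called at intervals."""
        data = {
            "images": sorted(self.images),
            "with": {t: sorted(s) for t, s in self.with_tag.items()},
            "without": {t: sorted(s) for t, s in self.without_tag.items()},
        }
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        os.replace(tmp, path)  # avoid leaving a half-written index behind

    @classmethod
    def load(cls, path: Path) -> "CentralIndex":
        data = json.loads(path.read_text(encoding="utf-8"))
        index = cls()
        index.images = set(data["images"])
        index.with_tag = {t: set(s) for t, s in data["with"].items()}
        index.without_tag = {t: set(s) for t, s in data["without"].items()}
        return index
```

With this layout, "all images with tag T" is a single dictionary lookup, and the "dirty" list is a set difference over image names. Anything changed since the last `save` is lost in a crash, which is the trade-off against the per-image approach.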