I'm having doubts about my table structure, and I wonder if there is a better approach.
I've got a little database for version control repositories (e.g. SVN), the packages (e.g. Linux RPMs) built therefrom, and the versions (e.g. 1.2.3-4) thereof. A given repository might produce no packages, or several, but if there are more than one for a given repository then a particular version for that repository will indicate a single "tag" of the codebase.
A particular version "string" might be used to tag a version of the source code in more than one repository, but there may be no relationship between "1.0" for two different repos. So if packages P and Q both come from repo R, then P 1.0 and Q 1.0 are both built from the 1.0 tag of repo R. But if package X comes from repo Y, then X 1.0 has no relationship to P 1.0.
In my (simplified) model, I have the following tables (the x_id columns are auto-incrementing surrogate keys; you can pretend I'm using a different primary key if you wish, it's not really important):
repository
- repository_id
- repository_name (unique)
...
version
- version_id
- version_string (unique for a particular repository)
- repository_id
...
package
- package_id
- package_name (unique)
- repository_id
...
This makes it easy for me to see, for example, what are valid versions of a given package: I can join with the version table using the repository_id. However, suppose I would like to add some information to this database, e.g., to indicate which package versions have been approved for release. I certainly need a new table:
package_version
- version_id
- package_id
- package_version_released
...
Again, the nature of the keys that I use are not really important to my problem, and you can imagine that the data column is "promotion_level" or something if that helps.
My doubts arise when I realize that there's really a very close relationship between the version_id and the package_id in my new table ... they must share the same repository_id. Only a small subset of package/version combinations are valid. So I should have some kind of constraint on those columns, enforcing that ...
... I don't know, it just feels off, somehow. Like I'm including somehow more information than I really need? I don't know how to explain my hesitance here. I can't figure out which (if any) normal form I'm violating, but I also can't find an example of a schema with this sort of structure ... not being a DBA by profession I'm not sure where to look.
So I'm asking: am I just being overly sensitive?