Media Archive System with branches?
Posted by Ian McEwen on Super User
Published on 2010-05-02T22:48:33Z
Indexed on 2010/05/02 22:58 UTC
In short, how can I get VCS features (revisioning, branching, and deduplication) for a media collection that's far too large for most, if not all, version control systems?
Background
I have a 300GB music folder; unfortunately, I only have the hard drive space for this on my desktop system. However, a good portion of my collection is FLAC; therefore, I could theoretically have a space-optimized version in which I transcode all the FLAC to mp3 or some other lossy format, and use only that version on the laptop.
However, a portion of my collection isn't FLAC, and that portion shouldn't be transcoded to an equivalent lossy format: doing so would save no space, and saving space is the whole point. Moreover, it shouldn't be duplicated: the mp3/ogg portions of the collection should be exactly the same files in both versions.
Thoughts
One solution is to have format-specific organization of my music folders, and use some script to transcode the FLAC directory to mp3 or such into another directory. Another is some sort of hack using entirely separate copies and symbolic links for deduplication, or something similar.
But these approaches also lack versioning; I'd like to be able to reorganize my music collection, retag things, etc. and save the history. This isn't key, but would be awfully nice.
It doesn't seem unreasonable to set up VCS hooks or something equivalent to keep the directory structure synced between the two copies, update tags, and transcode FLAC automatically into the space-optimized copy.
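A minimal sketch of the script-based approach described above, assuming Python 3 and an `ffmpeg` binary on the PATH. The function names `plan_mirror` and `apply_plan`, the choice of mp3 as the target format, and the encoder quality setting are illustrative assumptions, not anything from an existing tool. FLAC files are transcoded into the space-optimized tree; everything else is hard-linked so it is not stored twice (hard links only work when both trees are on the same filesystem).

```python
import os
import subprocess
from pathlib import Path

LOSSLESS_EXT = ".flac"  # files with this extension get transcoded
TARGET_EXT = ".mp3"     # assumed lossy target format


def plan_mirror(src_root, dst_root):
    """Return a list of (action, source, destination) tuples.

    action is "transcode" for FLAC files and "link" for everything else.
    Nothing is written to disk here, so the plan can be inspected first.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    plan = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in sorted(filenames):
            src = Path(dirpath) / name
            rel = src.relative_to(src_root)
            if src.suffix.lower() == LOSSLESS_EXT:
                dst = dst_root / rel.with_suffix(TARGET_EXT)
                plan.append(("transcode", src, dst))
            else:
                plan.append(("link", src, dst_root / rel))
    return plan


def apply_plan(plan):
    """Carry out a plan, skipping destinations that already exist."""
    for action, src, dst in plan:
        if dst.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        if action == "link":
            # Hard link: same data on disk, no second copy.
            os.link(src, dst)
        else:
            # Assumed encoder settings: LAME VBR quality 2.
            subprocess.run(
                ["ffmpeg", "-n", "-i", str(src),
                 "-codec:a", "libmp3lame", "-q:a", "2", str(dst)],
                check=True,
            )
```

Because existing destinations are skipped, rerunning the script only processes new files. It does not notice retagged or moved files, and it keeps no history, which is exactly the gap a VCS would fill.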
Basically, the system I really want is a version control system. Two branches: one archival/desktop branch including the FLAC, one space-optimized/laptop branch without it. Most VCSes would cope well with large sets of files being identical across branches by storing them sensibly (i.e. not keeping two copies of the same data). I could also do much of what I describe above with hooks.
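To illustrate the deduplication property asked for here, the following is a toy content-addressed store, assuming Python 3. The class `ContentStore` and its methods are hypothetical names for this sketch, and everything is held in memory, which a real 300GB archive obviously could not do. Each branch is just a mapping from path to content hash, so a file that is identical in both branches is stored once.

```python
import hashlib


class ContentStore:
    """Toy in-memory store: blobs keyed by SHA-256, branches as manifests."""

    def __init__(self):
        self.blobs = {}     # hex digest -> file contents (bytes)
        self.branches = {}  # branch name -> {path: hex digest}

    def commit(self, branch, files):
        """Record a snapshot of {path: bytes} as the state of a branch."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Identical contents hash to the same key and are kept once.
            self.blobs.setdefault(digest, data)
            manifest[path] = digest
        self.branches[branch] = manifest

    def stored_bytes(self):
        """Total size of unique file contents actually kept."""
        return sum(len(data) for data in self.blobs.values())


store = ContentStore()
# Placeholder byte strings stand in for real audio files.
store.commit("desktop", {"a/song.flac": b"F" * 1000, "b/track.mp3": b"M" * 100})
store.commit("laptop", {"a/song.mp3": b"L" * 150, "b/track.mp3": b"M" * 100})
```

In this example `b/track.mp3` appears in both branches but occupies space only once, while the FLAC file and its transcoded counterpart are separate blobs. Renaming or moving a file changes only the manifest, not the stored data.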
But I don't know of any VCS that would deal with a 300GB repository with almost 20k files. Many of them would just not even initialize the whole affair; others would just do it inexpressibly slowly or otherwise badly. checkpoint looks like it's designed for something close (it's at least for media), but wouldn't do deduplication well (and I'm not convinced I'd be able to script it to do things like automatic transcoding and directory-structure syncing).
So. Is there anything out there that can do all this, or should I consider it a programming project?
© Super User or respective owner