Estimating compressed file size using a list parameter

Posted by Sai on Super User
Published on 2012-03-19T20:06:00Z Indexed on 2012/03/19 23:33 UTC

I am currently compressing a list of files from a directory in the following format:

tar -cvjf test_1.tar.bz2 -T test_1.lst --no-recursion

The above command compresses only the files named in the list. I do this because the list is generated so that its files fit on a DVD. However, that estimate is based on the uncompressed sizes, so after compression the archive is much smaller and a lot of space is left unused on the DVD. This is something like the knapsack problem.

I would like to estimate the compressed file size and add some more files to the list. I found that it is possible to estimate file size using the following command:

tar -cjf - Folder/ | wc -c

This command does not take a list parameter. Is there a way to estimate the compressed size of the files named in a list? I am also looking into other options, such as Perl scripts.
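For what it is worth, the same pipe should also work with a list, since tar accepts -T when writing to standard output. Below is a minimal sketch, not a tested solution: the function name estimate_archive_size and the optional second argument are made up for illustration, and it assumes GNU tar with the list containing one path per line.

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the archive tar would
# produce from the files named in a list, without writing it to disk.
#   $1 = list file, one path per line (e.g. test_1.lst)
#   $2 = tar compression flag, defaults to -j (bzip2) as in the question
estimate_archive_size() {
    list=$1
    flag=${2:--j}
    # --no-recursion is placed before -T because recent GNU tar releases
    # treat it as positional (it only affects names that follow it).
    # tr strips the padding some wc implementations add.
    tar -c "$flag" --no-recursion -f - -T "$list" | wc -c | tr -d ' '
}

# Example (assumes test_1.lst exists in the current directory):
#   estimate_archive_size test_1.lst
```

The whole list is compressed on every call, so this gives the real archive size but is slow if it has to be repeated for every candidate file.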

Edit:

I think I should provide more information, since I have been doing a lot of web searching. I came across a Perl script (Link) that sort of emulates the knapsack algorithm.

The problem with that script is that it splits the files into groups based on their original (uncompressed) sizes. When I compress each group afterwards, there is room left for more files, which I consider inefficient.

There are 2 ways I could resolve the inefficiency:

a) Compress the files individually and save them in a directory using a script. The compressed sizes would give the best estimate. I could then generate the list from the folder of compressed files and apply it to the uncompressed ones.

b) Check whether the compressed file's size is less than the required size. If so, keep adding files until the requirement is met. However, choosing which files to add to the compressed file is an optimization problem in itself.
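To make the two ideas concrete, here is a rough sketch of how they might be combined: measure each file's compressed size on its own (a), then greedily pick files until the limit is reached (b). The function names compressed_sizes and fill_to_limit, the file name candidates.lst and the 4700000000-byte capacity are assumptions for illustration. The greedy pass is a simple first-fit heuristic, not an exact knapsack solution, and paths containing tabs or newlines are not handled.

```shell
#!/bin/sh
# (a) Print "<compressed bytes><TAB><path>" for each regular file in a list.
# Files are compressed one at a time and nothing is written to disk, so the
# sum only approximates the final archive (tar headers are not counted).
#   $1 = list file, one path per line
#   $2 = compressor command, defaults to bzip2
compressed_sizes() {
    comp=${2:-bzip2}
    while IFS= read -r f; do
        [ -f "$f" ] || continue
        printf '%s\t%s\n' "$("$comp" -c "$f" | wc -c | tr -d ' ')" "$f"
    done < "$1"
}

# (b) First-fit greedy pass: read "<bytes><TAB><path>" lines on stdin and
# print every path that still fits under the running total.
#   $1 = capacity in bytes
fill_to_limit() {
    awk -F '\t' -v limit="$1" '
        total + $1 <= limit { total += $1; print $2 }
    '
}

# Example (largest files first; the capacity is an assumed DVD size):
#   compressed_sizes candidates.lst | sort -rn | fill_to_limit 4700000000 > test_1.lst
```

Because the per-file sizes are only an approximation, it is probably worth checking the final list once with the tar ... | wc -c pipe before burning.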

© Super User or respective owner
