Estimating compressed file size using a list parameter
Posted by Sai on Super User
Published on 2012-03-19T20:06:00Z
Indexed on 2012/03/19 23:33 UTC
I am currently compressing a list of files from a directory with the following command:
tar -cvjf test_1.tar.bz2 --no-recursion -T test_1.lst
The above command compresses only the files named in the list. I do this because the list is generated, from the uncompressed file sizes, so that its contents fit on a DVD. However, compression shrinks the archive well below that estimate, so a lot of space is left unused on the DVD. This is something like the knapsack problem.
I would like to estimate the compressed file size and add some more files to the list. I found that it is possible to estimate file size using the following command:
tar -cjf - Folder/ | wc -c
As written, this command takes a folder rather than a list parameter. Is there a way to estimate the compressed size for a list of files? I am also open to other options, such as Perl scripts.
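One possible way to combine the two commands above is sketched here: the archive is written to standard output and counted with wc, but the members come from the list file via -T instead of a folder argument. The function name estimate_size and its optional second argument are hypothetical conveniences, not part of tar. The sketch assumes a tar that supports -T and --no-recursion (GNU tar does), and it puts --no-recursion before -T because some GNU tar versions treat that option as position-sensitive.

```shell
# estimate_size LIST_FILE [COMPRESSION_FLAG]
# Hypothetical helper: prints the size in bytes of the compressed archive
# that tar would produce for the files named in LIST_FILE.
# COMPRESSION_FLAG defaults to -j (bzip2), matching the commands above.
estimate_size() {
    list_file=$1
    comp_flag=${2:--j}
    # Write the archive to stdout (-f -) and count its bytes; nothing is
    # stored on disk. tr strips the padding some wc implementations add.
    tar -c "$comp_flag" -f - --no-recursion -T "$list_file" | wc -c | tr -d ' '
}
```

For example, estimate_size test_1.lst would print the byte count to compare against the DVD capacity. The whole list is compressed once per call, so this gives the real archive size at the cost of a full compression pass.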
Edit:
I think I should provide more information, since I have done a lot of web searching. I came across a Perl script (Link) that roughly emulates the knapsack algorithm.
The problem with that script is that it splits the files based on their original, uncompressed sizes. When I compress the files after splitting them, there is room left over where more files could have been added, which I consider inefficient.
There are two ways I could resolve the inefficiency:
a) Compress the individual files with a script and save them in a directory. The compressed sizes would provide the best estimate. I could then generate the lists from the folder of compressed files and apply those lists to the uncompressed ones.
b) Check whether the compressed file's size is less than the required size. If so, keep adding files until the requirement is met. However, choosing which files to add to the compressed file is an optimization problem in itself.
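The two ideas above could be combined roughly as follows. This is a minimal sketch, not a true knapsack solver: it compresses each candidate on its own to get a size estimate (option a) and keeps adding files, in the order given, while the running total stays under a limit (option b). The function name pick_files and the choice of a simple greedy pass are assumptions made for illustration. The sum of individually compressed sizes is only an approximation of the final archive size, since tar adds headers and padding and the compressor works across file boundaries.

```shell
# pick_files LIMIT_BYTES [COMPRESSOR] < candidates.lst
# Hypothetical helper: reads candidate paths from stdin, one per line, and
# prints those that fit within LIMIT_BYTES when each file's individually
# compressed size is added up (greedy, in input order).
pick_files() {
    limit=$1
    compressor=${2:-bzip2}
    total=0
    while IFS= read -r path; do
        # Skip anything that is not a regular file.
        [ -f "$path" ] || continue
        # Compress to stdout and count bytes; the file itself is untouched.
        size=$("$compressor" -c "$path" < /dev/null | wc -c)
        if [ $((total + size)) -le "$limit" ]; then
            total=$((total + size))
            printf '%s\n' "$path"
        fi
    done
}
```

The output could be redirected into a list file and passed to tar with -T. A file that does not fit is skipped rather than ending the loop, so smaller files later in the input can still fill the remaining space.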
© Super User or respective owner