ZFS & Deduplicating FLAC Data

Posted by jasongullickson on Super User See other posts from Super User or by jasongullickson
Published on 2012-10-15T15:08:49Z Indexed on 2012/10/15 15:40 UTC
Read the original article Hit count: 236

Filed under:

linux

|

debian

|

zfs

|

flac

|

bsd

I'm experimenting with using ZFS to deduplicate a large library of FLAC files. The purpose of this is twofold:

Reduce storage utilization
Reduce bandwidth needed to sync the library with cloud storage

Many of these files are of the same music tracks but from different physical media. This means that for the most part they are the same and usually close to the same size, which makes me think that they should benefit from block-level deduplication.

However in my testing I'm not seeing good results. When I create a pool and add three of these tracks (identical songs from different source media) zpool list reports 1.00 dedupe. If I copy all of the files (make exact duplicates of the three) dedupe climbs, so I know that it is enabled and functioning, but it's not finding any duplication in the original collection of files.

My first thought was that perhaps some of the variable header data (metadata tags, etc.) might be mis-aligning the bulk of the data in these files (the audio frames) but even making the header data consistent across the three files doesn't seem to have any impact on deduplication.

I'm considering taking alternate routes (testing other dedupe filesystems as well as some custom code) but since we're already using ZFS and I like the ZFS replication options, I'd prefer to use ZFS dedupe for this project; but perhaps it's simply not capable of working well with this sort of data.

Any feedback regarding tuning that might improve dedupe performance for this sort of dataset, or confirmation that ZFS dedupe is not the right tool for this job are appreciated.

© Super User or respective owner

Related posts about linux

apt-get install and update fail

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I've got a problem with apt-get update and apt-get install ... commands . every time update or installing fails and errors are : Get:1 http://dl.google.com stable Release.gpg [198B] Ign http://dl.google.com/linux/chrome/deb/ stable/main Translation-en_US Get:2 http://dl… >>> More
kernel module compiling error

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
sh@ubuntu:/home/ccpp/helloworld$ make gcc-4.6 -O2 -DMODULE -D_KERNEL_ -W -Wall -Wstrict-prototypes -Wmissing-prototypes -isystem /lib/modules/`uname -r`/build/include -c -o hello-1.o hello-1.c hello-1.c:4:0: warning: "MODULE" redefined [enabled by default] <command-line>:0:0: note: this is… >>> More
Build-Essentials installation failing

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I am having trouble accessing the several critical header files that show to be a part of the build process. The "Ubuntu Software Center" shows "Build Essentials" as installed: Next I did the following two commands, which did not improve the problem: ~$ sudo apt-get install build-essential [sudo]… >>> More
Updating Debian kernel

as seen on Super User - Search for 'Super User'
I'm trying to update my Debian machine to 2.6.32-46 (which is the new stable). However, after doing apt-get update my apt-cache search linux-image shows me: linux-headers-2.6.32-5-486 - Header files for Linux 2.6.32-5-486 linux-headers-2.6.32-5-686-bigmem - Header files for Linux 2.6.32-5-686-bigmem linux-headers-2… >>> More
Serial connection over a single USB cable (Windows to linux, or linux to linux)

as seen on Server Fault - Search for 'Server Fault'
I'm helping out with a project for an embedded device that only has USB and no serial. This device is running Linux. These days, when we need to connect to a serial port on a device we typically use a USB to serial adapter (on something like a phone system or a load balancing device, etc). I would… >>> More

Related posts about debian

Trying to update debian not working

as seen on Super User - Search for 'Super User'
As root i type this command apt-get update and get these error messages. > Err http://security.debian.org lenny/updates Release.gpg Could not resolve 'security.debian.org' Err http://security.debian.org lenny/updates/main… >>> More
Trouble with dns and debian update

as seen on Server Fault - Search for 'Server Fault'
I tried to update my debian dreamplug server with the command running as root apt-get update and recieved these errors. Err http://security.debian.org lenny/updates Release.gpg Could not resolve 'security.debian… >>> More
Errors when installing Open Office

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I followed the first set of instructions on this page to install Open Office: How to install Open Office? However, the last step which says to change the CHMOD of a folder, I got an error saying that the directory does not exist. Open Office now appears in my Ubuntu start menu, but clicking… >>> More
Installing PHP4 on a Debian (lenny) 7 32bit box

as seen on Server Fault - Search for 'Server Fault'
I am trying to install PHP4 on a Debian 7 32bit box but I ran into the following root@php4:~# apt-get update Get:1 http://snapshot.debian.org lenny Release.gpg [189 B] Hit http://snapshot.debian.org lenny Release Ign http://snapshot.debian.org lenny Release Hit http://snapshot.debian.org lenny/main… >>> More
Debian keyring error: "No keyring installed"

as seen on Server Fault - Search for 'Server Fault'
I have a Debian Squeeze EC2 AMI. On booting up an instance with it and trying to install packages with apt-get I get errors saying there is no keyring installed. Here is the error with apt-get update: root@ip:~# apt-get update Get:1 http://ftp.us.debian.org squeeze Release.gpg [1672 B] Ign http://ftp… >>> More