Block-level deduplicating filesystem
Posted by James Haigh on Ask Ubuntu
Published on 2012-06-10T20:51:37Z
I'm looking for a deduplicating copy-on-write filesystem solution for general user data such as `/home` and backups of it. It should use online/inline/synchronous deduplication at the block level using secure hashing (for a negligible chance of collisions) such as SHA-256 or TTH. Duplicate blocks need not even touch the disk.
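To make the idea concrete, here is a minimal sketch in Python of inline block-level deduplication. It assumes a fixed 4 KiB block size and uses an in-memory dictionary in place of the disk; `DedupStore`, `BLOCK_SIZE` and the file names are made up for illustration and don't correspond to any existing filesystem.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class DedupStore:
    """Toy content-addressed block store (hypothetical, in memory only)."""

    def __init__(self):
        self.blocks = {}       # SHA-256 hex digest -> block contents
        self.files = {}        # file name -> ordered list of block digests
        self.block_writes = 0  # how many blocks actually had to be stored

    def write_file(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Inline dedup: a block whose hash is already known is never
            # stored again, only referenced.
            if digest not in self.blocks:
                self.blocks[digest] = block
                self.block_writes += 1
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        # Reassemble the file from its list of block references.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write_file("home/user/a.bin", payload)
store.write_file("backup/user/a.bin", payload)  # the "backup" copy
```

In this toy model the second `write_file` call stores no new blocks at all, because every block of the copy is already present under its hash.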
The idea is that I should be able to just copy `/home/<user>` to an external HDD with the same kind of filesystem to do a backup. Simple. No messing around with incremental backups, where corruption of any one snapshot will nearly always break all later snapshots, and no need to use a specific tool to delete or 'check out' a snapshot. Everything should simply be done from the file browser without worry. Can you imagine how easy this would be? I'd never have to think twice about backing up again!
I don't mind a performance hit; reliability is the main concern. That said, with FS-aware implementations of `cp`, `mv` and `scp`, and a file browser plugin, these operations would be very fast, especially when there is a lot of duplication, as they would only need to transfer the absent blocks. Accidentally using conventional copy tools that do not integrate with the FS would merely take longer, waste some bandwidth when copying remotely and waste some CPU, as the duplicate data would be re-read, re-transferred and re-hashed (although nothing would be re-written), but would absolutely not corrupt anything. (Some filesharing software may also be able to benefit by integrating with the FS.)
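As a rough sketch of what "only transfer the absent blocks" could look like, the Python below hashes the source blocks, asks the destination which hashes it lacks, and sends only those. Everything here is an assumption for illustration: the 4 KiB `BLOCK_SIZE`, the dictionary standing in for the destination's block store, and the helper names `split_blocks`, `missing_digests` and `dedup_copy`. It is not how any existing `cp` or `scp` works.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def split_blocks(data):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def missing_digests(manifest, dest_blocks):
    """Destination side: report which of the offered hashes it lacks."""
    return {d for d in manifest if d not in dest_blocks}


def dedup_copy(data, dest_blocks):
    """Copy data into dest_blocks, sending only blocks that are absent.

    Returns the file's manifest (ordered block hashes) and the number
    of payload bytes that actually had to be transferred.
    """
    blocks = split_blocks(data)
    manifest = [hashlib.sha256(b).digest() for b in blocks]
    wanted = missing_digests(manifest, dest_blocks)
    bytes_sent = 0
    for digest, block in zip(manifest, blocks):
        if digest in wanted:
            dest_blocks[digest] = block
            bytes_sent += len(block)
            wanted.discard(digest)  # each unique block is sent once
    return manifest, bytes_sent


# Eight distinct blocks, then a second version with one block changed.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
modified = (original[:3 * BLOCK_SIZE]
            + bytes([99]) * BLOCK_SIZE
            + original[4 * BLOCK_SIZE:])

external_hdd = {}  # stands in for the destination's block store
_, first_sent = dedup_copy(original, external_hdd)
manifest, second_sent = dedup_copy(modified, external_hdd)
restored = b"".join(external_hdd[d] for d in manifest)
```

Under these assumptions the first copy transfers all eight blocks, while the second transfers only the single changed block; a conventional tool would re-send everything but end up with the same stored blocks.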
So what's the best way of doing this?
I've looked at some options:
- lessfs - Looks unmaintained. Any good?
- [Opendedup/SDFS][3] - Java? Could I use this on Android?! What does [SDFS][4] stand for?
- [Btrfs][5] - Some patches floating around on mailing list archives, but no real support.
- [ZFS][6] - Hopefully they'll one day relicense it under a true free/open-source, GPL-compatible licence.
Also, two years ago I had a go at implementing this myself in Python using FUSE, working at the file level on top of a typical solid FS such as ext4, but I found the FUSE bindings for Python underdocumented and didn't manage to implement all of the system calls.
My first post here, so I can't post more than 2 links until I get over 10 rep:
[3]: http://www.opendedup.org/
[4]: https://en.wikipedia.org/w/index.php?title=SDFS&action=edit&redlink=1
[5]: https://en.wikipedia.org/wiki/Btrfs#Features
[6]: https://en.wikipedia.org/wiki/ZFS#Linux