I have installed Lubuntu 15.04 on my Eee PC 701 with Btrfs as the file system. I want to deduplicate my data, but I don't know how.
2 Answers
Start by making a full backup so that if something goes wrong you haven't lost anything.
I believe you are looking for duperemove -d
"Duperemove is a simple tool for finding duplicated extents and submitting them for deduplication. When given a list of files it will hash their contents on a block by block basis and compare those hashes to each other, finding and categorizing extents that match each other. When given the -d option, duperemove will submit those extents for deduplication using the btrfs-extent-same ioctl.
Duperemove has two major modes of operation one of which is a subset of the other.
Readonly / Non-deduplicating Mode
When run without -d (the default) duperemove will print out one or more tables of matching extents it has determined would be ideal candidates for deduplication. As a result, readonly mode is useful for seeing what duperemove might do when run with '-d'. The output could also be used by some other software to submit the extents for deduplication at a later time.
It is important to note that this mode will not print out all instances of matching extents, just those it would consider for deduplication.
Generally, duperemove does not concern itself with the underlying representation of the extents it processes. Some of them could be compressed, undergoing I/O, or even have already been deduplicated. In dedupe mode, the kernel handles those details and therefore we try not to replicate that work.
Deduping Mode
This functions similarly to readonly mode with the exception that the duplicated extents found in our "read, hash, and compare" step will actually be submitted for deduplication. An estimate of the total data deduplicated will be printed after the operation is complete. This estimate is calculated by comparing the total amount of shared bytes in each file before and after the dedupe.
See the duperemove man page for further details about running duperemove."
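To make the "read, hash, and compare" step described in the quote more concrete, here is a minimal Python sketch of the idea. It is not duperemove's code: the function names are hypothetical, SHA-256 is a stand-in for whatever hash duperemove really uses, the block size is an assumed value, and it only reports matching blocks rather than asking the kernel to deduplicate anything.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def hash_blocks(path, block_size=BLOCK_SIZE):
    """Yield (offset, digest) for each fixed-size block of the file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            # SHA-256 is a stand-in here, not necessarily duperemove's hash
            yield offset, hashlib.sha256(block).hexdigest()
            offset += len(block)


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Map each digest seen more than once to its (path, offset) locations."""
    seen = defaultdict(list)
    for path in paths:
        for offset, digest in hash_blocks(path, block_size):
            seen[digest].append((path, offset))
    # Only blocks that occur at least twice are dedupe candidates
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

This corresponds roughly to the read-only mode: it tells you which blocks match but changes nothing on disk. The actual deduplication is what `-d` hands over to the kernel.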
This doesn't seem to be included in the btrfs-tools package, but there is a GitHub page for it here. Recent open and closed issues (aka pulse) are available here.
Packages for all currently supported versions of Ubuntu can be found in this PPA.
I must re-iterate that backing up is highly recommended. See: https://github.com/markfasheh/duperemove/issues/50
Quoted Source: https://github.com/markfasheh/duperemove
man page: https://manpages.debian.org/testing/duperemove/duperemove.8.en.html
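If you want to script this, one cautious approach is to do a read-only run first and only then repeat it with `-d`. The sketch below just builds the two command lines. `build_duperemove_cmd` is a hypothetical helper, `/home` is a placeholder path, and the `-r` and `-h` flags are the ones mentioned in the comments on this answer. It does not run duperemove itself.

```python
import shlex


def build_duperemove_cmd(paths, dedupe=False):
    """Return the argv list for a duperemove run (hypothetical helper)."""
    flags = ["-r", "-h"]  # recurse into directories, human-readable numbers
    if dedupe:
        flags.append("-d")  # actually submit the extents for deduplication
    return ["duperemove", *flags, *paths]


if __name__ == "__main__":
    target = ["/home"]  # placeholder path, replace with your own
    # Read-only pass: only reports what would be deduplicated
    print(shlex.join(build_duperemove_cmd(target)))
    # Dedupe pass: same command with -d added
    print(shlex.join(build_duperemove_cmd(target, dedupe=True)))
```

The returned list can be passed to `subprocess.run` once you are happy with what the read-only pass reports and you have a backup.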

I have always used bedup. It is very fast and reliable. This tool is also mentioned on the official btrfs page. I've never used duperemove (bedup is older).

At this point bedup is no longer in active development and is woefully out of date. – Perkins May 04 '16 at 23:47
@Perkins I disagree. I just received feedback on the issue https://github.com/g2p/bedup/issues/75, which I posted today, and with this help I just "beduped out" over 7 GB of space on my new 16.04 server. – Adam Ryczkowski May 05 '16 at 09:33
Maybe someone's picked it up again then. I'd given up on it after a year or so of it being completely unable to even properly scan for duplicates. That said, unless they've updated it to use the new ioctl, duperemove will be safer as it does the deduplication atomically in kernelspace instead of nuking one of the duplicates and making a reflink copy of the other. But then, bedup will actually get files small enough to be stored in-tree, which duperemove currently can't due to lack of kernel support for it. – Perkins May 05 '16 at 19:26
Today is 2021-12-13. The `bedup` GitHub repository seems abandoned; across all published branches, the most recent commit was in 2016. The project link was removed from the official btrfs page on 2020-06-18. – Joel Purra Dec 13 '21 at 12:22
I believe you need to designate the files to run it on, as in `duperemove [options] files...` – Elder Geek Jun 04 '15 at 13:20
`duperemove -rdh path1 path2 pathn`, where `-r` is for recursive, `-d` to actually deduplicate, and `-h` for human-readable numbers. – Hi-Angel Mar 26 '18 at 06:47
Reading state information... Done E: Unable to locate package duperemove – Noah Jul 01 '21 at 14:27