29

When I run fdupes it finds more than 30,000 duplicate files. I need to keep one file and delete all the other duplicates (because some of them are system files). Please give me a command or script that does this without prompting me to press "1 or 2 or all" for each and every group of duplicate files.

user84055
  • 493

5 Answers

40

You can do this if you want to run it silently (I've just used it to clear 150 GB of dupes on Rackspace block storage... £kerching!!):

fdupes -rdN dir/

r - recursive
d - preserve the first file, delete the other dupes
N - run silently (no prompt)
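
If you want to see what would be kept and what would go before letting it delete anything, a cautious first pass (my suggestion, not part of the original answer) is to run the same scan without the delete flags and review the list:

fdupes -r dir/ > preview.txt

Each group in preview.txt is separated by a blank line, and the first file in each group is the one that -dN would preserve.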
user288359
  • 401
  • Ain't there an option that would move to trash instead of deleting dupes? – Augustin Riedinger Mar 21 '16 at 14:04
  • 4
    Did for f in $(fdupes -f .); do gvfs-trash $f; done – Augustin Riedinger Mar 21 '16 at 14:12
  • 5
    @AugustinRiedinger: Even better would be fdupes -f . | xargs -rd '\n' -- gvfs-trash if you have file names with spaces and special characters or many files. – David Foerster Jun 03 '16 at 07:40
  • @DavidFoerster you still may have filenames with newlines, which will break your command. – Ruslan Oct 14 '16 at 11:44
  • @Ruslan: That's correct but at the moment fdupes doesn't provide an option for null-terminated records, so there's no better option. It's definitely much better than for f in $(fdupes ...) though. :-] – David Foerster Oct 14 '16 at 14:38
  • 1
    for f in $(fdupes -f .); do gvfs-trash "$f"; done If you don't use quotes around $f, then file names with spaces will not be removed correctly. – Rucent88 Aug 27 '18 at 08:40
  • Filenames with spaces still failed using quotes; the xargs command did work. I got lots of complaints that it couldn't create the info file because the filename was too long. – Darren Cook Jul 12 '23 at 15:27
  • WARNING!! DO NOT DO THIS. Do not ignore the "filename too long" errors. I assume that is why, when I emptied the trash, it took away the parent directory I was deleting from. I now have 1.2 TB free, when I expected to have 8 GB free. Off to find a backup. Next time I will use the safer method listed below, which does a mv (though I'm not sure how it will cope with duplicate filenames). Or maybe simply delete immediately, and not mess around with gvfs-trash. – Darren Cook Jul 12 '23 at 15:38
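
Pulling the suggestions in this comment thread together (my combination, not from the original answer): a trash-based clean-up that copes with spaces and with the blank lines fdupes prints between groups, assuming the modern gio command (the successor to gvfs-trash) is available, would look like

fdupes -rf dir/ | grep -v '^$' | xargs -d '\n' -r gio trash

As the comments note, this still cannot cope with newlines in file names, because fdupes (at least at the time of those comments) had no null-terminated output mode.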
9

fdupes has a rich CLI:

fdupes -r ./stuff > dupes.txt

Then, deleting the duplicates is as easy as checking dupes.txt and deleting the offending directories. fdupes can also prompt you to delete the duplicates as you go along.
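
For the interactive route just mentioned (a sketch on my part; ./stuff is simply the example directory from the previous command), the -d flag makes fdupes ask, for each set of duplicates, which files to preserve:

fdupes -rd ./stuff

That is exactly the "1 or 2 or all" prompting the question wants to avoid, so for tens of thousands of files the silent -rdN form from the answer above is the practical choice.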

fdupes -r /home/user > /home/user/duplicate.txt

The output of the command goes into duplicate.txt.

fdupes compares file sizes and MD5 hashes (followed by a byte-by-byte check) to find duplicates.
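
To illustrate the hashing idea with plain coreutils (a rough sketch, not what fdupes does internally: fdupes also pre-filters by size and confirms matches byte by byte), you can group files by MD5 hash like this:

find . -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate

Each blank-line-separated block in the output is a set of files whose MD5 hashes match.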

Check the fdupes manpage for detailed usage info.

Eliah Kagan
  • 117,780
Amol Sale
  • 1,006
4

I would use this safer approach:

Create a script that moves the duplicated files to a new folder. If you move them to a folder outside the original folder, fdupes won't report them on a second scan of the original folder, and it is safer to delete them afterwards.

#!/bin/bash

# Save the default field separator
oIFS=$IFS
# Split only on newlines, since file names can contain spaces
IFS=$'\n'

# Make sure the destination folder exists
mkdir -p Duplicates

# For each file (f) that fdupes reports as a duplicate, recursively,
# omitting the first file of every group (that one is kept)
for f in $(fdupes -r -f .)
do
    # Log the files being moved
    echo "Moving $f to folder Duplicates" >> ~/log.txt
    # Move the duplicate, keeping the original in the original folder
    mv "$f" Duplicates/
done

# Restore the default field separator
IFS=$oIFS
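
As a quick sanity check once the script has finished (my suggestion, and it assumes the Duplicates folder sits outside the folder you scan, as recommended above), re-running the scan from the original folder should print nothing:

fdupes -r .

After that you can review the Duplicates folder at leisure and delete it.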
derHugo
  • 3,356
2

I have used FSlint and DupeGuru for quite some time.

  • FSlint supports selection by wildcard and other cleanup methods
  • DupeGuru supports regex

Both can handle more than 10,000 files/folders.

seb
  • 2,341
0

I have tried them all: diff, fdupes, rsync, rdfind, and shell scripts, and without a doubt fslint beats them all hands down. It shows the duplicates, lets you examine them, and merge or delete them. The GUI is very clean and easy to use. I'm using Ubuntu 20.04.

John
  • 1