28

Possible Duplicate:
How to find (and delete) duplicate files

Is there a reliable duplicate file/folder utility (with a GUI) for Linux that can find duplicate files or folders and move them to a different folder?

2 Answers

37

fdupes

No GUI, but fdupes (sudo apt-get install fdupes) is very fast and reliable. It uses file sizes and modification dates for a preliminary analysis, then compares MD5 hashes of the files, and finally does a byte-by-byte comparison if necessary. It's also dead easy to use. I strongly recommend it.

Typical usage:

fdupes -d -r /path/to/directory/

-r recurses into subdirectories, as opposed to scanning just the contents of the specified directory.

-d to prompt the user about which file to delete (without this, fdupes just compiles a list of duplicates).

-N together with -d deletes without prompting, keeping the first file in each set of duplicates.

-H normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior

-L hardlink duplicate files to the first file in each set of duplicates without prompting the user (this option was rolled back in some versions as it was found to be buggy and unsafe in rare cases. It might be reintroduced in future versions).
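
For example, to recurse into a directory tree and delete every duplicate except the first file in each set, without being prompted (check the path carefully first, the deletions are permanent):

fdupes -r -d -N /path/to/directory/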

Edit: The hardlink option was removed for now because it was buggy. It might return some day. For now you have to use the separate hardlink tool (sudo apt-get install hardlink).
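
A rough example of using hardlink instead (flag names can differ slightly between hardlink implementations, so check hardlink --help first; -n is meant here as the dry-run flag found in recent versions):

hardlink -n /path/to/directory/   # dry run: only report what would be linked
hardlink /path/to/directory/      # actually replace duplicates with hard links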

fslint

If you insist on a graphical user interface, you might want to have a look at fslint (sudo apt-get install fslint; see its website for a description). It is more feature-rich, but also more complicated and less reliable.
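
fslint also ships command-line helpers alongside the GUI; if I recall the path correctly, on a typical Ubuntu install the duplicate finder can be run as:

/usr/share/fslint/fslint/findup /path/to/directory/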

con-f-use
  • How can you use fdupes to find duplicate folders (instead of duplicate files)? – Anderson Green Sep 01 '12 at 22:47
  • Sorry, didn't see your question in the notifications. What exactly do you want to achieve? Find folders with duplicated content or with the same folder name? – con-f-use Oct 15 '12 at 10:52
  • I want to find folders with duplicated content (I think that's what the OP is referring to). – Anderson Green Oct 15 '12 at 16:11
  • I don't think there's a program that does this. – con-f-use Oct 16 '12 at 12:03
  • That is a bit more tricky and requires you to know your bash scripting. My approach would be: 1. Run fdupes -r folder1 folder2 ... on all folders that might contain duplicate directories and save its output. Use find folder1 -type d etc. to find all directories in said folders and store them in an array. Use find $array[i] -type f in a for loop over i, iterating over the directories, to get an individual file list for each directory. Then compare the acquired file lists with the saved output from fdupes and see if each list consists only of duplicate files. I don't think a program exists for that (a rough sketch of this idea follows after these comments). – con-f-use Oct 16 '12 at 12:10
  • You can also compare the checksums of two different folders to see if they are duplicates - see here: http://stackoverflow.com/questions/1657232/how-can-i-calculate-an-md5-checksum-of-a-directory – Anderson Green Dec 03 '12 at 17:43
  • Also, would it be possible to determine whether two folders are duplicates simply by converting both of those folders to .zip or .tar files, and then comparing those archive files? – Anderson Green Dec 03 '12 at 17:46
  • Actually, there is a shell script that can find duplicate folders: http://unix.stackexchange.com/questions/58340/find-all-folders-in-a-directory-with-the-same-content/58343#comment80430_58343 – Anderson Green Dec 13 '12 at 00:19
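
A rough sketch of the directory-fingerprint idea from the comments above (hypothetical paths; assumes GNU find, md5sum, sort and uniq are available): give each directory one fingerprint, the hash of the sorted content hashes of its files, then print directories that share a fingerprint as likely duplicates.

#!/bin/bash
# Sketch only: compares directory contents, ignoring file names and structure.
for dir in /path/to/parent/*/; do
    # One fingerprint per directory: hash of the sorted per-file content hashes.
    sum=$(find "$dir" -type f -exec md5sum {} + | awk '{print $1}' | sort | md5sum | awk '{print $1}')
    echo "$sum  $dir"
done | sort | uniq -w32 -D   # -D prints every line whose first 32 chars (the fingerprint) repeat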
2

According to the fdupes --help command, -H does not create hardlinks:

 -H --hardlinks         normally, when two or more files point to the same
                        disk area they are treated as non-duplicates; this
                        option will change this behavior

Instead, -L seems to do this:

 -L --linkhard          hardlink duplicate files to the first file in
                        each set of duplicates without prompting the user
Octavian Helm
  • You're right, but this information would be better added as a comment on the previous answer. – poolie Feb 01 '12 at 01:23
  • The option to create hardlinks of duplicate files was rolled back after it was found to be buggy and unsafe. It might be reintroduced in future versions. – con-f-use Dec 04 '12 at 13:38