The title says it all. How can I detect duplicates in my media library?
7 Answers
dupeGuru Music Edition is what you want. Set the scan type to "Audio Contents" in Preferences. Please note that the program is fairware so please contribute if you can.
I suggest you couple this with MusicBrainz Picard which can tag your music files automatically.

There is a plugin that was made some time ago for this. I've used it recently but it still leaves a little to be desired. There is a "PPA" for it - but no built packages yet, just the Bazaar branch. The install instructions go something like this:
wget http://scrawl.bplaced.net/duplicate-source.tar.gz -O tmp.tar.gz && mkdir -vp ~/.gnome2/rhythmbox/plugins/duplicate-source/ && tar -xf tmp.tar.gz -C ~/.gnome2/rhythmbox/plugins && rm -v tmp.tar.gz
If you're interested in using the Bazaar'd source code do the following instead:
mkdir -vp ~/.gnome2/rhythmbox/plugins && cd ~/.gnome2/rhythmbox/plugins && bzr branch lp:rb-duplicate-source duplicate-source
Once it's installed restart Rhythmbox and you should have a Duplicates Finder now in the plugin list.
After activating it, additional configuration options become available.
Once the plugin is enabled and finds duplicates, it adds an additional entry to your library list.
A few things I found odd: I tried this on a media library with over 120,000 songs (over 1,000 duplicates) and on a library with about 1,000 songs and maybe 30 duplicates. On the former it took a VERY long time and crashed Rhythmbox several times during the search. I eventually set it to automatically "Remove from Library" to avoid having to rebuild the list. On smaller libraries everything works great, though.
When a duplicate is found, and you have the default options selected, the lower-quality version of the song is added to the list. So it's safe to select all songs on the Duplicates list and "Remove" them (either delete from disk or remove from library).

Thanks a lot for giving this tip! However, nothing seems to happen once I've activated it. I can not find a new tab or whatever where the duplicates would be indicated. – Ingo Sep 18 '10 at 18:27
You may need to move the threshold to 0.5 to help sort out duplicates. It'll show as an additional item in your Library list. I've updated the answer to show this. – Marco Ceppi Sep 18 '10 at 19:26
You can use fdupes for that:
$ fdupes -r ~/Music
which gives you a list of all duplicate files.
You can easily install it with
sudo apt-get install fdupes
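As a small sketch of how the scan could be wrapped for repeated use, assuming fdupes is installed and the library lives under ~/Music: the function names list_dupes and summarize_dupes are made up for illustration, while -r, -S and -m are standard fdupes options.

```shell
# Sketch only: assumes fdupes is installed. The function names are
# hypothetical helpers; the directory defaults to ~/Music.

list_dupes() {
    # -r recurses into subdirectories, -S shows the size of each duplicate set
    fdupes -r -S "${1:-$HOME/Music}"
}

summarize_dupes() {
    # -m prints a short summary instead of listing every file
    fdupes -r -m "${1:-$HOME/Music}"
}
```

Calling list_dupes ~/Music prints the duplicate sets separated by blank lines. Keep in mind that fdupes only matches files with identical contents, so re-encoded copies of the same song will not show up.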

This worked better than the other options. However, it does not really seem to be suitable for audio files. The software does not compare tags etc. which leads to a very short list of files whereas in reality there are many many duplicates, however maybe not with exactly the same file size (as they might be from different sources). – Ingo Nov 01 '10 at 14:12
It might be a dozen years late, but I just wrote a command-line program that tries to detect similar audio files by comparing acoustic fingerprints: https://codeberg.org/derat/soundalike
It uses the fpcalc utility from Chromaprint to generate the fingerprints, and then builds a lookup table to find possible matches before comparing fingerprints more rigorously.
I ran into a similar issue when I had a bunch of duplicate image files. In my case, I just used md5sum on the files and sorted the results:
find "$rootdir" -name "*.jpg" -exec md5sum {} + | sort
Files with the same contents generated the same hash, so duplicates could be found easily. I manually deleted the dupes from there, although I could have extended the script to delete all but the first occurrence, but I'm always paranoid about doing that in an ad-hoc script.
Note that this only works for duplicate files with identical contents.
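The hashing approach above can be taken one step further so that only the duplicate groups are printed, rather than the full sorted list. This is a minimal sketch, assuming GNU coreutils (for uniq -w and --all-repeated); the function name find_dupes and its arguments are made up for illustration.

```shell
# find_dupes DIR [PATTERN]
# Hypothetical helper: prints groups of files under DIR whose MD5 hashes
# match, with a blank line between groups. Assumes GNU find, md5sum and uniq.
find_dupes() {
    local rootdir=${1:-.}
    local pattern=${2:-*.jpg}

    # -exec ... {} + passes file names straight to md5sum, so names that
    # contain spaces are handled correctly.
    find "$rootdir" -type f -name "$pattern" -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
    # sort brings equal hashes next to each other; uniq -w32 compares only
    # the 32-character hash and keeps lines whose hash occurs more than once.
}
```

For example, find_dupes ~/Pictures '*.jpg' lists each set of identical images together. Nothing is deleted; the output is meant to be reviewed by hand, as described above.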

Try FSlint or dupeGuru.
To install FSlint, type this in a terminal (Ctrl-Alt-T):
sudo apt-get install fslint
Hope this is useful.

I've used FSlint to find duplicate files in general. FSlint is "a utility to find and clean various forms of lint on a filesystem."

With default settings, FSlint is likely to find duplicate files as measured by file name and file size, but not duplicate songs if they are different recordings with different sizes and file names. – Aputsiak Sep 20 '10 at 16:40
Different recordings (remixes?) are not safe for deletion. They may have their own value. – Extender Nov 01 '10 at 07:06
I have also used fslint for audio (with reasonable success) -- though given some of the alternatives in this thread, I'll probably try one of them next time. – belacqua Jan 24 '11 at 07:16
dupeguru_me. – rjh Mar 21 '19 at 18:20