31

The title says it all. How can I detect duplicates in my media library?

Isaiah
  • 59,344
Ingo
  • 6,288

7 Answers

25

dupeGuru Music Edition is what you want. Set the scan type to "Audio Contents" in Preferences. Note that the program is fairware, so please contribute if you can.

I suggest you couple this with MusicBrainz Picard, which can tag your music files automatically.

Li Lo
  • 15,894
  • PERFECT! Finally an answer that worked like charm :). A Banshee plugin or whatever would have been even better but this works great! I actually removed around 8GB of dupes, cheers! – Ingo Dec 01 '10 at 08:45
  • 2
    Too bad dupeGuru can only do 10 removals at a time without putting down money. – John McKean Pruitt May 14 '12 at 23:09
  • The PPA is dead for recent Ubuntu releases. I had success installing the .deb directly from https://launchpad.net/~hsoft/+archive/ubuntu/ppa/+build/9735351 and then running dupeguru_me. – rjh Mar 21 '19 at 18:20
  • The most recent version has no limit on removals. However, it doesn't seem to fingerprint the music; it just compares filenames and ID3 information. If you use a tool like Picard to tag your music first, it should do a good job :) – rjh Mar 21 '19 at 18:25
10

There is a Rhythmbox plugin that was made some time ago for this. I've used it recently, but it still leaves a little to be desired. There is a "PPA" for it, but no built packages yet, just the Bazaar branch. The install instructions go something like this:

wget http://scrawl.bplaced.net/duplicate-source.tar.gz -O tmp.tar.gz && mkdir -vp ~/.gnome2/rhythmbox/plugins/duplicate-source/ && tar -xf tmp.tar.gz -C ~/.gnome2/rhythmbox/plugins && rm -v tmp.tar.gz

If you'd rather use the source code from the Bazaar branch, do the following instead:

mkdir -vp ~/.gnome2/rhythmbox/plugins && cd ~/.gnome2/rhythmbox/plugins && bzr branch lp:rb-duplicate-source duplicate-source

Once it's installed, restart Rhythmbox and you should see a Duplicates Finder in the plugin list.

(screenshot: plugins list)

After activating it, additional configuration options are available.

(screenshot: configuration window)

Once the plugin is enabled and has found duplicates, it adds an additional entry to your library list:

(screenshot: library list with the Duplicates entry)

A few things I found odd: I tried this on a media library with over 120,000 songs (over 1,000 duplicates) and on a library with about 1,000 songs and maybe 30 duplicates. On the former it took a VERY long time and crashed Rhythmbox several times during the search. I eventually set it to automatically "Remove from Library" to avoid having to rebuild the list. On smaller libraries everything works great, though.

When a duplicate is found, and you have the default options selected, the lower-quality version of the song is added to the list. So it's safe to select all songs on the Duplicates list and "Remove" them (either delete from disk or remove from library).

Marco Ceppi
  • 48,101
  • Thanks a lot for giving this tip! However, nothing seems to happen once I've activated it. I can not find a new tab or whatever where the duplicates would be indicated. – Ingo Sep 18 '10 at 18:27
  • You may need to move the threshold to 0.5 to help sort out duplicates. It'll show as an additional item in your Library list. I've updated the answer to show this. – Marco Ceppi Sep 18 '10 at 19:26
  • Tried that out as well. No success. – Ingo Oct 10 '10 at 10:18
6

You can use fdupes for that:

$ fdupes -r ~/Music

which gives you a list of all duplicate files.

You can easily install it with

sudo apt-get install fdupes
Johann
  • 77
  • 5
    This worked better than the other options. However, it does not really seem to be suitable for audio files. The software does not compare tags etc., which leads to a very short list of files, whereas in reality there are many, many duplicates. They may just not have exactly the same file size (as they might be from different sources). – Ingo Nov 01 '10 at 14:12
3

It might be a dozen years late, but I just wrote a command-line program that tries to detect similar audio files by comparing acoustic fingerprints: https://codeberg.org/derat/soundalike

It uses the fpcalc utility from Chromaprint to generate the fingerprints, and then builds a lookup table to find possible matches before comparing fingerprints more rigorously.
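
A rough sketch of the "comparing fingerprints" step, not soundalike's actual code: one common way to compare two Chromaprint fingerprints is to XOR the raw 32-bit values pairwise and count the differing bits. The function names below are made up for illustration, and the first helper assumes that fpcalc -raw prints a FINGERPRINT= line of comma-separated integers.

```shell
# Hypothetical helper: print the raw fingerprint of an audio file.
# Assumes fpcalc (from Chromaprint) is installed and that "fpcalc -raw"
# prints a line of the form FINGERPRINT=123,456,...
raw_fingerprint() {
    fpcalc -raw "$1" | sed -n 's/^FINGERPRINT=//p'
}

# popcount32 N: print how many of the low 32 bits of N are set.
popcount32() (
    n=$(( $1 & 4294967295 ))
    count=0
    while [ "$n" -gt 0 ]; do
        count=$(( count + (n & 1) ))
        n=$(( n >> 1 ))
    done
    echo "$count"
)

# fp_bit_errors A B: compare two comma-separated fingerprints element by
# element over their common length (no offset alignment is attempted).
# Prints "<differing bits> <bits compared>".
fp_bit_errors() (
    a=$1
    b=$2
    errors=0
    total=0
    while [ -n "$a" ] && [ -n "$b" ]; do
        x=${a%%,*}
        y=${b%%,*}
        errors=$(( errors + $(popcount32 $(( x ^ y ))) ))
        total=$(( total + 32 ))
        # Drop the first element, or empty the list if it was the last one.
        case $a in *,*) a=${a#*,} ;; *) a= ;; esac
        case $b in *,*) b=${b#*,} ;; *) b= ;; esac
    done
    echo "$errors $total"
)
```

For two files, something like fp_bit_errors "$(raw_fingerprint a.mp3)" "$(raw_fingerprint b.mp3)" gives the two numbers; a low ratio of differing bits suggests the same recording. A real tool also has to cope with fingerprints that are shifted relative to each other, which this sketch ignores.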

derat
  • 31
3

I ran into a similar issue when I had a bunch of duplicate image files. In my case, I just used md5sum on the files and sorted the results:

find "$rootdir" -name "*.jpg" -exec md5sum {} + | sort

Files with the same contents generated the same hash, so duplicates could be found easily. I manually deleted the dupes from there. I could have extended the script to delete all but the first occurrence, but I'm always paranoid about doing that in an ad-hoc script.

Note that this only works for duplicate files with identical contents.
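
If scanning the sorted list by eye gets tedious, the grouping can be left to uniq. This is only a sketch: list_dupes_by_md5 is a name made up here, and it assumes GNU coreutils (md5sum, plus uniq with the -w and --all-repeated options). It only lists files and deletes nothing.

```shell
# list_dupes_by_md5 DIR [PATTERN]: print md5sum lines for files under DIR
# (optionally limited to names matching PATTERN) whose contents share a hash.
# Groups of identical files are separated by a blank line.
list_dupes_by_md5() {
    find "$1" -type f -name "${2:-*}" -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate
}
```

For example, list_dupes_by_md5 ~/Music '*.mp3' prints only the files that have at least one byte-identical twin. The -w32 option makes uniq compare just the first 32 characters of each line, which is the MD5 hash.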

John Bode
  • 139
1

Try FSlint or dupe gredtter

To install FSlint, type the following in a terminal (Ctrl-Alt-T):

sudo apt-get install fslint

Hope this is useful.

stephenmyall
  • 9,855
-2

I've used FSlint to find duplicate files in general. FSlint is "a utility to find and clean various forms of lint on a filesystem."

Aputsiak
  • 224
  • That is strange. FSlint does not find any of my duplicate songs! – Ingo Sep 18 '10 at 18:54
  • With default settings, FSlint is likely to find duplicate files judged by file name and file size, but not duplicate songs that are different recordings with different sizes and file names. – Aputsiak Sep 20 '10 at 16:40
  • Different recordings (remixes?) are not safe to delete. They may have value of their own. – Extender Nov 01 '10 at 07:06
  • I have also used fslint for audio (with reasonable success) -- though given some of the alternatives in this thread, I'll probably try one of them next time. – belacqua Jan 24 '11 at 07:16