How to clean duplicates existing in one folder from another recursively?

Question

Summary:

Folder A contains only Excellent files (many of them)
Folder B contains many folders of mixed Excellent/Good/Bad files

How can I delete files in Folder B's subfolders only if they also exist in Folder A?

In other words, how can I check whether Folder A's files exist in Folder B's subfolders and, if so, delete them from Folder B's subfolders?

Idea for a solution: maybe a command which:

1. Checks part of the alphabet, e.g. all files starting with A
2. Deletes the matching files found in Folder B's subfolders
3. Repeats from step 1 with the next letter of the alphabet

Reasons why other duplicate-finding programs were unsuitable:

It takes a long time until the 1st deletion, which happens only after scanning has finished
There is no way to choose to delete in Folder B. It's only possible to keep the latest file (and something else), not to choose which folder to keep files in.

Useless history: The files in Folder B were recovered with Recuva and partly sorted, but a lot of them are bad. My first thought was to compare Folder B against a fresh recovery. Recuva has now recovered only the Excellent files into Folder A, so most Excellent files should end up in Folder A only.

Example file tree:

.
├── A
│   ├── 1.png
│   ├── 2.png
│   └── Excellent
│       ├── e1.png
│       └── e2.png
└── B
    ├── 1.png
    ├── 2.png
    ├── Bad
    │   ├── 1.png
    │   ├── 2.png
    │   ├── e1.png
    │   └── e2.png
    └── Excellent
        ├── e1.png
        └── e2.png

unutbu · Accepted Answer · 2013-01-13T11:46:25.773

6

Below are two solutions, depending on how we define "duplicate":

Files with the same relative path, or
Files with the same content but not necessarily the same name

If by "duplicate" we mean two files which share the same relative path, then you could use find and xargs to remove the duplicates. For example, suppose you have

~/tmp% tree A
A
└── Excellent
    ├── bar
    ├── baz
    └── foo
~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
│   ├── bar
│   ├── baz
│   └── foo
└── Good

Then

find /home/unutbu/tmp/A  -depth -type f -print0 | xargs -0 -I{} bash -c 'rm "/home/unutbu/tmp/B${1#*A}"' - {}

results in

~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
└── Good
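Before running a deletion like this on real data, it may be worth trying it on a throwaway copy first. The following is a minimal sketch, not part of the original command: it rebuilds the example tree in a scratch directory from mktemp, runs find from inside A so that the printed paths are already relative (which avoids relying on the letter A in the path), and only removes files that actually exist under B. The names work, rel, target and dry_run are illustrative.

```shell
#!/usr/bin/env bash
set -u

# Scratch copy of the example tree (location chosen by mktemp).
work=$(mktemp -d)
mkdir -p "$work/A/Excellent" "$work/B/Excellent" "$work/B/Bad" "$work/B/Good"
touch "$work/A/Excellent/"{bar,baz,foo} "$work/B/Excellent/"{bar,baz,foo} "$work/B/Bad/quux"

dry_run=0   # set to 1 to only print what would be removed

# Run from inside A so find prints paths relative to A, e.g. ./Excellent/bar
cd "$work/A" || exit 1
find . -type f -print0 |
  while IFS= read -r -d '' rel; do
    target="$work/B/${rel#./}"          # same relative path under B
    [ -f "$target" ] || continue        # nothing to do if B has no such file
    if [ "$dry_run" = 1 ]; then
      echo "would remove: $target"
    else
      rm -v -- "$target"
    fi
  done
```

With dry_run=1 the loop prints the candidates and leaves B untouched, which gives a chance to check the list before deleting anything.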

Or, if by "duplicate" we mean two files share the same content, though perhaps not the same filename, then you could use rdfind:

sudo apt-get install rdfind

If we have this directory structure:

~/tmp% tree A
A
└── Excellent
    ├── bar
    ├── baz
    └── foo

1 directory, 3 files
~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
│   ├── barbar
│   ├── bazbaz
│   └── foofoo
└── Good

where barbar has the same content as bar, and similarly for bazbaz and foofoo, then

rdfind -deleteduplicates true A B

results in

~/tmp% tree B
B
├── Bad
│   └── quux
├── Excellent
└── Good

Alternate solution in case your version of Ubuntu does not include rdfind:

You could instead use fdupes:

sudo apt-get install fdupes
fdupes --recurse --delete --noprompt A B

edited Jan 13 '13 at 11:46

answered Jan 06 '13 at 04:05

unutbu

1,072

Maybe it's because I'm on Ubuntu 10.04, but the 1st solution isn't working. For testing I made the folders Good and B. Good contains 1 file; B contains 2 subfolders with 2 files each. The command I ran: kangarooo@kangarooo-laptop:~/000 Test Dups$ find Good/ -depth -type f -print0 | xargs -0 -I{} bash -c 'rm "B${1#*A}"' - {} and its output: rm: cannot remove `BGood/vlcsnap-2012-12-11-01h36m32s101.png': No such file or directory
The other program isn't available in 10.04. I'll try it when I get to a 12.04 machine, but is there an alternative for 10.04 too?
– Kangarooo Jan 13 '13 at 02:31
In your command, change B${1#*A} to B${1#*Good}. ${1#*Good} is a bash parameter expansion which strips everything up to and including Good from the start of the filename; the leading B then takes its place. The tools used in this solution are fairly common to all versions of Unix, and should work fine in Ubuntu 10.04. – unutbu Jan 13 '13 at 02:45
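As a small illustration of that expansion (a sketch using a made-up path, not one from the thread), the snippet below shows what is stripped when the pattern matches and what happens when it does not:

```shell
#!/usr/bin/env bash
# Hypothetical path, shaped like the ones "find Good/" prints.
path='Good/sub/example.png'

stripped=${path#*Good}   # remove the shortest leading match of "*Good"
echo "$stripped"         # prints /sub/example.png
echo "B$stripped"        # prints B/sub/example.png

# If the pattern does not occur in the path, nothing is stripped,
# which is how a name starting with BGood/ can appear.
echo "B${path#*A}"       # prints BGood/sub/example.png
```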
Ah yes, I see now what the A was doing there. OK, it works, but not if I point it at Folder B itself; it worked when I pointed it at a subfolder. Is there a recursive option as well? Also, http://packages.ubuntu.com/oneiric/rdfind isn't in 10.04 (lucid), but I'll try to get onto a 12.04 machine. I'll use a combination of both: first delete exact names, then do more cleaning, then compare exact sizes. Exact sizes would be better, but the Excellent/Bad/Good variations of the same file will have different sizes. I'll make a backup and test the exact comparison command too. – Kangarooo Jan 13 '13 at 04:24
WOW, I just understood that this is a SUPER AMAZING command! It checks files one by one, deleting each one if it exists in the B folder. What's missing is checking recursively in the subfolders of B. – Kangarooo Jan 13 '13 at 04:31
Searching packages.ubuntu.com for packages that contain the word "duplicate" in their description yielded fdupes. This should work in place of rdfind. – unutbu Jan 13 '13 at 11:35
How can I make the 1st command recursive? With find and xargs? – Kangarooo Jan 14 '13 at 00:54
"recursive" means that all subdirectories are traversed. The find/xargs command does that. Do you mean something else by "recursive"? – unutbu Jan 14 '13 at 01:10
The 1st command I tested doesn't check for duplicates throughout folder B. So it isn't recursive when checking in B. – Kangarooo Jan 15 '13 at 03:41
It's too late at night for me to think clearly about this right now, but please give an example of paths showing where the duplicates are located. – unutbu Jan 15 '13 at 03:48
No one says this needs to take priority over your own priorities. File list:

.
├── A
│   └── Excellent
│       ├── 1.png
│       └── 2.png
└── B
    ├── Bad
    │   ├── 1.png
    │   └── 2.png
    └── Excellent
        ├── 1.png
        └── 2.png

Ah, I did more tests and wasn't sure why it wasn't working; I had forgotten how my test files were distributed. I have now found that if a duplicate file is in a different subfolder than the primary file's subfolder, it isn't deleted.

So the result: everything in the identically named folder Excellent was deleted, but the same files in folder Bad were not.
– Kangarooo Jan 15 '13 at 05:21
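One way to cover the case described in this comment is to match on the file name alone, wherever it sits under B. The sketch below is an assumption about what is wanted, not something from the answer: it rebuilds the file list above in a scratch directory and, for every file in A, deletes every file under B that has the same name. The file keep.png and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
set -u

# Rebuild the file list from the comment in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/A/Excellent" "$work/B/Bad" "$work/B/Excellent" "$work/B/Good"
touch "$work/A/Excellent/"{1,2}.png "$work/B/Bad/"{1,2}.png "$work/B/Excellent/"{1,2}.png
touch "$work/B/Good/keep.png"   # hypothetical file with no counterpart in A

# For every file in A, delete every file under B with the same basename.
find "$work/A" -type f -print0 |
  while IFS= read -r -d '' src; do
    name=${src##*/}             # strip the directory part
    # -name takes a glob pattern: names containing * ? [ or \ would need escaping.
    find "$work/B" -type f -name "$name" -exec rm -v -- {} +
  done
```

Two caveats: this compares names only, so unrelated files that happen to share a name are deleted too, and it rescans B once per file in A, which can be slow for large trees.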
1

fdupes isn't really an alternative, as it doesn't choose which files to keep based on the first path specified but on modification date, as explained here – TNT Feb 21 '16 at 06:30
In rdfind, when specifying two directories, is the first one kept as original? I don't completely understand the manpage, it says: "If A was found while scanning an input argument earlier than than B, A is higher ranked." – TNT Feb 21 '16 at 06:36
Another warning about your last suggestion to use fdupes: by default, it keeps the oldest file, so if that happens to be on drive B, you are modifying drive A. Worse, it will remove any duplicates that exist on drive A only, which might be excellent files that are duplicated for a reason! – Bas Swinckels Aug 12 '20 at 08:41
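If the goal is to compare by content while guaranteeing that nothing in A is ever touched, one option is to do the comparison by hand with checksums. This is a minimal sketch under a few assumptions: GNU coreutils sha256sum is available, the scratch tree and file contents below are invented for the demonstration, and a matching SHA-256 checksum is treated as "same content".

```shell
#!/usr/bin/env bash
set -u

# Scratch tree with made-up contents.
work=$(mktemp -d)
mkdir -p "$work/A/Excellent" "$work/B/Excellent" "$work/B/Bad"
printf 'one\n' > "$work/A/Excellent/bar"
printf 'one\n' > "$work/A/Excellent/bar-copy"   # duplicate inside A, must survive
printf 'one\n' > "$work/B/Excellent/barbar"     # same content as bar
printf 'two\n' > "$work/B/Bad/quux"             # content unique to B

# Checksums of everything in A, one hash per line.
hashes=$(mktemp)
find "$work/A" -type f -exec sha256sum {} + | cut -d' ' -f1 | sort -u > "$hashes"

# Delete a file in B only when its checksum appears in the list from A.
find "$work/B" -type f -print0 |
  while IFS= read -r -d '' f; do
    h=$(sha256sum < "$f" | cut -d' ' -f1)
    if grep -qxF -- "$h" "$hashes"; then
      rm -v -- "$f"
    fi
  done
rm -f -- "$hashes"
```

Because rm is only ever called on paths produced by the second find, files in A are read but never modified, including files that are duplicated within A.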
