
I know about rdfind, which can find duplicate files in two directories. But I need a similar utility that finds duplicate folders (folders that have the same name and the same path relative to the main directories) in two main directories. Is there any utility that does this simple task?

**Example:**
$ tree
.
├── maindir1
│   ├── dir space
│   │   ├── dir1
│   │   └── dir2
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir3
│   │   └── dir5
│   └── dir4
│       └── dir6
├── maindir2
│   ├── dir space
│   │   └── dir2
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir5
│   │   └── dir6
│   ├── dir6
│   └── new\012line
├── file
└── new\012line

NOTE: In the above example, the only duplicate folders at the first level (depth 1) are:

maindir1/dir space/ & maindir2/dir space/
maindir1/dir1/ & maindir2/dir1/
maindir1/dir2/ & maindir2/dir2/

At the second level (depth 2), the only duplicate folders are:

maindir1/dir space/dir2/ & maindir2/dir space/dir2/
maindir1/dir2/new\012line/ & maindir2/dir2/new\012line/

Please note that maindir1/dir3/dir5/ and maindir2/dir5/ are not duplicates, and neither are maindir1/dir4/dir6/ and maindir2/dir5/dir6/.
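
To illustrate what I mean by "duplicate": at depth 1, the pairs above are simply the relative sub-paths that exist as directories under both main directories. One could check that by hand with something like the sketch below (assuming GNU find and bash; it is line-based, so the new\012line entries would confuse it, and for depth 2 the numbers would change to -mindepth 2 -maxdepth 2). This is exactly the kind of manual work I would like a utility to do for me:

$ comm -12 \
    <(cd maindir1 && find . -mindepth 1 -maxdepth 1 -type d | sort) \
    <(cd maindir2 && find . -mindepth 1 -maxdepth 1 -type d | sort)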

PHP Learner

1 Answer


I don't know of any utility specific to directories (though things like fslint or fdupes should also list directories), but it's easy enough to script:

#!/usr/bin/env bash

## Declare $dirs and $count as associative arrays
declare -A dirs
declare -A count

find_dirs(){
    ## Make ** recurse into subdirectories
    shopt -s globstar
    for d in "$1"/**
    do
    ## Remove the top directory from the dir's path
    dd="${d#*/}"
    ## If this is a directory, and is not the top directory
    if [[ -d "$d" && "$dd" != "" ]]
    then
        ## Count the number of times it's been seen
        let count["$dd"]++
        ## Add it to the list of paths with that name.
        ## I am using the `&` to separate directory entries
        dirs["$dd"]="${dirs[$dd]} & $d" 
    fi

    done
}

## Iterate over the list of paths given as arguments
for target in "$@"
do
    ## Run the find_dirs function on each of them
    find_dirs "$target"
done

## For each directory found by find_dirs
for d in "${!dirs[@]}"
do
    ## If this name has been seen more than once
    if [[ ${count["$d"]} > 1 ]]
    then
    ## Print the name with pretty colors
    printf '\033[01;31m+++ NAME: "%s" +++\033[00m\n' "$d"
    ## Print the paths with that name
    printf "%s\n" "${dirs[$d]}" | sed 's/^ & //'
    fi
done
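
Save it to a file (I'll call it dup_dirs.sh here, but the name doesn't matter), make it executable and pass the main directories as arguments:

$ chmod +x dup_dirs.sh
$ ./dup_dirs.sh maindir1 maindir2    # use whatever name you saved the script under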

The script above can deal with arbitrary directory names (including those with spaces or even newlines in their names) and will recurse into any number of subdirectories. For example, on this directory structure:

$ tree
.
├── maindir1
│   ├── dir1
│   ├── dir2
│   │   └── new\012line
│   ├── dir3
│   │   └── dir5
│   ├── dir4
│   │   └── dir6
│   └── dir space
│       ├── dir1
│       └── dir2
└── maindir2
    ├── dir1
    ├── dir2
    │   └── new\012line
    ├── dir5
    │   └── dir6
    ├── dir6
    ├── dir space
    │   └── dir2
    └── new\012line

It will return this:

Screenshot showing the script's output
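
(The screenshot isn't reproduced here; going by the printf calls in the script, the output should look roughly like the entries below, with each duplicated relative path printed as a red header followed by the matching paths joined by " & ", in whatever order bash's associative array yields them:)

+++ NAME: "dir1" +++
maindir1/dir1 & maindir2/dir1
+++ NAME: "dir space/dir2" +++
maindir1/dir space/dir2 & maindir2/dir space/dir2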

terdon
  • I think "${dirs[$dd]}\0$d" doesn't actually use a null character, so you might as well have used ,. – muru Sep 27 '15 at 12:43
  • @muru ah, and that explains why I had to escape the \ in sed. I think you're right. Thanks, edited. – terdon Sep 27 '15 at 12:45
  • @terdon Thank you for your code, but what I need is to find duplicate folders with the same name and the same sub-path in two given folders, separated based on their sub-path depth. Your code just checks the folder name. It was my mistake that I didn't make clear what I meant; I have added an example to my question for clarification. – PHP Learner Sep 29 '15 at 07:10
  • @PHPLearner ah, OK. So will there only be two main directories or should the script support an arbitrary number? – terdon Sep 29 '15 at 10:36
  • @PHPLearner I made it support an arbitrary number of target dirs, may as well. Have a look at the updated answer. – terdon Sep 29 '15 at 11:04