236

I have two directories that should contain the same files and have the same directory structure.

I think that something is missing in one of these directories.

Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?

Braiam
AndreaNobili

20 Answers

205

You can use the diff command just as you would use it for files:

diff <directory1> <directory2>

To include subdirectories and the files inside them, add the -r option:

diff -r <directory1> <directory2>
Alex R.
  • 3
    Didn't know diff works for directories as well(man diff confirmed that), but this doesn't recursively check for changes in subdirectories inside subdirectories. – jobin Feb 16 '14 at 17:04
  • 2
    @Jobin That's strange... For me, it does work. – Alex R. Feb 16 '14 at 17:07
  • 1
    I have something like this: a/b/c/d/a, x/b/c/d/b. See what diff a x gives you. – jobin Feb 16 '14 at 17:09
  • 4
    You have to use the -r option. That (diff -r a x) gives me: Only in a/b/c/d: a. only in x/b/c/d: b. – Alex R. Feb 16 '14 at 17:11
  • Cool! It works! +1. Diff just got more powerful(for me)! :) – jobin Feb 16 '14 at 17:12
  • 8
    diff shows me the differences inside files, but not whether one directory contains a file that the other lacks. I don't need the differences within files; I need to know whether a file exists in one directory and not in the other. – AndreaNobili Feb 16 '14 at 17:17
  • Can this also show which files are the same in both folders? – BenKoshy Feb 12 '16 at 06:18
  • 3
    @AndreaNobili, GNU diff shows Only in directory1/path for files in only one of the folders. – joeytwiddle Mar 06 '16 at 04:11
  • I'm looking for a way to diff two dirs and also include file attributes (timestamps, permissions etc). any ideas? – cavalcade Oct 01 '16 at 03:31
  • To additionally list files that are equal in both name and contents, use -s. Example: diff -s directory-a directory-b – Abdull Oct 27 '21 at 19:00
158

A good way to do this comparison is to use find with md5sum, then a diff.

Example

Use find to list all the files in the directory, calculate the MD5 hash of each file, sort the result by filename, and write it to a file:

find /dir1/ -type f -exec md5sum {} + | sort -k 2 > dir1.txt

Do the same for the other directory:

find /dir2/ -type f -exec md5sum {} + | sort -k 2 > dir2.txt

Then compare the two resulting files with diff:

diff -u dir1.txt dir2.txt

Or as a single command using process substitution:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2) <(find /dir2/ -type f -exec md5sum {} + | sort -k 2)

If you want to see only the changes:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")

The cut command prints only the hash (first field) to be compared by diff. Otherwise diff will print every line as the directory paths differ even when the hash is the same.

But you won't know which file changed...

For that, you can try something like

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')

This strategy is very useful when the two directories to be compared are not on the same machine and you need to make sure that the files are identical in both.
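As a rough Python counterpart to the find + md5sum pipeline, the sketch below hashes every file under each root and compares the results keyed by path relative to that root, so the differing /dir1/ and /dir2/ prefixes do not make every entry look different. The names hash_tree and compare_trees are made up for this illustration, and it assumes both trees are readable from the same machine and contain no broken symlinks.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to root) to the MD5 hex digest of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            with open(full, "rb") as fh:
                # Read in chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    md5.update(chunk)
            digests[os.path.relpath(full, root)] = md5.hexdigest()
    return digests


def compare_trees(dir1, dir2):
    """Return (only_in_dir1, only_in_dir2, changed) as sorted lists of relative paths."""
    h1, h2 = hash_tree(dir1), hash_tree(dir2)
    only1 = sorted(set(h1) - set(h2))
    only2 = sorted(set(h2) - set(h1))
    changed = sorted(p for p in set(h1) & set(h2) if h1[p] != h2[p])
    return only1, only2, changed
```

Calling compare_trees("/dir1", "/dir2") returns three lists: files missing from the second tree, files missing from the first, and files present in both whose hashes differ.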

Another good way to do the job is Git’s diff command (it may cause problems when files have different permissions, because every file is then listed in the output):

git diff --no-index dir1/ dir2/
Zanna
Adail Junior
65

Although this is not a bash feature as such, you can do it using diff with --brief and --recursive:

$ diff -rq dir1 dir2 
Only in dir2: file2
Only in dir1: file1

man diff documents both options:

-q, --brief
report only when files differ

-r, --recursive
recursively compare any subdirectories found

Braiam
29

Maybe one option is to run rsync two times:

rsync -rtOvcs --progress -n /dir1/ /dir2/

With the previous line, you will get files that are in dir1 and are different (or missing) in dir2.

rsync -rtOvcs --progress -n /dir2/ /dir1/

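The second command does the same in the other direction: it lists files that are in dir2 and are different (or missing) in dir1.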

From rsync --help:
-n, --dry-run               perform a trial run with no changes made

-r, --recursive             recurse into directories
-t, --times                 preserve modification times
-O, --omit-dir-times        omit directories from --times
-v, --verbose               increase verbosity
    --progress              show progress during transfer
-c, --checksum              skip based on checksum, not mod-time & size
-s, --protect-args          no space-splitting; only wildcard special-chars

You can remove the -n option to actually apply the changes, that is, to copy the listed files to the second folder.

If you do that, consider adding -u to avoid overwriting newer files.

-u, --update                skip files that are newer on the receiver

A one-liner:

rsync -rtOvcsu --progress -n  /dir1/ /dir2/ && rsync -rtOvcsu --progress -n /dir2/ /dir1/
Ferroao
17

Here is an alternative, to compare just filenames, and not their contents:

diff <(cd folder1 && find . | sort) <(cd folder2 && find . | sort)

This is an easy way to list missing files, but of course it won't detect files with the same name but different contents!
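The same names-only comparison can be sketched in Python, which also sidesteps problems with unusual characters in file names because nothing is parsed as text. This is only an illustration: list_tree and missing_entries are hypothetical helper names, and like the find version it looks at names only, never at contents.

```python
import os


def list_tree(root):
    """Return the set of paths (files and directories) under root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def missing_entries(dir1, dir2):
    """Return (missing_from_dir2, missing_from_dir1) as sorted lists of relative paths."""
    p1, p2 = list_tree(dir1), list_tree(dir2)
    return sorted(p1 - p2), sorted(p2 - p1)
```

For example, missing_entries("folder1", "folder2") returns one list of entries that exist only under folder1 and one list of entries that exist only under folder2.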

(Personally I use my own diffdirs script, but that is part of a larger library.)

joeytwiddle
  • 3
    You'd better use process substitution, not temp files... – mniip Feb 16 '14 at 18:03
  • 3
    Note that this does not support file names containing certain special characters. For those you would want zero-delimited output, which AFAIK diff does not support as of now. comm, however, has supported it since http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=f3b4def577c4eee22f83b72d1310aa1d9155908d, so once it comes to a coreutils near you, you can do comm -z <(cd folder1 && find -print0 | sort -z) <(cd folder2 && find -print0 | sort -z) (whose output you might have to convert further into the format you need using the --output-delimiter parameter and additional tools). – phk Mar 05 '16 at 21:52
15

I would like to suggest a great tool that I have just discovered: Meld.

It works well, and everything you can do with the diff command on a Linux-based system can be replicated there with a nice graphical interface!

For instance, the comparison of directories is straightforward:

[screenshot: directories comparison]

and also the files comparison is made easier:

[screenshot: files comparison]

It integrates nicely with version control systems (for instance Git) and can be used as a merge tool. See the complete documentation on its website.

mojoaxel
Leos313
  • 1
    Great recommendation. I use Meld all the time for text file comparison, but had forgotten that it could do directories as well. My only gripe is that the UI doesn't resize in a way that lets me see long paths completely. – John T Jul 15 '21 at 10:47
5

If you want to make each file expandable and collapsible, you can pipe the output of diff -r into Vim.

First let's give Vim a folding rule:

mkdir -p ~/.vim/ftplugin
echo "set foldexpr=getline(v:lnum)=~'^diff.*'?'>1':1 foldmethod=expr fdc=2" >> ~/.vim/ftplugin/diff.vim

Now just:

diff -r dir1 dir2 | vim - -R

You can hit zo and zc to open and close folds. To get out of Vim, hit :q<Enter>

The -R is optional, but I find it useful alongside - because it stops Vim from bugging you to save the buffer when you quit.

joeytwiddle
5

Inspired by Sergiy's reply, I wrote my own Python script to compare two directories.

Unlike many other solutions it doesn't compare contents of the files. Also it doesn't go inside subdirectories which are missing in one of the directories. So the output is quite concise and the script works fast with large directories.

#!/usr/bin/env python3

import os, sys

def compare_dirs(d1: "old directory name", d2: "new directory name"):
    def print_local(a, msg):
        print('DIR ' if a[2] else 'FILE', a[1], msg)
    # ensure validity
    for d in [d1,d2]:
        if not os.path.isdir(d):
            raise ValueError("not a directory: " + d)
    # get relative path
    l1 = [(x,os.path.join(d1,x)) for x in os.listdir(d1)]
    l2 = [(x,os.path.join(d2,x)) for x in os.listdir(d2)]
    # determine type: directory or file?
    l1 = sorted([(x,y,os.path.isdir(y)) for x,y in l1])
    l2 = sorted([(x,y,os.path.isdir(y)) for x,y in l2])
    i1 = i2 = 0
    common_dirs = []
    while i1<len(l1) and i2<len(l2):
        if l1[i1][0] == l2[i2][0]:      # same name
            if l1[i1][2] == l2[i2][2]:  # same type
                if l1[i1][2]:           # remember this folder for recursion
                    common_dirs.append((l1[i1][1], l2[i2][1]))
            else:
                print_local(l1[i1],'type changed')
            i1 += 1
            i2 += 1
        elif l1[i1][0]<l2[i2][0]:
            print_local(l1[i1],'removed')
            i1 += 1
        elif l1[i1][0]>l2[i2][0]:
            print_local(l2[i2],'added')
            i2 += 1
    while i1<len(l1):
        print_local(l1[i1],'removed')
        i1 += 1
    while i2<len(l2):
        print_local(l2[i2],'added')
        i2 += 1
    # compare subfolders recursively
    for sd1,sd2 in common_dirs:
        compare_dirs(sd1, sd2)

if __name__=="__main__":
    compare_dirs(sys.argv[1], sys.argv[2])

If you save it to a file named compare_dirs.py, you can run it with Python3.x:

python3 compare_dirs.py dir1 dir2

Sample output:

user@laptop:~$ python3 compare_dirs.py old/ new/
DIR  old/out/flavor-domino removed
DIR  new/out/flavor-maxim2 added
DIR  old/target/vendor/flavor-domino removed
DIR  new/target/vendor/flavor-maxim2 added
FILE old/tmp/.kconfig-flavor_domino removed
FILE new/tmp/.kconfig-flavor_maxim2 added
DIR  new/tools/tools/LiveSuit_For_Linux64 added

P.S. If you need to compare file sizes and file hashes for potential changes, I published an updated script here: https://gist.github.com/amakukha/f489cbde2afd32817f8e866cf4abe779

  • 1
    Thanks, I added an optional third param regexp to skip/ignore https://gist.github.com/mscalora/e86e2bbfd3c24a7c1784f3d692b1c684 to make just what I needed like: cmpdirs dir1 dir2 '/\.git/' – Mike Feb 18 '18 at 22:15
4

This is fairly easy to do in Python:

python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' DIR1 DIR2

Substitute actual values for DIR1 and DIR2.

Here's a sample run:

$ python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' Desktop/ Desktop
SAME
$ python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' Desktop/ Pictures/
DIFF

For readability, here's an actual script instead of one-liner:

#!/usr/bin/env python
import os, sys

d1 = os.listdir(sys.argv[1])
d2 = os.listdir(sys.argv[2])
d1.sort()
d2.sort()

if d1 == d2:
    print("SAME")
else:
    print("DIFF")
Sergiy Kolodyazhnyy
2

Adail Junior's nice answer can take a long time to run if you have hundreds of thousands of files, so here is another way to do it. Say you want to compare all the filenames of folder A with all the filenames of folder B. Step 1, cd to folder A and run:

find . | sort > /tmp/listA.txt

Step 2, cd to folder B and run:

find . | sort > /tmp/listB.txt

Step 3, take the diff of /tmp/listA.txt and /tmp/listB.txt. (Writing the lists outside the two folders keeps the list files themselves out of the listings.)

I tried this on folders containing half a million txt files and had the diff on my screen in less than 30 seconds, whereas computing the md5sum of every file can be very time consuming. Note also that the original question asks about comparing filenames (not their contents) to check whether files are missing from one of the folders.

pebox11
1

As already noted, you can also use the comm command, e.g. this way:

comm -3 <(ls -1 dir1) <(ls -1 dir2)

This compares the top-level listings of the two directories and shows only two columns, each containing the entries unique to one directory.

muru
1

On a slow file system, diff might take a while, but I have had good experiences with rsync, as it works well incrementally:

rsync --recursive --progress --delete --links --dry-run

Aliased as rdiff, this is an example run:

> rdiff test/ testuser
sending incremental file list
deleting .sudo_as_admin_successful
.bash_history
.bash_logout
.bashrc
.profile

It obviously only lists files without diffing them, but I find that tremendously useful already.

xeruf
1

After searching for years, I have finally found a neat solution for very large folders (> 1TB): a combination of diff and Meld.

Here is how it works:

1) Compare the two directories using diff:

diff -rq /directory/path1 /directory/path2

Optional: Save the output in a textfile called comparison.txt

diff -rq /directory/path1 /directory/path2 > comparison.txt

This will give you something like this:

Files /directory/path1/file1.txt and /directory/path2/file1.txt differ
Only in /directory/path1/: file2.txt
Files /directory/path1/subdir/file4.txt and /directory/path2/subdir/file4.txt differ
Only in /directory/path1/subdir/: file5.txt
Files /directory/path1/subdir2/file8.txt and /directory/path2/subdir2/file8.txt differ
Only in /directory/path1/subdir2/: file9.txt
Files /directory/path1/subdir3/file13.txt and /directory/path2/subdir3/file13.txt differ
Only in /directory/path1/subdir3/: file14.txt
Only in /directory/path1/subdir3/: file15.txt
Only in /directory/path1/subdir3/: file16.txt
Only in /directory/path1/subdir3/: file17.txt
Only in /directory/path1/subdir3/: file18.txt
Only in /directory/path2/subdir3/: file19.txt
Only in /directory/path2/subdir3/: file20.txt
Only in /directory/path2/: file3.txt
Only in /directory/path2/subdir/: file6.txt
Only in /directory/path2/subdir/: file7.txt
Only in /directory/path2/subdir2/: file10.txt
Only in /directory/path2/subdir2/: file11.txt
Only in /directory/path2/subdir2/: file12.txt

2) Manually extract common subfolders. In this example:

/directory/path1
/directory/path2

3) Optional: In the Meld preferences: Compare files based only on size and timestamp to speed up things.


4) Compare and merge the extracted paths using Meld.

Open a separate tab for each subfolder from step 2). Opening separate tabs helps to keep track of the progress you have made. You may use deeper levels in the directory tree to speed up the process. Delete the lines from the text-file that have been merged. You can work on the tabs in parallel.

Advantages of this approach:

Combining diff and Meld gives you the best of both worlds:

  1. Very fast comparison, as diff is to my knowledge the fastest Linux tool for this purpose.
  2. Usage of Meld, which is one of the most convenient visual diff tools the Linux world has to offer. As only subfolders are compared, this approach is fast.
Ohumeronen
1

If you want a simple UI tool for this task, you could consider the Diff Folders plugin for Visual Studio Code.

It works flawlessly and does everything that I need.


0

I'll add to this list a Node.js alternative that I wrote some time ago.

dir-compare

npm install dir-compare -g
dircompare dir1 dir2
gliviu
0

You could use this tool:

https://github.com/jfabaf/comparefolders/

I developed it a few years ago because I had the same problem.

It compares the MD5 hashes of files, so the file names don't matter.

jfabaf
0

The answers that use "batteries included" Python miss one such battery, the filecmp module:

https://docs.python.org/3/library/filecmp.html

Sample solution from Python's docs:

#!/usr/bin/env python3

from filecmp import dircmp


def print_diff_files(dcmp):
    # Report files that exist on both sides but differ.
    for name in dcmp.diff_files:
        print(f"diff_file {name} found in {dcmp.left} and {dcmp.right}")
    # Recurse into subdirectories common to both sides.
    for sub_dcmp in dcmp.subdirs.values():
        print_diff_files(sub_dcmp)


dcmp = dircmp("dir1", "dir2")
print_diff_files(dcmp)
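The documentation sample reports files that exist on both sides but differ. For the question as asked (files missing from one side), the left_only and right_only attributes of dircmp look like the closer fit. Here is a minimal sketch along those lines; find_missing is a hypothetical helper name, not part of filecmp, and like dircmp itself it does not descend into a directory that exists on only one side.

```python
import os
from filecmp import dircmp


def find_missing(dcmp, missing=None):
    """Collect (side, path) pairs for entries present in only one of the two trees."""
    if missing is None:
        missing = []
    for name in dcmp.left_only:
        missing.append(("left", os.path.join(dcmp.left, name)))
    for name in dcmp.right_only:
        missing.append(("right", os.path.join(dcmp.right, name)))
    # Recurse into subdirectories that exist on both sides.
    for sub_dcmp in dcmp.subdirs.values():
        find_missing(sub_dcmp, missing)
    return missing
```

A call such as find_missing(dircmp("dir1", "dir2")) returns a list that can be printed one pair per line.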

murla
0

Unison

The text mode program unison and GUI program unison-gtk can be installed with

sudo apt update
sudo apt install unison

Unison is designed to synchronize directory trees within a computer and between computers.

  • There is a comparison
  • You can inspect the result and decide if/how you want to modify the default action (which updates to the newest status)
  • Finally files are transferred according to the selected actions

See man unison for explanations of the options. From the manual page:

This manual page briefly documents Unison, and was written for the Debian GNU/Linux distribution because the original program does not have a manual page. For a full description, please refer to the inbuilt documentation or the manuals in /usr/share/doc/unison/. The unison-2.48.4-gtk binary has similar command-line options, but allows the user to select and create profiles and configure options from within the program.

Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.

Unison offers several advantages over various synchronization methods such as CVS, Coda, rsync, Intellisync, etc. Unison can run on and synchronize between Windows and many UNIX platforms. Unison requires no root privileges, system access or kernel changes to function. Unison can synchronize changes to files and directories in both directions, on the same machine, or across a network using ssh or a direct socket connection.

Transfers are optimised using a version of the rsync protocol, making it ideal for slower links. Unison has a clear and precise specification, and is resilient to failure due to its careful handling of the replicas and its private structures.

The two roots can be specified using an URI or a path. The URI must follow the convention:

protocol://[user@][host][:port][/path]. The protocol part can be file, socket, ssh or rsh.

There is a learning curve, but it is worth the effort :-)

sudodus
0

I don't see this in the other answers, but if you want to match filenames while ignoring extensions, so that hello.png matches hello.zip, you can use nested for loops as I did:

for f in *.zip; do for f2 in ./dirtwo/*.png; do
    : # Your logic here
done; done

For example, my full code is:

for f in *.zip; do
    for f2 in ./ArcadeBezels/*.png; do
        # Strip the last 4 chars (file extension) from both filenames and the
        # first 15 chars ("./ArcadeBezels/") from the second one.
        # Negative lengths in ${var:offset:length} need bash 4.2 or later.
        if [ "${f:0:-4}" = "${f2:15:-4}" ]; then
            rm "$f"
        fi
    done
done
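The nested loops compare every zip name with every png name. As an alternative sketch, the same matching can be done with two sets of base names in Python. The function name matching_stems and its ext1/ext2 parameters are invented for this illustration, and unlike the shell code above it only reports matches and deletes nothing.

```python
from pathlib import Path


def matching_stems(dir1, dir2, ext1=".zip", ext2=".png"):
    """Return sorted base names that occur in dir1 with ext1 and in dir2 with ext2."""
    # Path.stem drops only the final extension: "hello.zip" -> "hello".
    stems1 = {p.stem for p in Path(dir1).glob("*" + ext1)}
    stems2 = {p.stem for p in Path(dir2).glob("*" + ext2)}
    return sorted(stems1 & stems2)
```

For example, matching_stems(".", "./ArcadeBezels") lists the names that have both a .zip file in the current directory and a .png file in ArcadeBezels.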
Leathan
0

This came out of a few iterations with ChatGPT. Copy the code and save it as compare-dirs.py.

Usage:

python3 compare-dirs.py dir1 dir2

import os
import sys

# ANSI escape codes for colors
RED = '\033[91m'
YELLOW = '\033[93m'
GRAY = '\033[90m'
ENDC = '\033[0m'  # Reset to default color

# Check if correct number of arguments are provided
if len(sys.argv) != 3:
    print("Usage: python script_name.py <directory1> <directory2>")
    sys.exit(1)

# Directories to compare, taken from command-line arguments
dir1 = sys.argv[1]
dir2 = sys.argv[2]


def get_total_size(path):
    """Calculate the total size of all files in the directory."""
    total_size = 0
    for root, dirs, files in os.walk(path):
        for f in files:
            fp = os.path.join(root, f)
            # Safely get the file size and accumulate it
            try:
                total_size += os.path.getsize(fp)
            except OSError:
                pass  # Ignore files which can't be accessed
    return total_size


def get_subdir_sizes(dir_path):
    """Returns a dictionary of subdirectory names and their total sizes."""
    sizes = {}
    for entry in os.listdir(dir_path):
        full_path = os.path.join(dir_path, entry)
        if os.path.isdir(full_path):
            sizes[entry] = get_total_size(full_path)
    return sizes


sizes1 = get_subdir_sizes(dir1)
sizes2 = get_subdir_sizes(dir2)

# Print all subdirectories, highlighting differences or indicating missing ones
print(f"All subdirectories and sizes between {dir1} and {dir2}:")

for subdir in sorted(set(sizes1.keys()) | set(sizes2.keys())):
    size1 = sizes1.get(subdir, 'MISSING')
    size2 = sizes2.get(subdir, 'MISSING')
    if size1 == 'MISSING' or size2 == 'MISSING':
        print(f"{RED}{subdir:25} {str(size1):10} {str(size2):10}{ENDC}")
    elif size1 != size2:
        print(f"{YELLOW}{subdir:25} {str(size1):10} {str(size2):10}{ENDC}")
    else:
        print(f"{GRAY}{subdir:25} {size1:10} (identical){ENDC}")