
I had a huge directory of all sorts of things, from family photographs to git repositories to programs, with thousands of visible and hidden files, which was synced across ~10 computers. I was forced to move from Dropbox to Nextcloud in order to keep this huge directory in sync across all of these computers.

In the course of this move I have found that Nextcloud has deleted vast numbers of files, everything from hidden files to .tex files. I am still trying to understand why this has happened and have asked on the appropriate Nextcloud forum; fortunately, I have a backup of the Dropbox directory.

SO...

Given that I have these two huge, complex directories with millions of files in them, how can I compare them visually, as a human, to try to understand the damage that Nextcloud has done? How can I understand what Nextcloud has been wiping out?

Basically I need something big, clear and visual like gdmap mixed with tkdirdiff.

I request guidance and suggestions.

3 Answers

2

I just had the same thing happen to me when moving from local storage to a NAS: there was a difference in size, so I did the following:

stat -c "%s %n" /media/Data/ > /tmp/DSK
stat -c "%s %n" /media/NAS/ > /tmp/NAS

which lists the size (%s) and name (%n) of every file in each directory. I then loaded DSK and NAS into my favourite editor and visually compared the two files.

In my case only one file was different (and it is copying as I write this), but in your case you might want to strip the directory prefixes from the file names:

sed 's/\/media\/Data//g' /tmp/DSK > /tmp/DSK_Files
sed 's/\/media\/NAS//g'  /tmp/NAS > /tmp/NAS_Files

and let diff report the differences:

diff --context=0 /tmp/DSK_Files /tmp/NAS_Files
Fabby
    +1 on my path to sportsmanship badge... Lol – WinEunuuchs2Unix Nov 19 '18 at 22:46
  • @Fabby: Why not in a single line with find: find . -type f -printf "%s\t%p\n"|sort then use vimdiff on the 2 files output. You can even do it with checksum like WinEunuuchs2Unix suggested: find . -type f -printf "%s\t" -exec md5sum {} \;|sort -k 3. Look ma' no hands^H^H^H I mean no sql :-) – solsTiCe Nov 21 '18 at 18:44
  • @solsTiCe Because you're better than me? ;-) It was just one directory for me so I translated my use case to this... Please feel free to edit and teach me something! 0:-) – Fabby Nov 21 '18 at 19:06
1

With millions of files, I would create a checksum for each Nextcloud file.

Then I would add each checksum to an SQLite database.
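
A minimal sketch of that server-side step, assuming bash, GNU md5sum and the sqlite3 command-line shell; /srv/nextcloud/data and /tmp/checksums.db are placeholder paths:

# Build a checksum table for everything currently stored on the Nextcloud side.
find /srv/nextcloud/data -type f -exec md5sum {} + > /tmp/nextcloud.md5

# md5sum prints "checksum  path"; turn the two separating spaces into the '|'
# separator that the sqlite3 shell uses in list mode, then bulk-import the lines.
sed 's/  /|/' /tmp/nextcloud.md5 > /tmp/nextcloud.psv

sqlite3 /tmp/checksums.db <<'EOF'
CREATE TABLE IF NOT EXISTS files (sum TEXT, path TEXT);
.mode list
.import /tmp/nextcloud.psv files
CREATE INDEX IF NOT EXISTS idx_sum ON files (sum);
EOF

(Paths containing a literal '|' would confuse the import; for a truly huge tree a small script in your language of choice would be more robust, but this keeps the idea visible.)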

Then I would write a script that scans each file in a client directory (a sketch follows after the list below). This script would run on each of the 10 computers with images / videos:

  • Generate a checksum for each file.
  • Look up the checksum in the SQLite database.
  • If the checksum is not found, copy the file to Nextcloud.

I would NOT compare files based on date and time as they may have changed when Nextcloud was populated.
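
A minimal sketch of that client-side script, again in bash; DB, SRC and DEST are placeholder paths, and files are matched by checksum alone, so a file that was merely renamed on the Nextcloud side still counts as present:

#!/bin/bash
# Copy any local file whose checksum is not already known to Nextcloud.
DB=/tmp/checksums.db            # database built on the Nextcloud side
SRC="$HOME/Dropbox"             # local copy believed to be complete
DEST="$HOME/Nextcloud"          # local Nextcloud sync directory

find "$SRC" -type f -print0 |
while IFS= read -r -d '' f; do
    sum=$(md5sum "$f" | cut -d' ' -f1)
    known=$(sqlite3 "$DB" "SELECT 1 FROM files WHERE sum = '$sum' LIMIT 1;")
    if [ -z "$known" ]; then
        rel=${f#"$SRC"/}                     # path relative to $SRC
        mkdir -p "$DEST/$(dirname "$rel")"
        cp -p "$f" "$DEST/$rel"              # the sync client uploads it from here
    fi
done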

0

With millions of files, you really don't want to do a visual inspection of the differences. The program meld would be suitable for a smaller number of files, but here you should be thinking along the lines of reports from sort and uniq. Prepare a list from each tree using a common starting point so that the paths are directly comparable (find . run from inside each directory will produce such a relative path listing), sort the two lists together, and use uniq to report the non-duplicated lines. (Check that the unique lines all come from one list rather than being mixed between the two.) Then decide how you want to copy the missing files to the other location.
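
A sketch of that workflow, assuming the backup and the Nextcloud copy live at /backup/Dropbox and /data/Nextcloud (substitute your own paths); the comm line is an optional extra that also tells you which side a missing path came from:

# Relative-path listings from a common starting point, sorted with a fixed locale.
( cd /backup/Dropbox && find . -type f | LC_ALL=C sort > /tmp/dropbox.list )
( cd /data/Nextcloud && find . -type f | LC_ALL=C sort > /tmp/nextcloud.list )

# Paths that appear in only one of the two listings:
LC_ALL=C sort /tmp/dropbox.list /tmp/nextcloud.list | uniq -u > /tmp/differences.list

# Paths present in the backup but missing from Nextcloud:
LC_ALL=C comm -23 /tmp/dropbox.list /tmp/nextcloud.list > /tmp/missing.list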

ubfan1