
I have two files, each containing a list of all file paths on one of two hard drives. The drives are supposed to be exactly the same, but I think one of them is missing files. Both lists contain the file path and size, but the lists are not in the same order (see the example below).

Is there a command that can compare the two files and write the differences to a new file?

Example:

file1:

/docs/red
/docs/blue
/docs/yellow
/docs/green

file_2:

/docs/blue
/docs/green
/docs/red

Difference_File:

/docs/yellow
αғsнιη
SD_NZ

5 Answers


Use grep; there is no need to sort the files:

grep -Fxvf file2 file1 > diff_file

This returns the lines that are in file1 but not in file2 (that is, the lines missing from file2).
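A minimal runnable sketch of the grep approach, recreating the question's example lists as file1 and file_2 (the printf setup lines are for illustration only; GNU grep is assumed):

```shell
#!/usr/bin/env bash
# Recreate the example lists from the question (illustration only).
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file_2

# -F: treat each pattern as a fixed string, not a regex
# -x: a pattern must match the whole line
# -v: print the lines that match none of the patterns
# -f file_2: read one pattern per line from file_2
grep -Fxvf file_2 file1 > diff_file

cat diff_file   # prints /docs/yellow
```

-F matters here because paths can contain regex metacharacters such as `.` or `[`. Note that grep exits with status 1 when it selects no lines, i.e. when nothing is missing.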

αғsнιη
  • Nice solution; it fits the question perfectly. Here is a command line for a two-way comparison: FILE1="file1"; FILE2="file2"; cat <(echo -e "\nOnly in $FILE1") <(grep -Fvxf "$FILE2" "$FILE1") <(echo -e "\nOnly in $FILE2") <(grep -Fvxf "$FILE1" "$FILE2"). – pa4080 May 01 '18 at 06:24
  • The problem with this solution is that it'll go super slow if you've got long files (it's O(N^2) on the length of the longer file). Sorting first and using something like diff or comm will be O(N log N). – Tacroy May 01 '18 at 15:19
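The two-way comparison from the first comment can also be wrapped in a small function. This is a sketch: `two_way_diff` is a hypothetical name, and `|| true` keeps the exit status at zero because grep returns 1 when it selects no lines:

```shell
#!/usr/bin/env bash
# Example lists from the question (illustration only).
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file_2

# two_way_diff A B (hypothetical helper): print lines unique to A, then lines unique to B.
two_way_diff() {
  local a=$1 b=$2
  echo "Only in $a:"
  grep -Fxvf "$b" "$a" || true   # grep exits 1 if no line is unique
  echo "Only in $b:"
  grep -Fxvf "$a" "$b" || true
}

two_way_diff file1 file_2 > diff_file
cat diff_file
```

For the example lists, diff_file contains the heading "Only in file1:" followed by /docs/yellow, then the heading "Only in file_2:" with nothing under it.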

I would try using sort and diff:

$ diff <(sort csv1.txt) <(sort csv2.txt)
4d3
< 
8d6
< /docs/yellow
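If you only want the paths themselves rather than diff's change markers, one option (a sketch, assuming bash for the process substitution) is to keep just the lines diff prefixes with "< ", which are the ones present only in the first file:

```shell
#!/usr/bin/env bash
# Example lists from the question (illustration only).
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file_2

# diff marks lines only in the first input with "< " and lines only in the
# second with "> "; sed -n '...p' prints just the "< " lines, minus the marker.
diff <(sort file1) <(sort file_2) | sed -n 's/^< //p' > diff_file

cat diff_file   # prints /docs/yellow
```

Use `s/^> //p` instead to list the lines that are only in the second file.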
David Foerster
  • colordiff makes the output easier to read, and adding an option such as -c is a nice idea. You can sort the files on the fly like this: colordiff -c <(sort csv1.txt) <(sort csv2.txt) – pa4080 May 01 '18 at 05:55

I generally use meld (which is a very useful visual diff tool) for such comparisons.

Install meld:

sudo apt-get install meld

Sort, and then compare:

sort csv1.txt > csv1-sorted.txt
sort csv2.txt > csv2-sorted.txt
meld csv1-sorted.txt csv2-sorted.txt 
  • Using process substitution will save you two inodes and relevant disk space (at the expense of memory). – heemayl May 01 '18 at 11:11
  • An additional benefit of meld is that it can also compare directories. Since you are comparing text files that contain file names, you could instead run meld directly on the directories and files themselves. – 64pi0r May 01 '18 at 12:47

The comm command is designed to answer this sort of question. It takes two sorted files as input and outputs three columns of text: lines unique to file1, lines unique to file2, and lines common to both files. You can suppress any of these three columns.

In your case, you would want something like:

comm -3 <(sort file1) <(sort file_2) | sed 's/^\t//'   # strip the tab comm puts before column-2 lines

This compares file1 and file_2 and writes whatever differences exist to standard output. Use -23 (suppress columns 2 and 3) if you only want the lines unique to file1, or -13 (suppress columns 1 and 3) if you only want the lines unique to file_2.
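A sketch of the -23 and -13 variants on the question's example lists (bash is assumed for process substitution; the output file names only_in_file1 and only_in_file_2 are hypothetical):

```shell
#!/usr/bin/env bash
# Example lists from the question (illustration only).
printf '%s\n' /docs/red /docs/blue /docs/yellow /docs/green > file1
printf '%s\n' /docs/blue /docs/green /docs/red > file_2

# comm expects both inputs sorted in the collation order it uses itself,
# so sort and comm run here in the same locale.
comm -23 <(sort file1) <(sort file_2) > only_in_file1    # column 1 only
comm -13 <(sort file1) <(sort file_2) > only_in_file_2   # column 2 only

cat only_in_file1   # prints /docs/yellow
```

For this example only_in_file_2 is empty, since every path in file_2 also appears in file1.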

Tacroy

If your real question is how to compare two mounted file systems, I would use rsync.

See "Rsync compare directories?" on Unix & Linux.

Use -n (--dry-run) so that no files are actually copied; with -v or -i, the output then lists the differences. By default this also reports files whose size or modification time differs, which usually means their contents have changed. This can be configured: for example, --ignore-existing skips files that already exist on the destination, so only missing files are reported.

pa4080
Zak