15

After I copy, say, 50+ GB of files (30,000 files in various formats) from an internal hard drive to an external drive, is there any way to find out whether everything was copied correctly? Also, if I cancel the operation partway through and later choose "merge" when continuing it, will correctness suffer?

I could use an application like Back In Time, but I am very selective about which files I copy, so next time I intend to use a plain copy operation and choose "merge" instead of "replace". Is that advisable when copying a large number of files?

Chethan S.

5 Answers

20

I'm using hashdeep to verify backups/restores and occasionally to check for file system corruption in a RAID.

The speed depends on which hash functions you use (some are more CPU intensive than others) as well as the read speed of your disks. On my system hashdeep can process or verify around 1 TB/hour with md5 and 300 MB/s read speed.


Example of calculating checksums and storing them in a file:

cd my-data
hashdeep -rlc md5 . > ~/checksums.txt

Parameters:

  • r – recursive
  • l – use relative paths
  • c – specify hash function
  • . – recursive starting at the current directory
  • > – redirect output to the specified file

See the man page.


Example of verifying checksums and printing a list of differences:

$ cd /mnt/my-backup
$ hashdeep -ravvl -k ~/checksums.txt .
hashdeep: Audit passed
          Files matched: 40914
Files partially matched: 0
            Files moved: 0
        New files found: 0
  Known files not found: 0

Parameters:

  • a – audit (compare with the list of known checksums)
  • v – verbose (lists the mismatches; repeating v increases verbosity)
  • k – file of known hashes

Note that as of March 2016 hashdeep appears to be abandoned.

David Foerster
j-g-faustus
16

It looks like the perfect task for rsync. Rsync compares and copies diffs.

The rsync utility first popped into my mind when I saw your question. Something like the command below shows which files are in directory a but are missing from b or have different content there:

$ rsync -rcnv a/ b/

-r will recurse into the directories
-c will compare based on file checksum
-n will run it as a "dry run" and make no changes, but just print out the files 
   that would be updated
-v will print the output to stdout verbosely

This is a good option because you can compare the contents of the files as well to make sure they match. rsync's delta algorithm is optimized for this type of use case. Then if you want to make b match the contents of a, you can just remove the -n option to perform the actual sync.
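To tie this back to the part of the question about cancelling and continuing later, here is a rough sketch of a copy-then-verify workflow built on the same flags. The helper names resume_copy and list_mismatches are made up for illustration, and --out-format='%n' is only used to print bare file names; treat it as a starting point rather than a tested backup procedure.

```shell
#!/bin/sh
# Sketch only: resume_copy and list_mismatches are hypothetical helper names.

# Copy the contents of $1 into $2. Safe to re-run after an interrupted copy:
# files already present with the same size and modification time are skipped.
resume_copy() {
    rsync -a "$1"/ "$2"/
}

# Dry run (-n) with full checksum comparison (-c): print the relative path of
# every file in $1 that is missing from $2 or whose content differs.
# Returns 0 when nothing is listed, 1 when something is, 2 if rsync failed.
list_mismatches() {
    pending=$(rsync -rcn --out-format='%n' "$1"/ "$2"/) || return 2
    if [ -z "$pending" ]; then
        return 0
    fi
    printf '%s\n' "$pending"
    return 1
}
```

For example, run resume_copy /path/to/a /path/to/b as often as needed until it finishes, then run list_mismatches /path/to/a /path/to/b once. The checksum pass reads every file on both sides, so expect it to take roughly as long as reading all the data again.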


ddeimeke
  • 1
    rsync is definitely the tool for this job, but it doesn't compare and copy diffs, per se. It compares the files using sizes and hashes. – Justin Force Apr 28 '11 at 19:18
  • @JustinForce Using size? Sure, a different size makes it certain that the files are not the same, but rsync is very versatile: it can optionally trust metadata (such as modification time) to avoid re-reading all files. When copying over a network, it computes a rolling hash to detect common parts and avoid transferring them, but on a local drive hashes don't play this role by default (if they are used at all). You can even ask it to trust that a shorter destination file already has correct content and just needs appending, though let's stay on topic. – Stéphane Gourichon Feb 07 '16 at 21:03
9

If the GUI apps suggested over at "File and directory comparison tool?" don't do it for you, try diff -rq /path/to/one /path/to/other to recurse through both directories quietly, logging only differences to the screen.
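If you want to use that diff call from a script, its exit status tells you the result without parsing the output. A small sketch, where trees_match is a hypothetical wrapper name and the messages are just placeholders:

```shell
#!/bin/sh
# Sketch only: trees_match is a hypothetical helper name.
# diff exits with 0 when the trees are identical, 1 when they differ,
# and 2 when it hit a problem such as an unreadable file.
trees_match() {
    status=0
    diff -rq "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "differences found" ;;
        *) echo "diff could not complete" >&2 ;;
    esac
    return $status
}
```

Usage would be something like trees_match /path/to/one /path/to/other && echo "copy verified".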

Amanda
3

The situation you describe is quite complex. That said, you can write a script to calculate the MD5 checksums of all the files you want to copy and later compare them with those of the copies.

If you want something simple and fast (it will not work in very complex scenarios), you can use Meld:
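A minimal sketch of such a script, assuming GNU md5sum and find are available. The function names make_manifest and verify_copy are invented for this example, and the check only covers files that exist in the source; extra files on the destination are not reported.

```shell
#!/bin/sh
# Sketch only: make_manifest and verify_copy are hypothetical helper names.

# Write one "checksum  ./relative/path" line per regular file under $1
# into the manifest file $2.
make_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2) > "$2"
}

# Re-read every file listed in manifest $2, relative to directory $1.
# Prints only the files that are missing or differ and returns non-zero
# if there are any.
verify_copy() {
    (cd "$1" && md5sum --quiet -c -) < "$2"
}
```

For example: make_manifest /path/to/source ~/checksums.md5 before copying, then verify_copy /path/to/copy ~/checksums.md5 afterwards. Because the manifest stores relative paths, the same file can be checked against any copy of the tree.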

sudo apt-get install meld
Zanna
puneet
0

As for whether "everything has been copied correctly": I use a modified cp (or mv) that includes checksumming (the checksum can optionally be stored in an xattr, so it only has to be calculated once for the source): http://sourceforge.net/projects/crcsum/

Hans
  • 1
  • 1
    Although your answer is 100% correct, it is also nearly impossible for a beginning user to implement. Therefore, please [edit] your answer, and include the steps on how to download, compile, install and uninstall crccp in your answer! ;-) You can always leave the link in at the bottom of your answer as a source for your material... – Fabby Feb 05 '15 at 14:07