
So, I just got a new 4TB SSHD and I'll have a tech install it at my house by May 26, 2015. To back up my data, I bought a 4TB external drive and decided to copy everything from my internal 1TB and 3TB drives onto it. I successfully copied the 1TB drive over, but I accidentally canceled the 3TB transfer partway through (I was using Nautilus). Now I want to verify the integrity of the files, because I don't know how my 3TB drive filled up so fast, and I plan to erase it once the integrity check passes. I don't want to re-format the external drive and start over, since it has some extra files on it that aren't on the other drives. I want a command that walks the path of every file in "directory a" as if it were chrooted, and then verifies that the same file exists in "directory b" with the same hash and size. If the hash differs, the size differs, or the file doesn't exist in "directory b", it should copy the file again. Here's an example.

Let's say directory A is at /a and directory B is at /b. If there is a 500KB file at /a/eevee.png, the command should expect /b/eevee.png to exist, be 500KB, and have the same hash. If it doesn't exist, isn't 500KB, and/or has a different hash, the command should copy the file over and overwrite without asking.

Also, an optional feature: if a file in directory A matches a file in directory B as described above, but the timestamps differ, then the file in directory B has its timestamp changed to whatever it is in directory A.
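As an aside, the timestamp-syncing part on its own can be done with `touch -r`, which copies another file's timestamps onto a target. A minimal sketch with throwaway files (the real paths would be the /a and /b files from the example above):

```shell
# Demo with temporary files; substitute your real /a and /b paths
src=$(mktemp) dst=$(mktemp)
touch -d '2015-05-26 00:00:00' "$src"   # give src a distinct timestamp
touch -r "$src" "$dst"                  # -r copies src's times onto dst
stat -c '%y' "$src" "$dst"              # both lines now show the same time
```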

user245115

2 Answers


I actually found a way after a lot of research; I just didn't dig deep enough before posting this. I'd like to reference it in this post here (on Ubuntu Forums, not Ask Ubuntu). User papibe commented:

Hi jonnyboysmithy.

It looks like a job for 'rsync'.

By default rsync will copy and update files based on modification time and size. However, you can specify only to check file size.

For instance:

rsync -av --size-only  source/  destination/

I hope that helps. Let us know how it goes. Regards.

user245115

Here's my crude script for verifying files in two folders; if any file has an issue, it copies it over. I'd suggest using it after you've copied the files.

The basic idea is to use hexdump to verify the files. The script takes two directories as input. Limitation: it processes only the files directly inside the directory, not subdirectories (because I don't know how to do that yet). So you can use it if you don't have many folders, but if you have multiple folders and subfolders, running the script manually for each directory can be tedious. Food for future thought: create a list of folders and subfolders for the source and destination, and then automate the script over the two lists.

Script in Action

I used tester2 as the source directory and testerdir as the destination, where I had already copied the files. On line 123 I run the script to verify the copied files. On line 124 I change the contents of the tester2/hello file (from "TEST" to "TESTER"). On line 125 you can see the script detect that hello has been altered/changed in size/corrupted, and copy the file over again. On line 126 you can see me verify that the hello file in the destination folder matches the one in the source folder.

Script

#!/bin/bash
# Author: Serg Kolo
# Date: Mon May 25 01:19:59 MDT 2015
# Description: script to verify files in two directories
# written for http://askubuntu.com/q/627817/295286

copy_the_file ()
{
  cp -f "$SOURCE_DIR"/"$filename" "$DEST_DIR"/"$filename"
}

if [ $# -ne 2 ]; then
    printf "Usage: verify-files.sh SOURCE_DIR DEST_DIR\n"
    exit 1
fi

SOURCE_DIR="$1"
DEST_DIR="$2"

# -print0 plus read -d '' handles filenames that contain spaces,
# colons, or other special characters safely
find "$SOURCE_DIR" -maxdepth 1 -type f -printf "%f\0" |
while IFS= read -r -d '' filename
do
    if [ -e "$DEST_DIR"/"$filename" ]; then
        # compare the hexdumps of the two files
        hexdump "$SOURCE_DIR"/"$filename" > .dump1
        hexdump "$DEST_DIR"/"$filename" > .dump2
        if diff .dump1 .dump2 > /dev/null; then
            echo "$filename is OK"
        else
            echo "$filename has a problem"
            copy_the_file
        fi
    else
        echo "$filename is missing from $DEST_DIR"
        copy_the_file
    fi
done
rm -f .dump1 .dump2
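Regarding the subdirectory limitation mentioned above, one possible way to automate the "food for future thought" idea is sketched below: walk the whole tree with find, rebuild each file's path relative to the source, and compare byte-for-byte with cmp. The function name `verify_tree` is made up for illustration; this is a sketch, not a drop-in replacement for the script above.

```shell
#!/bin/bash
# Sketch of a recursive variant: verify_tree SOURCE_DIR DEST_DIR
verify_tree ()
{
    SOURCE_DIR="$1"
    DEST_DIR="$2"
    # -print0 / read -d '' keeps filenames with spaces intact
    find "$SOURCE_DIR" -type f -print0 |
    while IFS= read -r -d '' src
    do
        rel="${src#"$SOURCE_DIR"/}"       # path relative to SOURCE_DIR
        dst="$DEST_DIR/$rel"
        if [ -e "$dst" ] && cmp -s "$src" "$dst"; then
            echo "$rel is OK"
        else
            echo "$rel has a problem"
            mkdir -p "$(dirname "$dst")"  # create any missing subfolders
            cp -f "$src" "$dst"
        fi
    done
}
```

Calling `verify_tree tester2 testerdir` would then cover every subfolder in one pass.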
Sergiy Kolodyazhnyy
    Why aren't you diffing the files directly instead of diffing the hexdumps? O.o if diff -q "$SOURCE_DIR/$filename" "$DEST_DIR/$filename"; then ... Fan of Rube Goldberg? :P – muru May 26 '15 at 07:27
  • @muru em . . . because I thought diff only worked with text files . . . there's no restriction on file types for diff? And it's the first time I'm hearing of Rube Goldberg, too. – Sergiy Kolodyazhnyy May 26 '15 at 07:38
  • sure, diff is most commonly used that way. You can also use cmp. Also see: http://unix.stackexchange.com/questions/153286/is-cmp-faster-than-diff-q – muru May 26 '15 at 07:42
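To illustrate muru's point that diff and cmp both handle binary files, a quick sketch with throwaway files:

```shell
# cmp -s is silent and exits 0 only when the files are byte-identical
f1=$(mktemp); f2=$(mktemp)
printf 'TEST' > "$f1"
printf 'TEST' > "$f2"
if cmp -s "$f1" "$f2"; then echo "identical"; else echo "differ"; fi
# prints "identical"
```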