4

I frequently need to back up a source disk to multiple target disks. At the moment the workflow is to rsync from source to target1, then dd-clone target1 to target2.
It would be lovely if I could rsync from source to target1 and target2 in the same read operation to greatly speed things up: read each file into memory once and write it to two separate hard drives in grand synchrony.
Is there perhaps a way to have rsync read the file into memory and pipe the output to two write operations simultaneously? Doesn't have to be rsync but that'd be greatly preferred.

rellimmot
  • I think what you want happens already in current versions of Ubuntu, if there is enough free RAM for the whole file or the whole batch of files that rsync or some other tool is to copy. (But a huge file or batch of files might not fit into RAM). I have noticed that when I create USB boot drives from an iso file, the second time the read process needs almost no time (but of course the write process will still be limited by the write speed of each target device). – sudodus Jul 11 '21 at 14:30

2 Answers

6

Putting together info from several sources, there are a few options.

The conclusion is that only the parallel command might give you what you want; see below.

Important notes:

  1. I ran my tests with cp for copying. You should also consider the speedups (or slowdowns!) obtained with rsync vs. cp or other alternative commands, combined with parallel.
  2. I tested copying only one file. Results might change when copying many files (e.g., combining a few large files, as you need, with many other small files and subdirectories).

For each of the options, I have tested both
time <option #N, copying to one target>
time <option #N, copying to two targets>

to get a comparison, with a file of 1.2 GB. Moreover, in some cases I ran the same command two or three times to assess the dispersion in the results. I did not compute averages and standard deviations, but the results are obvious.

This is what I got under the testing conditions specified above, with brief comments. I have concatenated in a single row the results of multiple tests, whenever available.

The base case:

$ time cp -p source/file1 target1/

real    0m0,846s    0m0,680s    0m0,659s
user    0m0,000s    0m0,001s    0m0,016s
sys     0m0,777s    0m0,662s    0m0,643s

The copying options:

  1. Option parallel

    $ parallel cp -p source/file1 ::: target1/
    real    0m0,745s    0m0,740s
    user    0m0,121s    0m0,108s
    sys     0m0,609s    0m0,619s
    

    $ parallel cp -p source/file1 ::: target1/ target2/
    real    0m0,794s    0m0,860s
    user    0m0,116s    0m0,134s
    sys     0m1,300s    0m1,380s

  2. Option tee (appending > /dev/null to avoid output to stdout)

    $ tee target1/file1 < source/file1 > /dev/null
    real    0m0,874s    0m1,040s    0m1,028s
    user    0m0,160s    0m0,172s    0m0,137s
    sys     0m0,714s    0m0,868s    0m0,887s
    

    $ tee target1/file1 target2/file1 < source/file1 > /dev/null
    real    0m1,802s    0m1,680s    0m1,833s
    user    0m0,136s    0m0,212s    0m0,197s
    sys     0m1,642s    0m1,468s    0m1,619s

    Copying to two targets roughly doubles the time for one target, which is itself slightly longer than the base case.

  3. Option xargs

    $ echo target1 | xargs -n 1 cp -p source/file1
    real    0m0,666s
    user    0m0,021s
    sys     0m0,646s
    

    $ echo target1 target2 | xargs -n 1 cp -p source/file1
    real    0m1,197s
    user    0m0,018s
    sys     0m1,173s

    Copying to two targets roughly doubles the time for one target, which is itself similar to the base case.

  4. Option find

    $ find target1 -exec cp -p source/file1 {} \;
    real    0m2,167s
    user    0m0,017s
    sys     0m1,627s
    

    $ find target1 target2 -exec cp -p source/file1 {} \;
    real    0m3,905s
    user    0m0,020s
    sys     0m3,185s

    Copying to two targets roughly doubles the time for one target, which is itself much longer than the base case: a clear loser.
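The parallel option presumably comes out ahead because it runs one cp per target at the same time instead of one after the other. A minimal sketch of the same idea with plain shell background jobs follows. It was not part of the timings above, and it assumes that scratch directories created with mktemp stand in for the real disks and that a hypothetical 8 MiB file stands in for the 1.2 GB test file:

```shell
#!/bin/sh
# Sketch: copy one file to two targets concurrently, then verify both copies.
# Assumption: the directories under $work are placeholders for real mount points.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/source" "$work/target1" "$work/target2"

# Hypothetical 8 MiB test file.
dd if=/dev/zero of="$work/source/file1" bs=1048576 count=8 2>/dev/null

# Start one cp per target in the background.
cp -p "$work/source/file1" "$work/target1/" &
pid1=$!
cp -p "$work/source/file1" "$work/target2/" &
pid2=$!

# Wait for each copy; with set -e a failed copy aborts the script.
wait "$pid1"
wait "$pid2"

# Verify that both targets are byte-identical to the source.
cmp "$work/source/file1" "$work/target1/file1"
cmp "$work/source/file1" "$work/target2/file1"
echo "both copies match the source"
```

Each cp still issues its own reads, so reading the source only once depends on the file staying in the page cache, as described in the comment under the question.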

Sources for "multiple copying":

  1. https://www.cyberciti.biz/faq/linux-unix-copy-a-file-to-multiple-directories-using-cp-command/
  2. How to copy a file to multiple folders using the command line?
  3. https://stackoverflow.com/questions/195655/how-to-copy-a-file-to-multiple-directories-using-the-gnu-cp-command

Sources for performance cp vs. rsync:

  1. https://unix.stackexchange.com/questions/91382/rsync-is-very-slow-factor-8-to-10-compared-to-cp-on-copying-files-from-nfs-sha
  2. https://lwn.net/Articles/400489/
  3. https://superuser.com/questions/1170636/why-is-there-a-write-speed-difference-between-dd-cp-rsync-and-macos-finder-to
  4. What's the difference between ` cp ` and ` rsync `?
3

rsync has a batch mode you could experiment with. When you run rsync --write-batch=foo from to, it does the usual copying but also records the instructions and data in the file foo. If foo is a fifo instead of a regular file, a second rsync can read the fifo in parallel and apply the same update to a different destination. Obviously, the new destination must be sufficiently similar to the original for this to make sense.

For example, over a network you might try

mkfifo myfifo
ssh remotec 'rsync -av --read-batch=- destc' <myfifo &
sleep 1
rsync -av --write-batch=myfifo srca/ remoteb:destb

--read-batch cannot be used with a remote:destc style destination, which is why the reading rsync is run on remotec through ssh.
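To try batch mode locally before involving a fifo or a network, here is a minimal sketch that uses a regular batch file. The directory names srca, destb and destc mirror the example above, but here they are hypothetical scratch directories under mktemp, and the script skips itself when rsync is not installed:

```shell
#!/bin/sh
# Sketch: rsync batch mode with a regular batch file, all on one machine.
# Assumption: both destinations start out identical (empty), as batch mode requires.
set -eu

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    trap 'rm -rf "$work"' EXIT
    mkdir "$work/srca" "$work/destb" "$work/destc"
    printf 'hello\n' > "$work/srca/file1"

    # Copy srca to destb and record the same update in the batch file.
    rsync -a --write-batch="$work/batch" "$work/srca/" "$work/destb/"

    # Replay the recorded update on destc; srca is not read again.
    rsync -a --read-batch="$work/batch" "$work/destc/"

    # Check that the replayed copy matches the source.
    cmp "$work/srca/file1" "$work/destc/file1"
    echo "destc matches srca"
else
    echo "rsync not installed; skipping"
fi
```

With a regular file the two writes happen one after the other; replacing the batch file with a fifo, as in the example above, is what lets both destinations be written at the same time.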

meuh