The percent complete that rsync --info=progress2 reports while copying a large directory is non-uniform: the last 10% seems to take longer than the first 90%.
Why is this, and is there a way to make it a more uniform progress indicator?
The reason for the observed behaviour is likely the file system cache:
When files are written (as rsync does), the data usually goes to an in-memory cache first and the write operation returns almost instantly. The data is then written to disk in the background while the user can already do other things.
If the cache is large enough to hold the data to be written, this gives the impression of a very high write speed.
If the data to be written doesn't fit into the file system cache, then the excess data is actually written to disk before the write operation completes, and writing to disk is slower than writing to in-memory cache.
The excess data doesn't bypass the cache but rather waits until previous content has been moved from the cache to the disk giving again some free space in the cache, so the new data can be written to the cache.
So the first part of the data (90% in your case) appears to be written in an instant (to the cache), while the last 10% takes more time because that is when actual disk I/O kicks in.
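A rough way to see this effect is a toy model. The cache size and both speeds below are made-up numbers for illustration, not measurements, and real kernel writeback is more involved than a single fixed-size buffer:

```python
def seconds_to_reach(fraction, total_mb, cache_mb, cache_mb_s, disk_mb_s):
    """Seconds until `fraction` of the data has been accepted by write().

    Toy model: writes land in the cache at cache_mb_s while the cache
    drains to disk at disk_mb_s; once the cache is full, writes can only
    proceed at disk speed.
    """
    target_mb = fraction * total_mb
    if cache_mb_s <= disk_mb_s:
        # The disk keeps up, so the cache never fills.
        return target_mb / cache_mb_s
    fill_seconds = cache_mb / (cache_mb_s - disk_mb_s)
    accepted_at_fill_mb = cache_mb_s * fill_seconds
    if target_mb <= accepted_at_fill_mb:
        return target_mb / cache_mb_s
    return fill_seconds + (target_mb - accepted_at_fill_mb) / disk_mb_s


# Hypothetical numbers: 10 GB to copy, 8.1 GB of cache,
# 1000 MB/s into the cache, 100 MB/s onto the disk.
params = dict(total_mb=10_000, cache_mb=8_100, cache_mb_s=1_000, disk_mb_s=100)
first_90 = seconds_to_reach(0.9, **params)
last_10 = seconds_to_reach(1.0, **params) - first_90
print(f"first 90%: {first_90:.1f} s, last 10%: {last_10:.1f} s")
```

With these assumed numbers the first 90% takes about 9 seconds and the last 10% about 10 seconds, which matches the shape of what you are seeing.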
In addition to PerlDuck's answer, it's worth noting that writing one large file is faster than writing a huge number of files that add up to the same size.
For example: you copy a 4 GB file and a directory containing 100,000 files that add up to 1 GB. If the single file is transferred first, the first 80% will be way faster than the last 20%.
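To put rough numbers on that example, here is a back-of-the-envelope sketch. The throughput and the fixed cost per file (creating the file, setting metadata and so on) are assumptions picked for illustration, not measured values:

```python
def transfer_seconds(total_mb, n_files, throughput_mb_s=100.0, per_file_s=0.001):
    """Estimated time: raw data transfer plus a fixed cost per file."""
    return total_mb / throughput_mb_s + n_files * per_file_s


# Assumed: 100 MB/s throughput and 1 ms of overhead per file.
big = transfer_seconds(total_mb=4_000, n_files=1)
small = transfer_seconds(total_mb=1_000, n_files=100_000)
print(f"4 GB single file: {big:.0f} s, 100000 small files (1 GB): {small:.0f} s")
```

Under these assumptions the 1 GB of small files takes well over twice as long as the 4 GB file, even though it is only 20% of the bytes.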
The percentage shown is only the percentage of the total size of the files rsync has already scanned. In the output from --info=progress2, for instance:
71,256,901,358 99% 36.30MB/s 0:31:12 (xfr#173389, ir-chk=1000/361047)
the last number, 361047, is the number of files scanned so far. When you recursively copy a large directory with many subdirectories and files, this number typically keeps growing until the operation is almost complete. The files that have been scanned but not yet copied are usually only a small fraction of the total number of files. So unless an unusually large file has been scanned but not copied, most of the data in the scanned files has already been copied, and the percentage will sit above 90% most of the time.
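The arithmetic behind this can be sketched with a simplified model. It assumes equal-sized files and a scan that stays a fixed number of files ahead of the copy; both are assumptions for illustration, and rsync's real scanning does not work with a fixed lookahead:

```python
def displayed_percent(copied, total, lookahead):
    """Percent as a share of the files scanned so far (equal-sized files)."""
    scanned = min(copied + lookahead, total)
    return 100.0 * copied / scanned


def true_percent(copied, total):
    """Percent as a share of all files that will eventually be copied."""
    return 100.0 * copied / total


TOTAL = 100_000    # hypothetical number of equal-sized files
LOOKAHEAD = 1_000  # hypothetical: the scan stays this many files ahead

for copied in (10_000, 50_000, 99_000, 100_000):
    print(f"true {true_percent(copied, TOTAL):5.1f}%  "
          f"displayed {displayed_percent(copied, TOTAL, LOOKAHEAD):5.1f}%")
```

In this model the display already reads about 91% when only 10% of the files have been copied. Setting lookahead equal to the total, which corresponds to scanning everything before copying, makes the displayed and true percentages identical.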
If you really want the percentage to be accurate from the start, the --no-inc-recursive option will do that, at the cost of a delayed start of copying files and some more memory used by rsync.
– Jamie Flournoy Oct 04 '23 at 16:49
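For anyone scripting this, a minimal sketch of putting the two options together might look like the following. The paths are placeholders, the -a (archive) flag is my own assumption about a typical copy, and the helper name is made up for this example:

```python
import subprocess


def build_rsync_command(src, dest, scan_first=True):
    """Return an rsync argv list with an overall progress display."""
    cmd = ["rsync", "-a", "--info=progress2"]
    if scan_first:
        # Build the complete file list before transferring anything,
        # so the percentage refers to the full total from the start.
        cmd.append("--no-inc-recursive")
    cmd += [src, dest]
    return cmd


# Hypothetical paths; replace with your own.
cmd = build_rsync_command("/path/to/source/", "/path/to/dest/")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The command is only printed here, not executed, so you can check it before running it on real data.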