
When I use rsync to copy a 1TB directory from one USB drive to another, the system goes haywire. The system load climbs rapidly and processes trying to write to the destination drive go into the D state (as shown in top). The rsync copy hangs. Even an ls on that drive locks up.

I tried running rsync with --bwlimit, but reducing the bandwidth only slows the rate at which the load rises, and even really slow rates like 1MB/sec don't help. Strangely, other processes still run smoothly. Stopping rsync doesn't help and the only way to recover is to reboot. After rebooting I tried the same copy with cp and everything ran smoothly, so there is probably no problem with the drive.

I don't know if the drives are on the same internal hub. I'm on an Ubuntu 18.04.5 LTS server. Can anyone help?

EDIT: After 100GB of copying with cp, the cp process went into the D state in top. The load isn't rising but it is stuck at 3.0, so I will need to reboot. Before the cp process ran the load was at 0.1, so something is still loading the system. Could the problem be in the drive? This drive has run OK with a lot of activity before this problem and is still OK.

EDIT2: Here are the kernel messages for the USB lockup resulting from rsync.

INFO: task usb-storage:285 blocked for more than 120 seconds.
      Tainted: G W 4.15.0-197-generic #208-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
usb-storage D 0 285 2 0x80000000
Call Trace:
 __schedule+0x24e/0x890
 schedule+0x2c/0x80
 schedule_timeout+0x1cf/0x370

Can anyone decipher that and suggest how I can avoid the lockup? Right now I am incapable of cloning a drive.

EDIT3: Even ddrescue died after two hours with the same error. This happens with different drives and cables. Either my HP laptop has bad USB hardware or there is a bug in the Ubuntu USB driver (very unlikely). Unless someone has an idea of something else to try, I give up. I'm getting a different job.

EDIT4: After much pain I'm pretty sure I know all I can know. I improved my diagnostic skills with iotop, top, and dmesg to fully understand when it happens. It happens with cp and rsync equally.
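
For anyone wanting to watch for the same symptom without staring at top, here is a minimal sketch that lists tasks currently in the D (uninterruptible sleep) state by reading /proc/<pid>/stat. The function names parse_stat and blocked_tasks are made up for this illustration, and it assumes a Linux /proc layout. It only reports what is blocked at that instant; as noted in the comments below, a process doing heavy USB I/O shows D most of the time without being hung, so only a task that stays in the list across repeated checks is suspicious.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for every process whose state field is D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited (or entry was unreadable) mid-scan
        if state == "D":
            yield pid, comm
```

Calling list(blocked_tasks()) every few seconds during a copy and comparing the results should show whether usb-storage or the copying process ever leaves the D state.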

I found the source drive has many mysterious "bad spots". I couldn't find these with fsck, badblocks, or ddrescue, so they aren't normal "bad blocks". I am making progress by painfully letting the copy proceed until it fails and then rebooting. Luckily the drive I'm copying is a large collection of files and not something like a database.
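
The copy-until-it-hangs, reboot, repeat routine could be sketched roughly as below. This is an illustration rather than what I actually ran: resume_copy and needs_copy are hypothetical names, it handles regular files only, and it treats a destination file of the same size as already done, which is a weaker check than comparing contents. Each file name is printed before it is copied, so the last line of output identifies the file that was in flight when the drive locked up.

```python
import os
import shutil


def needs_copy(src, dst):
    """True if dst is missing or its size differs from src."""
    try:
        return os.path.getsize(dst) != os.path.getsize(src)
    except OSError:
        return True


def resume_copy(src_root, dst_root):
    """Copy files under src_root that are not yet complete under dst_root.

    Returns the list of relative paths copied during this run.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if needs_copy(src, dst):
                # Flush so the name survives in the log if the copy hangs.
                print("copying", src, flush=True)
                shutil.copy2(src, dst)  # copies data plus timestamps
                copied.append(os.path.relpath(src, src_root))
    return copied
```

After a reboot, running resume_copy again with the same two directories skips everything that already arrived in full and continues with the rest. A file that was cut off mid-copy has a smaller size at the destination, so it gets copied again.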

mark-hahn
  • Check your kernel logs if you see any errors related to the drive(s). – Freddy Mar 05 '23 at 21:18
  • do you mean dmesg? – mark-hahn Mar 05 '23 at 22:41
  • Yes. Have you found anything? – Freddy Mar 05 '23 at 22:44
  • I have to wait for something to finish to test again. However, I think my EDIT2 about problems with cp was an error in my testing. I was seeing D in top and I incorrectly thought it was permanent. D was showing up almost all the time because the USB operation was almost always waiting. So the process was not hung. – mark-hahn Mar 05 '23 at 23:45
  • I tried rsync again but to a different destination drive. See EDIT2. – mark-hahn Mar 06 '23 at 18:07
  • I suspect this isn’t the issue, but just to check - what file system are you using on the destination drive and what’s the biggest file you’re copying over? FAT32 has a maximum file size of 4GB. – Will Mar 08 '23 at 07:05
  • What computer is it? And which architecture of Ubuntu (i386 or amd64)? I have seen problems with rsync and USB copying huge sets of data running in i386 (32-bit) systems. In my particular case there was a problem with many files, but it worked to copy a tar archive of all those files. -- Interesting that it worked for you with cp. – sudodus Mar 08 '23 at 10:05
  • everything is ext4 and it's a 64-bit system – mark-hahn Mar 10 '23 at 07:24
  • I have used ddrescue with a log-file repeatedly to recover bad drives, both HDDs and DVDs. I am surprised that you cannot 'find' the bad spots on the drive with ddrescue. Have you checked your RAM with memtest86 or (the newest version of) memtest86+? – sudodus Mar 10 '23 at 09:56
