I regularly get the following error on my Ubuntu Server 18.10:

00:00:30 systemd[1]: Starting Discard unused blocks...
00:00:30 systemd[1]: Starting Rotate log files...
00:00:30 systemd[1]: Started Rotate log files.
00:01:01 kernel: ata7.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
00:01:01 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:01 kernel: ata7.00: cmd 64/01:80:00:00:00/00:00:00:00:00/a0 tag 16 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:01 kernel: ata7.00: status: { DRDY }
00:01:01 kernel: ata7: hard resetting link
00:01:01 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:01 kernel: ata7.00: configured for UDMA/133
00:01:01 kernel: ata7.00: device reported invalid CHS sector 0
00:01:01 kernel: ata7: EH complete
00:01:32 kernel: ata7.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x6 frozen
00:01:32 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:32 kernel: ata7.00: cmd 64/01:90:00:00:00/00:00:00:00:00/a0 tag 18 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:32 kernel: ata7.00: status: { DRDY }
00:01:32 kernel: ata7: hard resetting link
00:01:32 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:32 kernel: ata7.00: configured for UDMA/133
00:01:32 kernel: ata7.00: device reported invalid CHS sector 0
00:01:32 kernel: ata7: EH complete
00:02:04 kernel: ata7.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
00:02:04 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:04 kernel: ata7.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:04 kernel: ata7.00: status: { DRDY }
00:02:04 kernel: ata7: hard resetting link
00:02:05 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:05 kernel: ata7.00: configured for UDMA/133
00:02:05 kernel: ata7.00: device reported invalid CHS sector 0
00:02:05 kernel: ata7: EH complete
00:02:37 kernel: INFO: task fstrim:29514 blocked for more than 120 seconds.
00:02:37 kernel:       Tainted: P           O      4.18.0-17-generic #18-Ubuntu
00:02:37 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:02:37 kernel: fstrim          D    0 29514      1 0x00000000
00:02:37 kernel: Call Trace:
00:02:37 kernel:  __schedule+0x29e/0x840
00:02:37 kernel:  schedule+0x2c/0x80
00:02:37 kernel:  schedule_timeout+0x258/0x360
00:02:37 kernel:  io_schedule_timeout+0x1e/0x50
00:02:37 kernel:  wait_for_completion_io+0xb7/0x140
00:02:37 kernel:  ? wake_up_q+0x80/0x80
00:02:37 kernel:  submit_bio_wait+0x61/0x90
00:02:37 kernel:  blkdev_issue_discard+0x7a/0xd0
00:02:37 kernel:  ext4_trim_fs+0x5a9/0x8b0
00:02:37 kernel:  ? security_file_open+0x86/0x90
00:02:37 kernel:  ext4_ioctl+0xd81/0x14a0
00:02:37 kernel:  ? _copy_to_user+0x2b/0x40
00:02:37 kernel:  ? cp_new_stat+0x152/0x180
00:02:37 kernel:  do_vfs_ioctl+0xa8/0x620
00:02:37 kernel:  ? __do_sys_newfstat+0x5f/0x70
00:02:37 kernel:  ksys_ioctl+0x67/0x90
00:02:37 kernel:  __x64_sys_ioctl+0x1a/0x20
00:02:37 kernel:  do_syscall_64+0x5a/0x110
00:02:37 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
00:02:37 kernel: RIP: 0033:0x7faba5a9e3c7
00:02:37 kernel: Code: Bad RIP value.
00:02:37 kernel: RSP: 002b:00007ffec09ede88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
00:02:37 kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007faba5a9e3c7
00:02:37 kernel: RDX: 00007ffec09ede90 RSI: 00000000c0185879 RDI: 0000000000000004
00:02:37 kernel: RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
00:02:37 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000561d21106dd0
00:02:37 kernel: R13: 00007faba5663ff8 R14: 00007ffec09edfc8 R15: 0000561d21106dd0
00:02:37 kernel: ata7.00: NCQ disabled due to excessive errors
00:02:37 kernel: ata7.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
00:02:37 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:37 kernel: ata7.00: cmd 64/01:c0:00:00:00/00:00:00:00:00/a0 tag 24 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:37 kernel: ata7.00: status: { DRDY }
00:02:37 kernel: ata7: hard resetting link
00:02:38 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:38 kernel: ata7.00: configured for UDMA/133
00:02:38 kernel: ata7.00: device reported invalid CHS sector 0
00:02:38 kernel: ata7: EH complete
00:03:02 fstrim[29514]: /home/caillou/downloads: 891.5 GiB (957190782976 bytes) trimmed
00:03:02 fstrim[29514]: /: 212.4 GiB (228063428608 bytes) trimmed
00:03:02 systemd[1]: Started Discard unused blocks.

Unfortunately, I don't understand what it is trying to tell me.

  1. What is ata7.00?
  2. What does failed command: SEND FPDMA QUEUED mean? What is this FPDMA?
  3. What does device reported invalid CHS sector 0 mean?

I suspect it has something to do with a drive, but I have no idea how to debug or fix this issue.

Here is the output of lsblk:

sda           8:0    0   7.3T  0 disk
|-sda1        8:1    0     2G  0 part
`-sda2        8:2    0   7.3T  0 part
sdb           8:16   0   7.3T  0 disk
|-sdb1        8:17   0     2G  0 part
`-sdb2        8:18   0   7.3T  0 part
sdc           8:32   0   7.3T  0 disk
|-sdc1        8:33   0     2G  0 part
`-sdc2        8:34   0   7.3T  0 part
sdd           8:48   0   7.3T  0 disk
|-sdd1        8:49   0     2G  0 part
`-sdd2        8:50   0   7.3T  0 part
sde           8:64   0   3.7T  0 disk
|-sde1        8:65   0     2G  0 part
`-sde2        8:66   0   3.7T  0 part
sdf           8:80   0   7.3T  0 disk
|-sdf1        8:81   0     2G  0 part
`-sdf2        8:82   0   7.3T  0 part
sdg           8:96   0 931.5G  0 disk
`-sdg1        8:97   0 931.5G  0 part /home/caillou/downloads
nvme0n1     259:0    0 232.9G  0 disk
nvme1n1     259:1    0 232.9G  0 disk
|-nvme1n1p1 259:2    0   512M  0 part /boot/efi
`-nvme1n1p2 259:3    0 232.4G  0 part /

  • sdg is an mSATA SSD connected through a PCIe card.
  • sda through sdf are SATA HDDs with ZFS.

Detail of the drives:

  • sda WD Red 8T, mainboard SATA connector.
  • sdb WD Red 8T, mainboard SATA connector.
  • sdc WD Red 8T, mainboard SATA connector.
  • sdd WD Red 8T, mainboard SATA connector.
  • sde WD Red 4T, mainboard SATA connector.
  • sdf WD Red 8T, mainboard SATA connector.
  • sdg Samsung mSATA 1T, shucked from a Samsung Portable SSD T5, connected through a PCIe card.
  • nvme0n1 and nvme1n1 Samsung 970 EVO, connected to the mainboard m.2 connector.

The system does not show other signs of errors. Also, everything seems to function as intended, with the exception of these errors in the logs.

Pierre Spring
  • As I do not really understand the problem, I find it difficult to correctly tag the question, or to choose an adequate title. Please edit to your liking, if you understand more than I do. – Pierre Spring May 01 '19 at 14:25
  • Is this an SSD or HDD? Is the ONLY problem seen in syslog, or does the system exhibit operational problems? Do you have a GUI on this server? Are you able to view SMART data using the Disks app (or smartctl)? 1. ata7.00 identifies which SATA drive is affected, 2. FPDMA = First-Party Direct Memory Access, 3. CHS sector 0 means cylinder/head/sector... an outdated form of identifying a location on disk. I have a possible solution, pending your answers. – heynnema May 01 '19 at 14:54
  • @heynnema I edited the question with responses as best I could. You say ata7.00 identifies the SATA drive, does that mean it is sdg, as g is the 7th letter? – Pierre Spring May 01 '19 at 15:11
  • ata7.00... I believe they start at ata1.00, which would be your first SATA drive, and that may or may not coincide with which physical SATA port it's plugged into on your motherboard... but yes, ata7.00 may be your sdg or nvme drive. Are all of your drives the same brand/make/model? What brand/make/model is the nvme? Please answer the other questions in my previous comment. – heynnema May 01 '19 at 15:22
  • @heynnema Is this an SSD or HDD? sdg is an SSD. Is the ONLY problem seen in syslog, or does the system exhibit operational problems? It is the only problem in the logs, no operational problems. Do you have a GUI on this server? No GUI on the server. Are all of your drives the same brand/make/model? What brand/make/model is the nvme? Updated the question with that info. – Pierre Spring May 01 '19 at 15:36
  • I've added a quick answer that should probably take care of the problem. Report back. – heynnema May 01 '19 at 16:31
  • Status please... – heynnema May 02 '19 at 15:36
  • @heynnema I am in the process of figuring out how to apply all the firmware updates. It will probably take some time, as I am a noob. But don't worry, I'll get back to you as soon as there is some real progress. I am also resilvering the 4TB HDD, so things might take a bit longer. But thanks for the help. I am sure we'll figure it out. – Pierre Spring May 03 '19 at 19:09

1 Answer

Note: It's a good idea to have good backups first.

You need to check/upgrade the firmware on the Samsung SSDs (sdg and nvme*).
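
To confirm that ata7 really is sdg, and to see which firmware revision each disk currently reports, something like the following may help. It is a minimal sketch, assuming the usual libata sysfs layout (the resolved /sys/block/sdX path contains an ataN component, and device/rev holds the firmware revision). The function names and the SYSFS_ROOT variable are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers. SYSFS_ROOT exists only so the functions can be
# pointed at a test tree; on a real system leave it unset.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the ataN port a disk hangs off, e.g. "ata7".
# Assumes the resolved sysfs path contains an "/ataN/" component.
ata_port_of() {
    readlink -f "$SYSFS_ROOT/block/$1" \
        | grep -o '/ata[0-9][0-9]*/' \
        | head -n 1 \
        | tr -d '/'
}

# Print the firmware revision the kernel recorded for a SATA/SCSI disk.
firmware_rev_of() {
    cat "$SYSFS_ROOT/block/$1/device/rev" 2>/dev/null
}

# Print one line per sdX disk: name, ata port, firmware revision.
list_disks() {
    for path in "$SYSFS_ROOT"/block/sd*; do
        [ -e "$path" ] || continue
        disk=$(basename "$path")
        printf '%s %s %s\n' "$disk" "$(ata_port_of "$disk")" "$(firmware_rev_of "$disk")"
    done
}

list_disks
```

If the assumption from the comments holds, the line for sdg should show ata7. NVMe drives do not sit behind an ata port, so they will not appear in this list; sudo smartctl -i /dev/nvme0 should show their firmware version instead.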

Go to Samsung's download page and download their Samsung Magician software tool to help with the firmware upgrade. Other software updates are also available there.

Also check for a firmware upgrade for the sdg PCIe card.
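
To look up firmware for the PCIe card, you first need to know which card it is. A rough sketch, assuming the resolved sysfs path of the disk lists the PCI addresses it passes through before the ataN component, with the last one being the controller itself. The function name and SYSFS_ROOT are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. SYSFS_ROOT exists only so it can be tested.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the PCI address (dddd:bb:dd.f) of the controller a disk is
# attached to. Everything from "/ataN" onwards is cut off first, then
# the last PCI address left in the path is taken.
controller_of() {
    readlink -f "$SYSFS_ROOT/block/$1" \
        | sed 's|/ata[0-9].*||' \
        | grep -o '[0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]' \
        | tail -n 1
}

addr=$(controller_of sdg)
if [ -n "$addr" ] && command -v lspci >/dev/null 2>&1; then
    lspci -nn -s "$addr"    # vendor and model of the controller
fi
```

The vendor and device IDs that lspci -nn prints in brackets are usually enough to find the card's support page.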

Check your motherboard BIOS version with sudo dmidecode -s bios-version. Then go to the manufacturer's web site and check for a newer BIOS. If there is one, download and install it.

Note: later, if there are still problems, we'll discuss an NCQ patch.
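
What that NCQ patch would look like is not spelled out here. One common workaround, and this is an assumption rather than the exact fix meant above, is to switch NCQ off for just the affected device with the libata.force kernel parameter. The sketch below only builds the parameter from the device name in the log; it changes nothing on the system. The function name is made up.

```shell
#!/bin/sh
# Turn a libata device name from the kernel log (e.g. "ata7.00") into
# a libata.force parameter that disables NCQ for that device only.
# Assumption: disabling NCQ is what the "NCQ patch" amounts to.
noncq_param_for() {
    id=${1#ata}                       # "ata7.00" -> "7.00"
    printf 'libata.force=%s:noncq\n' "$id"
}

noncq_param_for ata7.00
```

On Ubuntu the resulting string would go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. To try the effect without rebooting, writing 1 to /sys/block/sdg/device/queue_depth should be roughly equivalent until the next boot. Since the failing command is issued while fstrim runs, the narrower noncqtrim option (which, as far as I recall from the kernel parameter documentation, disables only queued TRIM) may be enough on kernels that support it.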

heynnema