I regularly get the following error on my Ubuntu Server 18.10:
00:00:30 systemd[1]: Starting Discard unused blocks...
00:00:30 systemd[1]: Starting Rotate log files...
00:00:30 systemd[1]: Started Rotate log files.
00:01:01 kernel: ata7.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
00:01:01 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:01 kernel: ata7.00: cmd 64/01:80:00:00:00/00:00:00:00:00/a0 tag 16 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:01 kernel: ata7.00: status: { DRDY }
00:01:01 kernel: ata7: hard resetting link
00:01:01 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:01 kernel: ata7.00: configured for UDMA/133
00:01:01 kernel: ata7.00: device reported invalid CHS sector 0
00:01:01 kernel: ata7: EH complete
00:01:32 kernel: ata7.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x6 frozen
00:01:32 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:32 kernel: ata7.00: cmd 64/01:90:00:00:00/00:00:00:00:00/a0 tag 18 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:32 kernel: ata7.00: status: { DRDY }
00:01:32 kernel: ata7: hard resetting link
00:01:32 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:32 kernel: ata7.00: configured for UDMA/133
00:01:32 kernel: ata7.00: device reported invalid CHS sector 0
00:01:32 kernel: ata7: EH complete
00:02:04 kernel: ata7.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
00:02:04 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:04 kernel: ata7.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:04 kernel: ata7.00: status: { DRDY }
00:02:04 kernel: ata7: hard resetting link
00:02:05 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:05 kernel: ata7.00: configured for UDMA/133
00:02:05 kernel: ata7.00: device reported invalid CHS sector 0
00:02:05 kernel: ata7: EH complete
00:02:37 kernel: INFO: task fstrim:29514 blocked for more than 120 seconds.
00:02:37 kernel: Tainted: P O 4.18.0-17-generic #18-Ubuntu
00:02:37 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:02:37 kernel: fstrim D 0 29514 1 0x00000000
00:02:37 kernel: Call Trace:
00:02:37 kernel: __schedule+0x29e/0x840
00:02:37 kernel: schedule+0x2c/0x80
00:02:37 kernel: schedule_timeout+0x258/0x360
00:02:37 kernel: io_schedule_timeout+0x1e/0x50
00:02:37 kernel: wait_for_completion_io+0xb7/0x140
00:02:37 kernel: ? wake_up_q+0x80/0x80
00:02:37 kernel: submit_bio_wait+0x61/0x90
00:02:37 kernel: blkdev_issue_discard+0x7a/0xd0
00:02:37 kernel: ext4_trim_fs+0x5a9/0x8b0
00:02:37 kernel: ? security_file_open+0x86/0x90
00:02:37 kernel: ext4_ioctl+0xd81/0x14a0
00:02:37 kernel: ? _copy_to_user+0x2b/0x40
00:02:37 kernel: ? cp_new_stat+0x152/0x180
00:02:37 kernel: do_vfs_ioctl+0xa8/0x620
00:02:37 kernel: ? __do_sys_newfstat+0x5f/0x70
00:02:37 kernel: ksys_ioctl+0x67/0x90
00:02:37 kernel: __x64_sys_ioctl+0x1a/0x20
00:02:37 kernel: do_syscall_64+0x5a/0x110
00:02:37 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
00:02:37 kernel: RIP: 0033:0x7faba5a9e3c7
00:02:37 kernel: Code: Bad RIP value.
00:02:37 kernel: RSP: 002b:00007ffec09ede88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
00:02:37 kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007faba5a9e3c7
00:02:37 kernel: RDX: 00007ffec09ede90 RSI: 00000000c0185879 RDI: 0000000000000004
00:02:37 kernel: RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
00:02:37 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000561d21106dd0
00:02:37 kernel: R13: 00007faba5663ff8 R14: 00007ffec09edfc8 R15: 0000561d21106dd0
00:02:37 kernel: ata7.00: NCQ disabled due to excessive errors
00:02:37 kernel: ata7.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
00:02:37 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:37 kernel: ata7.00: cmd 64/01:c0:00:00:00/00:00:00:00:00/a0 tag 24 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:37 kernel: ata7.00: status: { DRDY }
00:02:37 kernel: ata7: hard resetting link
00:02:38 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:38 kernel: ata7.00: configured for UDMA/133
00:02:38 kernel: ata7.00: device reported invalid CHS sector 0
00:02:38 kernel: ata7: EH complete
00:03:02 fstrim[29514]: /home/caillou/downloads: 891.5 GiB (957190782976 bytes) trimmed
00:03:02 fstrim[29514]: /: 212.4 GiB (228063428608 bytes) trimmed
00:03:02 systemd[1]: Started Discard unused blocks.
Unfortunately, I don't understand what it is trying to tell me.
- What is ata7.00?
- What does "failed command: SEND FPDMA QUEUED" mean? What is this FPDMA?
- What does "device reported invalid CHS sector 0" mean?
I suspect it has something to do with a drive, but I have no idea how to debug or fix this issue.
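As a first debugging aid, the fields in the log lines above can be decoded mechanically. The sketch below assumes that SAct is a bitmask of active NCQ tags (which matches the log: SAct 0x10000 appears with tag 16, SAct 0x20 with tag 5) and that the first byte after "cmd" is the ATA opcode. The opcode table is a partial list based on the constants in the Linux kernel's include/linux/ata.h; the function names are only illustrative.

```python
import re

# Partial table of NCQ-related ATA opcodes (values as in the Linux kernel's
# include/linux/ata.h). Other opcodes are deliberately left out.
NCQ_OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0x63: "NCQ NON-DATA",
    0x64: "SEND FPDMA QUEUED",
    0x65: "RECEIVE FPDMA QUEUED",
}


def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value."""
    return [bit for bit in range(32) if sact & (1 << bit)]


def sact_tags(line):
    """Extract 'SAct 0x...' from an exception line and decode it to tags."""
    match = re.search(r"SAct (0x[0-9a-fA-F]+)", line)
    return active_tags(int(match.group(1), 16)) if match else []


def opcode_name(line):
    """Extract the opcode byte from a 'cmd xx/...' line and name it."""
    match = re.search(r"\bcmd ([0-9a-fA-F]{2})/", line)
    if not match:
        return None
    opcode = int(match.group(1), 16)
    return NCQ_OPCODES.get(opcode, "unknown opcode 0x%02x" % opcode)


if __name__ == "__main__":
    # Two lines copied from the log above.
    exc = "ata7.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen"
    cmd = "ata7.00: cmd 64/01:80:00:00:00/00:00:00:00:00/a0 tag 16 ncq dma 512 out"
    print("active tags:", sact_tags(exc))
    print("command:", opcode_name(cmd))
```

Decoded this way, every failure in the log is the same queued command timing out on a single tag while fstrim is running, which suggests the discard requests rather than regular reads or writes are what the drive is not answering. That reading is an inference from the log, not something the kernel states directly.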
Here is the output of lsblk:
sda 8:0 0 7.3T 0 disk
|-sda1 8:1 0 2G 0 part
`-sda2 8:2 0 7.3T 0 part
sdb 8:16 0 7.3T 0 disk
|-sdb1 8:17 0 2G 0 part
`-sdb2 8:18 0 7.3T 0 part
sdc 8:32 0 7.3T 0 disk
|-sdc1 8:33 0 2G 0 part
`-sdc2 8:34 0 7.3T 0 part
sdd 8:48 0 7.3T 0 disk
|-sdd1 8:49 0 2G 0 part
`-sdd2 8:50 0 7.3T 0 part
sde 8:64 0 3.7T 0 disk
|-sde1 8:65 0 2G 0 part
`-sde2 8:66 0 3.7T 0 part
sdf 8:80 0 7.3T 0 disk
|-sdf1 8:81 0 2G 0 part
`-sdf2 8:82 0 7.3T 0 part
sdg 8:96 0 931.5G 0 disk
`-sdg1 8:97 0 931.5G 0 part /home/caillou/downloads
nvme0n1 259:0 0 232.9G 0 disk
nvme1n1 259:1 0 232.9G 0 disk
|-nvme1n1p1 259:2 0 512M 0 part /boot/efi
`-nvme1n1p2 259:3 0 232.4G 0 part /
sdg is an mSATA SSD connected through a PCIe card. sda-sdf are SATA HDDs with ZFS.
Details of the drives:
- sda: WD Red 8T, mainboard SATA connector.
- sdb: WD Red 8T, mainboard SATA connector.
- sdc: WD Red 8T, mainboard SATA connector.
- sdd: WD Red 8T, mainboard SATA connector.
- sde: WD Red 4T, mainboard SATA connector.
- sdf: WD Red 8T, mainboard SATA connector.
- sdg: Samsung mSATA 1T, shucked from a Samsung Portable SSD T5, connected through a PCIe card.
- nvme0n1 and nvme1n1: Samsung 970 EVO, connected to the mainboard M.2 connector.
The system does not show other signs of errors. Also, everything seems to function as intended, with the exception of these errors in the logs.
Disks app (or smartctl)? 1. ata7.00 identifies which SATA drive, 2. FPDMA = First-Party Direct Memory Access, 3. CHS sector 0 means cylinder/head/sector... an outdated form of identifying a location on disk. I have a possible solution, pending your answers. – heynnema May 01 '19 at 14:54
ata7.00 identifies the SATA drive, does that mean it is sdg, as g is the 7th letter? – Pierre Spring May 01 '19 at 15:11
Is this an SSD or HDD? sdg is an SSD.
Is the ONLY problem seen in syslog, or does the system exhibit operational problems? It is the only problem in the logs, no operational problems.
Do you have a GUI on this server? No GUI on the server.
Are all of your drives the same brand/make/model? What brand/make/model is the nvme? Updated the question with that info. – Pierre Spring May 01 '19 at 15:36
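On the question of whether ata7 is sdg: the ataN number follows the order in which the kernel enumerates controller ports, not the drive letters, so counting letters is not a reliable rule. One way to check is to resolve each /sys/block entry and look for an ataN component in the resulting device path. The sketch below assumes a Linux sysfs layout in which SATA disks resolve to a path containing "/ataN/"; the function names are illustrative, and disks without such a component (for example NVMe) are reported as having no ATA port.

```python
import os
import re


def ata_port_from_sysfs_path(path):
    """Return N from the '/ataN/' component of a resolved sysfs path, or None."""
    match = re.search(r"/ata(\d+)/", path)
    return int(match.group(1)) if match else None


def ata_port_for_disk(name, sys_block="/sys/block"):
    """Resolve the /sys/block/<name> symlink and extract its ATA port number."""
    resolved = os.path.realpath(os.path.join(sys_block, name))
    return ata_port_from_sysfs_path(resolved)


def list_ata_ports(sys_block="/sys/block"):
    """Map every block device name to its ATA port number (None if not ATA)."""
    if not os.path.isdir(sys_block):
        return {}
    return {name: ata_port_for_disk(name, sys_block)
            for name in sorted(os.listdir(sys_block))}


if __name__ == "__main__":
    for disk, port in list_ata_ports().items():
        label = "ata%d" % port if port is not None else "no ATA port"
        print("%-10s %s" % (disk, label))
```

Run on the server itself, the disk listed with ata7 is the one the kernel messages refer to; that device can then be passed to smartctl, as suggested in the comments.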