I'm getting these errors randomly, and I don't know if it's normal or not.

[39441.061856] ata3.00: failed to read SCR 1 (Emask=0x40)
[39441.061866] ata3.01: failed to read SCR 1 (Emask=0x40)
[39441.061892] ata3.15: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
[39441.061897] ata3.15: irq_stat 0x08000000, interface fatal error
[39441.061904] ata3.15: SError: { UnrecovData 10B8B BadCRC }
[39441.061910] ata3.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
[39441.061917] ata3.01: exception Emask 0x100 SAct 0xe SErr 0x0 action 0x6 frozen
[39441.061923] ata3.01: failed command: READ FPDMA QUEUED
[39441.061933] ata3.01: cmd 60/a8:08:b0:48:62/00:00:00:00:00/40 tag 1 ncq 86016 in
[39441.061940] ata3.01: status: { DRDY }
[39441.061944] ata3.01: failed command: READ FPDMA QUEUED
[39441.061953] ata3.01: cmd 60/a8:10:b0:49:62/00:00:00:00:00/40 tag 2 ncq 86016 in
[39441.061959] ata3.01: status: { DRDY }
[39441.061963] ata3.01: failed command: READ FPDMA QUEUED
[39441.061972] ata3.01: cmd 60/58:18:58:4a:62/00:00:00:00:00/40 tag 3 ncq 45056 in
[39441.061978] ata3.01: status: { DRDY }
[39441.061987] ata3.15: hard resetting link
[39441.608302] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[39441.609090] ata3.00: hard resetting link
[39441.929246] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[39441.929333] ata3.01: hard resetting link
[39442.249184] ata3.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[39442.263242] ata3.00: configured for UDMA/133
[39442.277570] ata3.01: configured for UDMA/133
[39442.277725] ata3: EH complete
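For reference, the SErr value on the ata3.15 line is a bit mask. The minimal Python sketch below decodes it using the SError bit names that the Linux kernel's libata prints (the SERR_* constants in include/linux/ata.h); the function name decode_serror is hypothetical. Decoding 0x280100 gives exactly UnrecovData, 10B8B and BadCRC, i.e. errors detected on the serial link itself rather than on the disk media.

```python
from __future__ import annotations

# SError bit positions and the labels libata prints for them
# (mirrors the SERR_* constants in the Linux kernel's include/linux/ata.h).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list[str]:
    """Return the libata names of the bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value copied from the "ata3.15: exception ..." line of the log.
    print(decode_serror(0x280100))  # ['UnrecovData', '10B8B', 'BadCRC']
```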

I'm also pasting the smartctl -a output for sda, sdb and sdc.

Thanks in advance for your help.

ankit7540
Marcos Junior

11 Answers

While I essentially agree with Geppettvs D'Constanzo's answer, I would suggest that some of the first things you might also try are:

  1. Checking that your SATA cable is securely attached and plugged into the sockets on the motherboard and hard drive.

  2. Replacing your SATA cable. SATA cables are (relatively) inexpensive and you do sometimes get a "bad" one. Often simply replacing the cable is the easiest way to diagnose and solve a problem like this.

(Although it is somewhat unexpected that two cables would both be bad at the same time. Still, it's an easy thing to check so in my opinion probably worth doing.)

I just saw your pastebins containing the SMART data for your drives. Notice the unexpectedly large number of CRC errors for drives sdb and sdc. I suggest you start by checking the cables and connections for those drives.

junior@mediacenter:/$ sudo  smartctl -a /dev/sda
...
Model Family:     SAMSUNG SpinPoint M7E (AFT)
Device Model:     SAMSUNG HM321HI
...
199 UDMA_CRC_Error_Count    0x0036   200   200   000   Old_age  Always -    0

junior@mediacenter:/$ sudo  smartctl -a /dev/sdb
...
Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count    0x0036   100   100   000   Old_age  Always  -  57

junior@mediacenter:/$ sudo  smartctl -a /dev/sdc
...
Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count    0x0036   100   100   000   Old_age  Always  - 398
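Attribute 199 is a cumulative counter, so a non-zero value only shows that link errors have happened at some point; what matters is whether it keeps rising after a cable is reseated or replaced. A minimal Python sketch for pulling the raw value out of saved smartctl -a output, assuming the ten-column attribute table shown above (udma_crc_errors is a hypothetical helper name):

```python
from __future__ import annotations


def udma_crc_errors(smartctl_output: str) -> int | None:
    """Return the raw value of SMART attribute 199 (UDMA_CRC_Error_Count), or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw_value -> RAW_VALUE is the 10th column (index 9).
        if len(fields) >= 10 and fields[0] == "199" and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


if __name__ == "__main__":
    sdc_row = "199 UDMA_CRC_Error_Count    0x0036   100   100   000   Old_age  Always  - 398"
    print(udma_crc_errors(sdc_row))  # 398
```

To use it, save the output before and after the cable swap (for example with sudo smartctl -a /dev/sdc > before.txt) and compare the two numbers; a count that stays flat suggests the fix worked.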

OK. So not a laptop then. ;-)
Of course, if this is happening on a laptop, then none of the above applies and I'm not sure what advice to offer. Maybe remove and reinstall the hard drive? Perhaps it just needs to be reseated in its socket to improve the connection?


sdb and sdc are connected to the same external eSATA cable (Thermaltake Duo HDD Dock). I'll replace my eSATA cable.

It could be due to a faulty or low-quality cable. It could also be that the cable is somehow moved, bumped, or otherwise jostled while the drive is being used.

  • sdb and sdc are both connected to the same external eSATA cable (Thermaltake Duo HDD Dock). I'll replace my eSATA cable. – Marcos Junior May 09 '12 at 14:22
  • Indeed, same for me. One bad SATA cable in one HDD and one bad power cable in another. And this helped me a lot to locate the RAID disks: find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)' – Marcelo Scofano Diniz Jan 05 '21 at 22:46

It looks like you have a poor-quality or damaged SATA power/data cable, which may be causing the bad CRCs. The CRC errors themselves aren't harmful to the drive and you can live with them for a while, but you are at risk of losing a lot of data soon.

The SMART reports of your hard disk drives look sane, so based on my experience I suspect power supply issues: when I set up 5 hard disk drives in the same case/power supply, I ended up using an external power supply (475 W) for 2 drives and the case's 600 W supply for everything else, including the GPU, optical and hard disk drives.

Anyway, I suggest you run a full backup before you do anything else. If possible, clone your hard disk drive, after which you should check your cables and power supply voltages.

Zanna
  • Out of curiosity, was that GPU a big, honkin' power hungry GPU? – irrational John May 08 '12 at 20:55
  • nVidia Quadro 4000, not that hungry indeed. – Geppettvs D'Constanzo May 08 '12 at 20:59
  • Interesting. I have a 400w Antec (Neo-Eco) PSU, 5 hard drives, 2 optical drives, and an NVIDIA GeForce 9500 GT and I do not think I have had any power supply related problems. I do have drive CRC errors, but I think they are from stupid user errors I made a while back. (Bumping a cable & such.) I haven't noticed any warning logs in my kernel messages. Still, I guess I should keep a closer watch on it just to be safe. – irrational John May 08 '12 at 22:03
  • On my side: 1x IDE DVD-RW, 1x SATA DVD-RW and 1x SATA Blu-ray ROM optical drives; 4 SATA and 1 IDE HDD; the GPU draws 142 W. I can't say I am absolutely sure it was a power supply issue, but when I added the new power supply the problems went away. BTW, my drives seem to be healthy. But thank you for making me see that; your opinion is really appreciated. Thank you! – Geppettvs D'Constanzo May 08 '12 at 22:27
  • Uh, 142 watts for a GPU is ... something. My entire system (usually) uses less than that. As I type this my desktop box is pulling ~117 watts. (According to the Kill-A-Watt I had forgotten I still have it plugged into. ;-) – irrational John May 08 '12 at 22:40
  • sdb and sdc are both external HDDs, connected to another power source. – Marcos Junior May 09 '12 at 14:28
  • You didn't mention that. But thank you for clarifying. Good luck! – Geppettvs D'Constanzo May 09 '12 at 15:33
  • @GeppettvsD'Constanzo This is a good answer, but I would "clone your hard disk drive" after replacing the cables and ensuring appropriate voltage, not before. As the link is going up and down, it will take longer and be more difficult to get an accurate clone while the root cause of the problem still exists. Cheers! – Elder Geek Sep 06 '19 at 17:46

There seems to be a problem between some kernel versions and some SATA controllers.

I have recently started to experience a very similar problem (not sure if it is exactly the same) on a web server running Scientific Linux.

The most accurate and complete information I have found about this problem is this Launchpad bug.

In short: Disabling NCQ seems to be the best workaround for users having this problem.
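For reference, two commonly used ways of disabling NCQ on Linux are the libata.force=...:noncq kernel boot parameter and writing 1 to the disk's queue_depth in sysfs. A hedged Python sketch of both, assuming a libata-driven disk; the helper names are hypothetical, the sysfs write needs root, and it does not survive a reboot:

```python
from __future__ import annotations

import re


def noncq_kernel_param(ata_id: str) -> str:
    """Build a libata.force boot parameter that disables NCQ for one port/device.

    `ata_id` is the ID as printed in the kernel log, e.g. "ata3.01" or "ata3".
    """
    match = re.fullmatch(r"ata(\d+)(?:\.(\d+))?", ata_id)
    if match is None:
        raise ValueError(f"not a libata ID: {ata_id!r}")
    port, device = match.groups()
    target = port if device is None else f"{port}.{device}"
    return f"libata.force={target}:noncq"


def disable_ncq_runtime(disk: str) -> None:
    """Set the queue depth of /dev/<disk> to 1, which turns NCQ off until reboot.

    Requires root; `disk` is a bare name such as "sdc".
    """
    with open(f"/sys/block/{disk}/device/queue_depth", "w") as f:
        f.write("1")
```

For the log in the question, noncq_kernel_param("ata3.01") returns libata.force=3.01:noncq, which would then be added to the kernel command line (on Ubuntu, for example, via GRUB_CMDLINE_LINUX in /etc/default/grub followed by sudo update-grub).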

jap1968
  • Disabling NCQ is a common workaround for buggy hardware. There does not appear to be a kernel bug. – psusi Dec 31 '12 at 15:57
  • Holy $#!+ that worked! All my error messages went away and my system stopped crashing! I entirely disagree that it's not a kernel bug, since I can use older kernel versions (all the way back to at least the 2.6 series) without any crashes. I can't believe I didn't find this sooner! – reukiodo Oct 20 '19 at 07:41

This error is unlikely to damage your hard drive but is highly likely to corrupt your filesystem(s). Begin by determining which drive is throwing the errors. This can usually be determined easily in a number of ways, such as:

1) Issue the command dmesg | grep ata3 and look for the hard drive make and model (ata3 is the port throwing the error in your situation; adjust accordingly). This will provide output similar to this:

dmesg | grep ata3
[    4.756081] ata3: SATA max UDMA/133 abar m2048@0xf7f26000 port 0xf7f26200 irq 135
[    5.071981] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    5.077850] ata3.00: HPA detected: current 1953523055, native 1953525168
[    5.077959] ata3.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
[    5.077960] ata3.00: 1953523055 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    5.084057] ata3.00: configured for UDMA/133

A quick glance indicates that the drive connected to ata3 is the SAMSUNG HD103SJ.
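The same lookup can be scripted. A minimal Python sketch that scans saved dmesg output, assuming the "ataN.MM: ATA-x: MODEL, FIRMWARE, ..." identify format shown above (models_by_ata_id is a hypothetical helper name; ATAPI devices such as optical drives print a different line and are ignored):

```python
from __future__ import annotations

import re

# Matches libata identify lines such as
# "ata3.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133"
IDENTIFY_RE = re.compile(r"\b(ata\d+\.\d+): ATA-\d+: ([^,]+),")


def models_by_ata_id(dmesg_text: str) -> dict[str, str]:
    """Map each libata device ID found in dmesg output to the model it reported."""
    return {m.group(1): m.group(2).strip() for m in IDENTIFY_RE.finditer(dmesg_text)}


if __name__ == "__main__":
    import subprocess

    # dmesg may need sudo on systems with kernel.dmesg_restrict=1.
    text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for ata_id, model in models_by_ata_id(text).items():
        print(ata_id, model)
```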

2) Issue the command below:

find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)'

This will provide both the ports and the device names highlighted on the same line as seen below:

[Screenshot of the command output]

It's easy to see that the device connected to ata3 has been assigned the device name sdb.
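The find pipeline works because each sdX entry under /sys/block is a symlink whose target passes through the ataN directory of the port the disk hangs off. A minimal Python sketch of the same idea, assuming libata-driven disks (USB disks have no ataN component and are skipped); the helper names are hypothetical:

```python
from __future__ import annotations

import glob
import os
import re


def ata_port_of(sysfs_path: str) -> str | None:
    """Return the 'ataN' directory a resolved sysfs device path passes through."""
    match = re.search(r"/(ata\d+)/", sysfs_path)
    return match.group(1) if match else None


def map_disks_to_ports() -> dict[str, str]:
    """Map sdX names to libata ports by resolving the /sys/block/sd* symlinks."""
    result: dict[str, str] = {}
    for link in sorted(glob.glob("/sys/block/sd*")):
        port = ata_port_of(os.path.realpath(link))
        if port is not None:  # non-libata disks (e.g. USB) have no ataN component
            result[os.path.basename(link)] = port
    return result


if __name__ == "__main__":
    for disk, port in map_disks_to_ports().items():
        print(f"{disk} -> {port}")
```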

3) Install lsscsi with sudo apt install lsscsi and issue the command lsscsi:

$ lsscsi
[0:0:0:0]    cd/dvd  ATAPI    iHAS124   F      CL9M  /dev/sr0 
[1:0:0:0]    disk    ATA      WDC WD2003FZEX-0 1A01  /dev/sda 
[2:0:0:0]    disk    ATA      SAMSUNG HD103SJ  0001  /dev/sdb 
[3:0:0:0]    disk    ATA      ST6000VN0033-2EE SC60  /dev/sdc 

Note that the first entry on each line above is the SCSI host, channel, target number and LUN, enclosed in brackets and separated by colons. When there are multiple SCSI devices, their entries are sorted in ascending order.

Simply adding 1 to the first number (the SCSI host) in each line of output gives you the ATA port, at least on a typical system where libata registered all the SCSI hosts in order. You can find more detail on lsscsi here and here.
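The host-plus-one rule can be sketched in Python as below. It is only a heuristic: it assumes libata registered every SCSI host, in order, starting from host 0 = ata1, so treat the result as a hint and confirm it with one of the other methods (guess_ata_ports is a hypothetical helper name).

```python
from __future__ import annotations

import re

# Matches lsscsi lines such as
# "[2:0:0:0]    disk    ATA      SAMSUNG HD103SJ  0001  /dev/sdb"
LSSCSI_RE = re.compile(r"^\[(\d+):\d+:\d+:\d+\].*?(/dev/\S+)$")


def guess_ata_ports(lsscsi_output: str) -> dict[str, str]:
    """Guess each device's libata port as "ata" + (SCSI host number + 1)."""
    ports: dict[str, str] = {}
    for line in lsscsi_output.splitlines():
        match = LSSCSI_RE.match(line.strip())
        if match:
            host = int(match.group(1))
            ports[match.group(2)] = f"ata{host + 1}"
    return ports
```

Fed the lsscsi output above, it maps /dev/sdb to ata3, matching the other two methods.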

Since in your case we are seeing errors thrown on both ata3.00 and ata3.01, you have more than one drive connected to the same ATA port, so you will want to carefully check connectivity to both ata3.00 and ata3.01. This could be a multi-bay drive enclosure connected through a single cable. Since both drives are throwing errors, replacing the cable to that multi-drive bay should eliminate the problem for both drives. These enclosures usually have an external power supply, which could also be the culprit and need replacing, but the cable (being the weakest link) is by far the most likely root cause of the problem.
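The port-sharing reasoning can be checked mechanically. A minimal sketch, assuming the ataN.MM naming libata uses in the question's log, where ataN.15 is the port multiplier's own control link rather than a drive (the function names are hypothetical):

```python
from __future__ import annotations

import re
from collections import defaultdict

# Device/link IDs look like "ata3.00" or "ata3.01"; plain port IDs ("ata3:") are ignored.
ATA_ID_RE = re.compile(r"\bata(\d+)\.(\d+):")
PMP_CTRL_LINK = 15  # libata prints a port multiplier's host link as ataN.15


def devices_per_port(log_text: str) -> dict[int, set[int]]:
    """Collect, for each ATA port in a kernel log, the device/link numbers seen."""
    seen: dict[int, set[int]] = defaultdict(set)
    for port, dev in ATA_ID_RE.findall(log_text):
        seen[int(port)].add(int(dev))
    return dict(seen)


def shared_ports(log_text: str) -> dict[int, bool]:
    """True for ports where more than one drive (ignoring link 15) shows up."""
    return {
        port: len(devs - {PMP_CTRL_LINK}) > 1
        for port, devs in devices_per_port(log_text).items()
    }
```

On the question's log this reports port 3 with devices 0 and 1 plus link 15, so both drives sit behind one port and one cable.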

Sources:

Experience

https://linux.die.net/man/8/lsscsi

http://sg.danny.cz/scsi/lsscsi.html

https://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name/868943#868943

Elder Geek

I had the same issue. In my case it was due to a 4-pin-to-SATA power adapter not being plugged in snugly.

This is almost always a bad drive. We use thousands of drives, and although these errors never cause a drive to fail, they have resulted in file system corruption. I think it really comes down to a problem with the controller board on the drive.

I have tried everything to solve this problem; the fix is to replace the drive, after which things work on the same cables and controllers.

Good luck

  • After over 30 years of troubleshooting these things for a living, I can assure you that in my experience this is almost always a dodgy cable. And since they are cheap, you try that first. – Elder Geek Sep 06 '19 at 17:50

I know this thread is old, but I just ran into the same issue and came here from Google.

  • Getting ata3.01: failed command: READ FPDMA QUEUED when booting the Kubuntu 16.04 live CD.
  • Windows 7 behaves erratically: it works OK for a while, but freezes after watching YouTube.

Changing the SATA cable didn't do anything.
Replaced the PSU and the problem went away.

Thomas

I encountered the same problem. It was caused by an inadequate power supply to the HDD. I changed the 4-pin power connector and it was up and running.

GunJack

I know this thread is old, but I recently encountered the same problem on a newly bought machine with 6 SATA slots. I installed the CD-ROM drive and the hard drive on 2 SATA slots close to each other and then installed Ubuntu without errors. After rebooting, however, I saw the ata8: hard resetting link error, and the machine halted at that point and never recovered. Rebooting a few more times did not help. I then moved the hard drive to one of the 4 remaining slots and it worked just fine.

ultrajohn
  • You mean you changed the SATA port the hard drive was plugged into, right? Or do you mean replaced the entire hard drive with another? I think it's the former, but just double-checking – Xen2050 Nov 19 '17 at 23:57
  • It's the former. – ultrajohn Nov 21 '17 at 00:15

This error is dangerous and it can damage your hard drive.

To solve it:

  1. Replace the SATA cable.
  2. If the error persists, plug the SATA cable into another motherboard socket (the current socket could be oxidized).
  3. If the error persists, the problem is probably your power supply unit (PSU).

http://eliasoenal.com/2012/10/31/power-supply-failures-can-be-pretty-annoying-to-find/

josircg

I had the same problem. I had tried everything, and only on the JMicron port of my ASUS P5K did I not get the errors.

But when I connected the drive to another power supply, it worked and the errors were gone. Then I put the drive back on its original power supply, but with a new power connector, and that worked too.

sudodus