26

I have a Dell XPS 15 9550. I've been running Ubuntu 16.10 on it for four months with no dramas.

Two days ago, I upgraded to Ubuntu 17.04. About an hour after upgrading, my hard-drive remounted into read-only mode. When I jumped to a tty screen, this appeared:

[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.474632] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524539: comm unity-settings-: reading directory iblock 0
[ 746.992814] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506108: comm BrowserBlocking: reading directory iblock 0
[ 746.304451] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506117: comm BrowserBlocking: reading directory iblock 0

Here's what fdisk -l shows:

Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3CD27380-DAC8-48DC-910A-D084CE857DA3

Device             Start        End   Sectors   Size Type
/dev/nvme0n1p1      2048    1026047   1024000   500M EFI System
/dev/nvme0n1p2   1026048    1288191    262144   128M Microsoft reserved
/dev/nvme0n1p3   1288192  487948287 486660096 232.1G Microsoft basic data
/dev/nvme0n1p4 972302336  973223935    921600   450M Windows recovery environmen
/dev/nvme0n1p5 973223936  998094847  24870912  11.9G Windows recovery environmen
/dev/nvme0n1p6 998094848 1000204287   2109440     1G Windows recovery environmen
/dev/nvme0n1p7 487948288  939046911 451098624 215.1G Linux filesystem
/dev/nvme0n1p8 939046912  972302335  33255424  15.9G Linux swap

Partition table entries are not in disk order.

I rebooted and continued to get the error around once an hour, so I reinstalled Ubuntu 17.04 from scratch. However, I am still getting the same issue.

I tried running fsck by creating a /forcefsck file (I created a wrapper shell script that adds the -v flag and outputs stdout to a file). Here's the result:

fsck.fat 4.0 (2016-05-06)                               
Checking we can access the last sector of the filesystem
Boot sector contents:                                   
System ID "MSDOS5.0"                                    
Media byte 0xf8 (hard disk)                             
       512 bytes per logical sector                     
      4096 bytes per cluster                            
      6206 reserved sectors                             
First FAT starts at byte 3177472 (sector 6206)          
         2 FATs, 32 bit entries                         
    508416 bytes per FAT (= 993 sectors)                
Root directory start at cluster 2 (arbitrary size)      
Data area starts at byte 4194304 (sector 8192)          
    126976 data clusters (520093696 bytes)              
63 sectors/track, 255 heads                             
      2048 hidden sectors                               
   1024000 sectors total                                
Reclaiming unconnected clusters.                        
Checking free cluster summary.                          
/dev/nvme0n1p1: 212 files, 15526/126976 clusters    

I tried booting from a live USB and running e2fsck -p /dev/nvme0n1p7 as suggested here (https://askubuntu.com/a/768813/679041). It didn't give any errors.

I also tried to run smartctl -t long /dev/nvme0n1p7 however the results seem to indicate that the tool doesn't work with my particular SSD:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-19-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PM951 NVMe SAMSUNG 512GB
Serial Number:                      S29PNX0H611013
Firmware Version:                   BXV77D0Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            254,982,533,120 [254 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Apr 17 17:45:48 2017 AEST
Firmware Updates (0x06):            3 Slots
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        5       5
 1 +     4.20W       -        -    1  1  1  1       30      30
 2 +     3.10W       -        -    2  2  2  2      100     100
 3 -   0.0700W       -        -    3  3  3  3      500    5000
 4 -   0.0050W       -        -    4  4  4  4     2000   22000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

Any idea why this issue might be occurring and how I might solve it? Thanks! :)

Ben B
  • 1
    Welcome to AskUbuntu! It looks like you may be affected by this bug. I recommend that you let the devs know that this bug also affects you and subscribe to the bug so that you can be notified of progress/resolution. – Elder Geek Apr 17 '17 at 14:51
  • 1
    I'm having the exact same problem on a Lenovo Thinkpad X270 with a Toshiba SSD "THNSF5256GPUK TOSHIBA". I guess it's good to know I'm not the only one. – Maeher Apr 18 '17 at 02:41
  • @ElderGeek reading the linked bug report, it seems that until the issue is fixed, a temporary fix would be to disable APST, however from the discussion there it is unclear to me how to do that. It seems like a way to do so would be a valid answer to this question. – Maeher Apr 18 '17 at 03:34
  • Thanks for your comments guys :) Impatiently, I reinstalled again last night, however this time I explicitly formatted /dev/nvme0n1p7 and deleted /dev/nvme0n1p8 beforehand (I thought perhaps a reinstall with all the default options might not actually format, and instead only delete old files before installing new ones). Am yet to experience the issue after 4 hours of uninterrupted use however only time will tell. You'll hear my sobs across the pacific if I do :) – Ben B Apr 18 '17 at 03:50
  • OK I can confirm - I just got the issue again despite completely formatting the partition. Will add comment to bug linked above – Ben B Apr 18 '17 at 05:31
  • @Maeher That's certainly possible, however I can't seem to find the kernel option parameter to disable it. Perhaps you can. For a complete list of all known options, please see the file Documentation/kernel-parameters.txt in the kernel source tree and the individual architecture-specific documentation files. :-) – Elder Geek Apr 18 '17 at 13:18
  • I just want to report that I have the same machine and I experience the same bug, but with Ubuntu 16.04 LTS with kernel 4.8.0.48.20 (I have the hardware enablement stack). I think they backported the bug from 4.10, because it started happening just one or two kernel updates ago (well after the update from 4.4 to 4.8).

    For me the workaround is just using older kernels, but that's not advisable in 17.04 I guess...

    – justmyfault Apr 18 '17 at 15:55
  • VTR (Vote to Reopen). This problem affects many users, including one today who confirms that this Q&A's accepted answer solves it: https://askubuntu.com/questions/1018685/having-a-bunch-of-issues-with-ubuntu-16-04-on-wd-black-nvme-drive?noredirect=1#comment1653680_1018685 Because there are numerous times we recommend changing kernel parameters in grub for suspend/resume, graphics cards, or whatever to work properly, this question should be treated similarly. – WinEunuuchs2Unix Mar 24 '18 at 22:44

5 Answers

30

As pointed out in a comment by Elder Geek, this is due to a known bug.

From the bug report:

APST support just landed in the latest Zesty kernel (4.10.0-14.16) as part of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664602. That patch has a quirk for certain 256GB Samsung drives found in Dell laptops that do not behave well when APST is enabled. I am experiencing the same symptoms with the same model laptop except with a 512GB Samsung. Prior to manually disabling APST the drive would die and system would go down in flames with I/O errors within 20 to 40 minutes of boot.

Until a proper fix is implemented, a workaround is suggested, which involves adding a kernel parameter:

Please try nvme_core.default_ps_max_latency_us=5500, if the issue persists, please try nvme_core.default_ps_max_latency_us=200.
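To see why those two values behave differently on this drive, here is a rough sketch. It assumes (this is my reading of how the kernel sets up APST, not something stated in the bug report) that a non-operational power state is only used if its entry latency plus exit latency does not exceed default_ps_max_latency_us. The latencies are copied from the "Supported Power States" table in the question; the function name apst_allowed_states is made up for illustration.

```shell
#!/bin/sh
# Assumption: a non-operational power state is usable by APST only if
# entry latency + exit latency <= the configured maximum (microseconds).
# Latencies below come from the smartctl table for the PM951 in the question.
apst_allowed_states() {
    max_us=$1
    allowed=""
    # state:entry_latency:exit_latency for the two non-operational states
    for s in 3:500:5000 4:2000:22000; do
        state=${s%%:*}
        rest=${s#*:}
        total=$(( ${rest%%:*} + ${rest#*:} ))
        if [ "$total" -le "$max_us" ]; then
            allowed="$allowed $state"
        fi
    done
    echo "allowed states for ${max_us}us:${allowed:- none}"
}

apst_allowed_states 5500   # state 3 only (500 + 5000 = 5500)
apst_allowed_states 200    # no non-operational state qualifies
```

Under that assumption, 5500 keeps the drive out of its deepest state (state 4), while 200 leaves no non-operational state available, which would amount to switching APST off for this drive.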

To add a kernel boot parameter, edit the configuration file for GRUB:

sudo nano /etc/default/grub

Find the line beginning GRUB_CMDLINE_LINUX_DEFAULT and add the boot parameter to the others already between the quotes. For example, in this case you will probably end up with

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"

Save the file and exit, then to make the change effective, run

sudo update-grub 
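
If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: add_kernel_param is a hypothetical helper (not part of GRUB or Ubuntu), it relies on GNU sed's -i option, and it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is on one line in double quotes, as in the stock /etc/default/grub.

```shell
#!/bin/sh
# Hypothetical helper: append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE,
# unless a parameter with the same name is already on that line.
add_kernel_param() {
    file=$1
    param=$2
    key=${param%%=*}   # parameter name without its value
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${key}" "$file"; then
        echo "${key} is already set in ${file}; leaving it unchanged" >&2
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Example, from a root shell (back up the file first):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub nvme_core.default_ps_max_latency_us=5500
#   update-grub
```

After rebooting, cat /proc/cmdline should list the parameter, and cat /sys/module/nvme_core/parameters/default_ps_max_latency_us should print the value you set.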
Elder Geek
Ben B
  • 1
    Is this fix working for you? BTW a link on how to set kernel parameters for who might stumble on your answer https://wiki.ubuntu.com/Kernel/KernelBootParameters – justmyfault Apr 18 '17 at 15:59
  • 1
    I am running Ubuntu 16.04 and I have been upgrading packages piecewise to zesty, something I wouldn't recommend to anyone but doing out of necessity. The last package was libc, something so integral to the system that if something would go wrong it would be while upgrading libc. On reboot, I saw all of the EXT4 errors mentioned in the question above, but adding the kernel parameter finally allowed me to reboot in peace and continue. Thank you. – lukecampbell Oct 04 '17 at 12:33
  • 1
    any updates on this? I'm suffering this problem on my razer blade stealth with a samsung 512gb ssd – PPP Jan 22 '18 at 00:01
  • The above workaround worked for me, but the bug has been fixed in package linux - 4.10.0-22.24. If you are still having issues you should open up a new bug report on launchpad. – Ben B Jan 25 '18 at 01:28
  • I tried both values, but it still crashed. nvme_core.default_ps_max_latency_us=0 worked for me. Kernel 4.15.0-36-generic Ubuntu 16.04 – Mike Schroll Sep 28 '18 at 06:24
  • I am trying with nvme_core.default_ps_max_latency_us=5500 on 4.15.0-122-generic Ubuntu 18.04.5 LTS. – Vanja D. Nov 06 '20 at 20:32
2

First, I'd visit the Samsung support web site and ensure that you've got the latest firmware installed for your model of SSD.

Then, your fsck didn't make a whole lot of sense, so do it this way...

To check the file system on your Ubuntu partition...

  • boot to the GRUB menu
  • choose Advanced Options
  • choose Recovery mode
  • choose Root access
  • at the # prompt, type sudo fsck -f /
  • repeat the fsck command if there were errors
  • type reboot
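
Before running fsck it is worth confirming that the filesystem is not mounted read-write, since checking a filesystem that is being written to can damage it. A minimal sketch, assuming the partition shows up under its /dev name in /proc/mounts; is_mounted_rw is a hypothetical helper name, not a standard tool:

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if DEVICE is mounted read-write according to
# a /proc/mounts-style file (defaults to /proc/mounts), otherwise exit 1.
is_mounted_rw() {
    device=$1
    mounts=${2:-/proc/mounts}
    awk -v dev="$device" '
        $1 == dev {
            # The fourth field holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "rw") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$mounts"
}

# Example:
#   if is_mounted_rw /dev/nvme0n1p7; then
#       echo "mounted read-write; do not run fsck yet"
#   fi
```

The options are compared as whole words, so an option such as errors=remount-ro is not mistaken for ro or rw.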
heynnema
  • 1
    Thanks for your response! I've reinstalled, but this time I explicitly formatted the problem partition first (in case the default reinstall process didn't actually format). Hopefully it's OK now, however if the issue persists I'll run an fsck and post the results (though I would say if the problem persists on a freshly formatted partition, it might be beyond fsck's capabilities) – Ben B Apr 18 '17 at 03:58
  • The issue occurred again, however as pointed out by Elder Geek in the comments below my question, it seems to be due to a known bug (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184). – Ben B Apr 18 '17 at 05:46
  • @BenB did you ever check the firmware in your Samsung SSD, as I had suggested earlier? Depending on the model, they had some very mandatory updates to make the drive work right. – heynnema Apr 18 '17 at 12:36
  • I'm not actually 100% sure how to do this. I found some firmware here however I am not 100% certain any of those apply to my particular SSD. The bug report doesn't point to any firmware-related problems anyway, so at this point I'd rather wait for more info from the devs tackling the bug before trying to upgrade the firmware (knowing me, I'd do it wrong and lose all my stuff :P). – Ben B Apr 19 '17 at 04:32
  • The Samsung Magician Software shown on the link that you gave is an excellent way to check your firmware. Your model # and firmware version are shown in your SMART report. When you run fsck does it show any errors? – heynnema Apr 19 '17 at 14:19
  • 1
    fsck shows no errors. The problem isn't any sort of firmware issue or SSD corruption. It's due to APST, which has been enabled in 17.04. Setting the kernel parameter 'nvme_core.default_ps_max_latency_us=5500' has fixed the issue for me, and others have reported that disabling APST altogether fixes it for them. – Ben B Apr 21 '17 at 04:29
1

One possible workaround to the known bug I mentioned, which I'm unable to test, as I don't have the NVMe hardware in question, would be to try booting the current mainline daily kernel build package for your architecture available here.

Wait! Before you dash off to try this, I must stress that unless you are certain you know exactly what you are doing and how to recover from unintended consequences, it's highly recommended to back up.

If you don't know what you are doing and do have a current backup you can find more information on building your kernel here.

Note: In case you skimmed the first sentence, this answer is based on research, not testing. If it breaks, restore your backup.

Elder Geek
0

Every time I opened my laptop, I tried the following steps. After doing this several times, my EXT4-fs error problem is now solved.

sudo nano /etc/default/grub

and change the GRUB_CMDLINE_LINUX_DEFAULT= line to

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"

Then save and exit, and run this command:

sudo update-grub 

Thanks to Ben B for the first answer. I got this command from his answer (EXT4-fs error after Ubuntu 17.04 upgrade).

0

I observed a similar error on a Lenovo ThinkPad P570 after fresh installations of Ubuntu 20.04 and Ubuntu 21.04.

It was resolved by switching off the "UEFI Secure Boot" setting in the BIOS, which was initially enabled.