
I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after two weeks of use. After that, I installed 16.04 and used it for months without any problems, until it produced the same error today. I think it is related to the Ubuntu updates, because I installed some recently and again today, just before the problem appeared. It could be a coincidence, though.

(I even ran some stress tests, downloading hundreds of GB of data many times and leaving my disk almost full, and I got none of these errors on 16.04 without updates.)

Running fsck manually solves the problem, but it comes back after some time.

UPDATE:

It happened again (with the fresh 17.10.1 install, with no updates, that I had been using since the day I started this post). I noticed the problem because I tried to save one of my VMs to disk and was told that my disk was read-only. Then I ran:

lz@lz:/var/log$ touch something
touch: cannot touch 'something': Read-only file system


lz@lz:/var/log$ cat syslog
Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0


lz@lz:/var/log$ dmesg
[62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.377374] Aborting journal on device nvme0n1p2-8.
[62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12  capname="net_admin"
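Output like the above is long and repetitive. A minimal bash/awk sketch that condenses kernel log text into one line: the device named in the "EXT4-fs error (device ...)" messages, how many such lines there were, and whether the kernel remounted the filesystem read-only. The function name ext4_error_summary is made up for this example.

```bash
#!/usr/bin/env bash
# Condense kernel log text (read on stdin) into one summary line.
# If several devices appear, the last one seen is reported.
ext4_error_summary() {
  awk '
    /EXT4-fs error/ {
      errors++
      # "(device nvme0n1p2)": skip the 8 chars of "(device " and the closing ")"
      if (match($0, /\(device [^)]+\)/)) dev = substr($0, RSTART + 8, RLENGTH - 9)
    }
    /Remounting filesystem read-only/ { ro = 1 }
    END {
      printf "device=%s errors=%d remounted_ro=%s\n", (dev != "" ? dev : "none"), errors, (ro ? "yes" : "no")
    }
  '
}

# Example: dmesg | ext4_error_summary
```

Keeping a dated copy of this line each time the problem occurs makes it easier to compare incidents across kernels and installs.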

I then rebooted and ran fsck /dev/nvm.... It asked me about lots of inodes, I answered 'yes' to all of them, and at some point it stopped.

https://i.stack.imgur.com/GRu9x.jpg (this photo shows the entire output, but it is hard to read)
https://i.stack.imgur.com/bEDZ5.jpg (this one is clearer, but it cuts off part of the output)

Here's a video of the entire process: https://photos.app.goo.gl/8ZHF3Un1BOsRwjaz1
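fsck should only be run on an unmounted filesystem; for the root partition that means booting from a live USB or recovery mode first. A minimal bash guard, assuming the device name is written exactly as it appears in the first column of /proc/mounts. The helper names is_device_mounted and safe_fsck are hypothetical, and /dev/nvme0n1p2 is the partition from the logs above.

```bash
#!/usr/bin/env bash
# Return 0 if DEVICE appears in the mount table. MOUNTS_FILE is overridable
# only so the check can be tried against a saved copy of /proc/mounts.
is_device_mounted() {
  local dev="$1" table="${MOUNTS_FILE:-/proc/mounts}"
  awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Run a forced fsck, but refuse if the device is still mounted.
safe_fsck() {
  local dev="$1"
  if is_device_mounted "$dev"; then
    echo "refusing: $dev is mounted; boot from a live USB or recovery mode first" >&2
    return 1
  fi
  # -f forces a full check even if the filesystem is marked clean.
  fsck -f "$dev"
}

# Example (from a live session, as root): safe_fsck /dev/nvme0n1p2
```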

I'm going to apply the microcode patch as the answer below suggests, but I don't think it is related to the problem, as this was happening months before Meltdown and Spectre. And I had never even installed a microcode update.

I still think it is related to the bug I described in my post. Can somebody tell me whether it was fixed, and in which kernels? What should I do?

Anyway, I just applied the suggested fix of adding

nvme_core.default_ps_max_latency_us=5500

to the boot parameters. I'm going to see how the system behaves with it.
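On Ubuntu such a parameter is normally made permanent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running sudo update-grub. A sketch of that edit as a bash function, assuming GNU sed and a double-quoted value as in the grub file shown further down; add_kernel_param is a hypothetical name.

```bash
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT="..." in a grub
# defaults file (normally /etc/default/grub, which needs root to edit).
# Does nothing if the exact parameter is already on that line.
add_kernel_param() {
  local param="$1" file="${2:-/etc/default/grub}"
  if grep -E '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
    return 0
  fi
  # Insert " PARAM" just before the closing quote of the value (GNU sed -i -E).
  sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}

# Example, as root (then reboot):
#   add_kernel_param nvme_core.default_ps_max_latency_us=5500 && update-grub
```

Making it idempotent means the same line can be rerun after experimenting without piling up duplicate parameters; switching to a different value (such as 250) still requires editing the line by hand.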

UPDATE: every time I install a new system, it behaves well until I run the Software Updater! Then the filesystem goes into read-only mode :(
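The touch test shows the read-only state, but it can also be checked without trying to write anything. A minimal bash sketch, assuming the standard /proc/mounts format (field 2 is the mount point, field 4 the option list); is_readonly_mount is a hypothetical name.

```bash
#!/usr/bin/env bash
# Return 0 if MOUNTPOINT is currently mounted read-only, 1 otherwise.
# The second argument exists only so it can be tried on a saved mount table.
is_readonly_mount() {
  local mountpoint="$1" table="${2:-/proc/mounts}"
  awk -v mp="$mountpoint" '
    $2 == mp {
      # A later entry for the same mount point overrides an earlier one.
      ro = 0
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
      seen = 1
    }
    END { exit !(seen && ro) }
  ' "$table"
}

# Example: is_readonly_mount / && echo "root filesystem is read-only"
```

Only the exact option "ro" counts; the root entry on Ubuntu normally also carries errors=remount-ro, which is the setting that makes the kernel switch to read-only after errors like the ones above.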

I tried nvme_core.default_ps_max_latency_us=250 and it didn't work.
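When a value like 250 or 5500 seems to have no effect, it is worth confirming that the kernel actually booted with it. A minimal bash sketch that reads a NAME=VALUE parameter back from /proc/cmdline; cmdline_param_value is a hypothetical name. When nvme_core is loaded, /sys/module/nvme_core/parameters/default_ps_max_latency_us should also show the value in use.

```bash
#!/usr/bin/env bash
# Print the value of NAME=VALUE from the kernel command line the system
# booted with (/proc/cmdline by default). Returns 1 if NAME is absent.
cmdline_param_value() {
  local name="$1" file="${2:-/proc/cmdline}" word
  local -a words
  # The command line is a single line of space-separated words.
  read -r -a words < "$file" || true
  for word in "${words[@]}"; do
    if [[ "$word" == "$name="* ]]; then
      echo "${word#"$name="}"   # first occurrence only
      return 0
    fi
  done
  return 1
}

# Example: cmdline_param_value nvme_core.default_ps_max_latency_us
```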

UPDATE: everything seems to work FINE when I install Windows. Even the benchmark tests say everything is OK.

UPDATE (03/10/2019)

I still have this problem. It happens once a day, and much more often under heavy SSD usage.

I tried a brand-new Samsung 960 EVO 2156gb SSD and the problem persists, so it's not related to the SSD itself. However, I made the mistake of buying one of the same brand and a related model. I did not test non-Samsung SSDs.

Both SSDs run perfectly on Windows. I even ran LOTS of benchmark tests that stressed them heavily. No problems.

I tried Ubuntu 16, 18 and 19, Debian 10, and Linux Mint. All have the same problem. Can somebody help me find its source? I spent a LOT on this computer and don't have the money to buy a new one.

My grub file now:

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""
  • Please edit your question and include the specific errors/logs that relate to the disk corruption that you describe, as that will better help us to troubleshoot your issue. – richbl Jan 24 '18 at 04:16
  • @richbl I didn't know how to get the errors. It just said that my disk was read only, and then when I rebooted, it said that my disk needed a manual fsck, so I did fsck /dev/sda1 and recovered everything – PPP Jan 24 '18 at 05:05
  • As this probably happens at boot time, if you can reproduce it, please take a picture with your phone (where all the error text is clearly visible) and post it here as an image. Otherwise use copy/paste of the text. If fsck resolves the problem, please also post the fsck output (as a picture or copy/paste), so that we can see what fsck did. – Robert Riedl Jan 24 '18 at 07:22
  • What is your SSD model number? What is the date of your SSD Firmware? – WinEunuuchs2Unix Jan 27 '18 at 03:31
  • Also what were the kernel versions before and after updates? Use: uname -r in terminal to find kernel version. – WinEunuuchs2Unix Jan 27 '18 at 03:40
  • @WinEunuuchs2Unix before: 4.13.0-21-generic. I haven't had a chance to test the new one yet, but I'll report back when it happens again – PPP Jan 27 '18 at 04:22
  • @WinEunuuchs2Unix SAMSUNG MZVLW512HMJP-00000, FW REV (is this what you want?): CXY7501Q – PPP Jan 27 '18 at 04:23
  • @LucasZanella That Samsung SSD is more commonly referred to as a PM961. There were many problems reported starting with the Ubuntu kernel update to 4.13.0-26-generic, but this is the first time I've read about NVMe M.2 SSDs falling victim. – WinEunuuchs2Unix Jan 27 '18 at 04:31
  • @WinEunuuchs2Unix so it is indeed a problem with the update, right? I'll hold off and not update then. Maybe it's better to wait for a newer kernel. Where can I follow all this? – PPP Jan 27 '18 at 10:35
  • @WinEunuuchs2Unix also, are the problems in this new kernel related to the SSD's power features, or is it a completely different problem? – PPP Jan 27 '18 at 10:40
  • Just updated my question with new info about this problem. Please read it people. – PPP Jan 29 '18 at 04:11
  • For anyone reading this in the future, it looks like this kernel bug is the issue: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340?comments=all – kir13y Aug 06 '18 at 06:30
  • WOW this has been going on for a long time... Can you update your question with the contents of /etc/default/grub for the line starting with: GRUB_CMDLINE_LINUX_DEFAULT= ... ? Also note for others: the bug report has about 180 posts and the Ubuntu employee who took it on eventually gave up on it: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340 – WinEunuuchs2Unix Oct 03 '19 at 23:45
  • @WinEunuuchs2Unix Hi, this is me (Lucas Zanella) from a friend's account, since I spent all my other account's reputation on the bounty and can't comment. I updated the question with my grub settings; you can see I'm not using any quirks right now. Also, the Launchpad report you linked is actually mine. The Ubuntu employee provided multiple patches to me and none of them worked :( then he gave up – Paprika Oct 04 '19 at 00:25
  • Hi Lucas. First I'd get rid of the "quiet" parameter to see if there are any error messages popping up. Next, look at my post on the Launchpad bug report and try the grub parameters (at least the two important ones) I've been using for my Samsung 960 Pro for almost two years. – WinEunuuchs2Unix Oct 04 '19 at 01:41
  • You could probably try adding nvme_load=YES to the GRUB_CMDLINE_LINUX_DEFAULT line so that the boot can load the nvme_core process. – abu-ahmed al-khatiri Oct 04 '19 at 02:59
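The SSD model and firmware revision asked for in the comments can be read on Linux without any vendor tool. A minimal bash sketch, assuming the kernel's NVMe sysfs attributes model and firmware_rev under /sys/class/nvme/nvme0 and so on; nvme_info is a hypothetical name. This gives the revision string (such as CXY7501Q), not a firmware date.

```bash
#!/usr/bin/env bash
# List NVMe controllers with model and firmware revision, read from sysfs.
# The directory defaults to /sys/class/nvme; another path can be passed
# only to try the function on a copied tree.
nvme_info() {
  local root="${1:-/sys/class/nvme}" dev model fw
  for dev in "$root"/nvme*; do
    [ -e "$dev/model" ] || continue
    # These strings may be padded with trailing spaces; strip them.
    model=$(sed 's/[[:space:]]*$//' "$dev/model")
    fw=$(sed 's/[[:space:]]*$//' "$dev/firmware_rev")
    printf '%s model=%s firmware=%s\n' "${dev##*/}" "$model" "$fw"
  done
}

# Example: nvme_info
```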

1 Answer

Intel Microcode 2018-01-08 breaks some systems

When the world-famous Meltdown and Spectre security holes were announced at the beginning of 2018, vendors rushed out fixes. According to Ubuntu, Intel asked them to revert to the older microcode when the January 8, 2018 microcode update broke some systems.


List your current Microcode version

To find your current Microcode version use:

$ apt list --installed | grep intel-microcode

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

intel-microcode/now 3.20170707.1~ubuntu16.04.0 amd64 [installed,upgradable to: 3.20180108.0+really20170707ubuntu16.04.1]

In my case the Intel Microcode update of 2018-01-08 is not being used; the original version from 2017-07-07 is. When patches for Meltdown were announced, bugs started appearing in regular updates on 2018-01-04. Since then I have declined all automatic updates in favour of manually installing new mainline kernels. That is why I have the older original microcode.
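The +really20170707 part of the upgradable version shown above is what marks the reverted package: its version number is higher, but the microcode inside is the 2017-07-07 release. A minimal bash sketch that extracts which microcode date a package version string actually carries; microcode_payload_date is a hypothetical name, and on a real system the installed version can be obtained with dpkg-query -W -f='${Version}' intel-microcode.

```bash
#!/usr/bin/env bash
# Print the YYYYMMDD microcode date carried by an intel-microcode version
# string. A "+reallyYYYYMMDD" marker takes priority over the leading date.
microcode_payload_date() {
  local v="$1"
  if [[ "$v" =~ \+really([0-9]{8}) ]]; then
    echo "${BASH_REMATCH[1]}"
  elif [[ "$v" =~ ^[0-9]+\.([0-9]{8}) ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Example:
#   microcode_payload_date "$(dpkg-query -W -f='${Version}' intel-microcode)"
```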


Downgrade Microcode for Ubuntu 14.04, 16.04 and 17.10

If you are running 2018-01-08 Intel Microcode you MUST upgrade it to the version released on 2018-01-22.

The problem can be corrected by updating your system to the following package version:

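Ubuntu 17.10:
  intel-microcode 3.20180108.0+really20170707ubuntu17.10.1

Ubuntu 16.04 LTS:
  intel-microcode 3.20180108.0+really20170707ubuntu16.04.1

Ubuntu 14.04 LTS: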

To update your system, please follow these instructions: https://wiki.ubuntu.com/Security/Upgrades.

After a standard system update you need to reboot your computer to apply all the necessary changes.

Repeat the steps in the previous section to check your Intel Microcode version.

Install Microcode from Terminal

To install Microcode from Terminal without going through Ubuntu GUI Settings panels use:

sudo apt update
sudo apt install intel-microcode
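To confirm what the CPU is actually running after the reboot, the kernel reports the loaded microcode revision in /proc/cpuinfo on x86 systems. A minimal bash sketch, assuming that "microcode" field is present; cpu_microcode_revision is a hypothetical name. Recording the value before and after the update shows whether the package changed anything.

```bash
#!/usr/bin/env bash
# Print the microcode revision from the first "microcode" line of
# /proc/cpuinfo (x86 only). Returns 1 if no such line exists.
# The file argument exists only so it can be tried on a saved copy.
cpu_microcode_revision() {
  local file="${1:-/proc/cpuinfo}"
  awk -F': *' '/^microcode[[:space:]]*:/ { print $2; found = 1; exit } END { exit !found }' "$file"
}

# Example: cpu_microcode_revision
```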
  • No microcode appears in the output, as I didn't install third-party software. So should I just install intel-microcode 3.20180108.0+really20170707ubuntu17.10.1? So this time the SSD problem is not related to the bug I listed, but to the Spectre and Meltdown patches? – PPP Jan 27 '18 at 21:29
  • The Meltdown patches were applied in certain kernel versions. Some had bugs that were fixed in subsequent versions and some may not have been fixed yet for specific problems. For example, kernel 4.13.0-26 introduced key repeat and touchpad problems that can be fixed. Use uname -r to document your kernel versions before and after upgrading. One option is to download 16.04.1 LTS, which has a different kernel than 16.04.3 LTS, and stay off HWE. – WinEunuuchs2Unix Jan 27 '18 at 21:44
  • The error happened again. I'm going to reboot, run fsck and publish its output here. I don't think the problem is with microcode at all. As you said, the latest microcode update broke some things, but this problem has been going on for months for me. It would be an enormous coincidence if the same problem were occurring now but from a totally different source, I guess. I don't even have microcode on my computer. – PPP Jan 29 '18 at 03:45
  • The SSD power bug is marked as fixed for past Ubuntu releases, but not for 17.10. Do you think it is related? Do you think I should apply a microcode patch also? – PPP Jan 29 '18 at 03:46
  • @LucasZanella At this point the microcode patch won't hurt and can only help. I'm running the patch with my Samsung Pro 960 which is slightly more advanced than your PM961. What kernel version are you running? – WinEunuuchs2Unix Jan 29 '18 at 03:48
  • Currently I'm running 4.13.0-21-generic. Just rebooted and fixed the SSD. Do I have to do the microcode update manually? Isn't the Ubuntu update going to work? – PPP Jan 29 '18 at 04:00
  • @LucasZanella I added instructions for microcode installation via Terminal. If this doesn't fix current SSD problems then a different kernel might be in order. – WinEunuuchs2Unix Jan 29 '18 at 04:10
  • But you said that I should downgrade. Won't sudo apt install intel-microcode install the latest version? – PPP Jan 29 '18 at 04:14
  • The latest version is a downgrade to the 2017-07-07 version. It reverts the 2018-01-08 version. – WinEunuuchs2Unix Jan 29 '18 at 04:17
  • Ok, just installed. It says (3.20180108.0+really20170707ubuntu17.10.1). Also applied the suggested fix that I posted in my question. Couldn't wait to see if microcode alone would fix it because I have to use this computer a lot today. Let's see if it works now. – PPP Jan 29 '18 at 04:20
  • happened again :( – PPP Jan 30 '18 at 06:44
  • @LucasZanella Two things I can think of: 1) the date of the firmware on the SSD: is it the most current? and 2) try the new kernel 4.14.15, which I've been using without problems – WinEunuuchs2Unix Jan 30 '18 at 10:58
  • How can I see the date of the firmware? Also, whenever I run the update software, the error happens! Always! – PPP Jan 30 '18 at 19:39
  • @LucasZanella The only way of getting the firmware date is from the Magician software, which is only available for Windows unless it's an older drive (not yours). The only way of updating the firmware is in Windows. The problem appears with the update because of kernel versions 4.13.0-26 to 4.13.0-32, which carry the Meltdown patches. I'm hoping it doesn't happen with kernel 4.14.15: https://askubuntu.com/questions/119080/how-to-update-kernel-to-the-latest-mainline-version-without-any-distro-upgrade. If it works fine with 4.14.15, the kernel will stay at that version while all other software updates. – WinEunuuchs2Unix Jan 31 '18 at 00:38
  • Hi. I installed Windows today, ran some benchmark tests, and everything seems OK. I downloaded the Magician software. It recognizes my SSD but says that a driver is not supported. On the Samsung website there's no tutorial on how to upgrade the SSD from Windows. It just says that I need to download UNetbootin on Windows, write the firmware image to a USB drive and plug it into my computer. It actually runs a Linux system to do the update (however, I haven't gotten it to boot correctly yet). Is it supposed to be like this? I also posted some data here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340 – PPP Feb 05 '18 at 00:46
  • @LucasZanella With my Samsung NVMe M.2 960 Pro I didn't have to install firmware because the website's firmware date was February 2017 and my drive was manufactured in June 2017 according to the box it came in. Therefore the drive would already have had up-to-date firmware. It's funny that to update the Windows driver you need to use Linux. I feel bad now for recommending getting the SSD firmware up to date using the Magician Windows software. But from all the googling I did, the Linux version of Magician DC (Command Line Interface) didn't work with my Samsung Pro 960 or your Samsung PM961. – WinEunuuchs2Unix Feb 05 '18 at 00:52
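The kernel versions discussed in this thread (4.13.0-21 before, 4.13.0-26 to 4.13.0-32 with the Meltdown patches, mainline 4.14.15) can be compared with the output of uname -r mechanically. A minimal bash sketch, assuming GNU coreutils sort with version ordering (-V) and check mode (-C); version_ge is a hypothetical name.

```bash
#!/usr/bin/env bash
# Return 0 if version $1 >= version $2 in version order.
version_ge() {
  # sort -C succeeds only if its input is already sorted, i.e. $2 <= $1.
  printf '%s\n' "$2" "$1" | sort -V -C
}

# Example: flag whether the running kernel is at or past 4.13.0-26,
# the first update the comments above associate with the trouble.
#   if version_ge "$(uname -r)" 4.13.0-26; then echo "at or after 4.13.0-26"; fi
```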