
Last week I did a fresh install of Ubuntu 16.04.2 on an i7 Sandy Bridge laptop with Nvidia + Intel (Optimus) graphics. The laptop previously ran the same setup without any problem.

Since then, I've been getting random system crashes while writing emails, editing photos, etc. It happens whether the Nvidia GPU is enabled or disabled (no pattern here). The system just stops working: no error message, no input, no console available. The display freezes and the CPU keeps getting hotter (judging from the fan RPM) until I manually shut down the computer.

Removing all the Nvidia packages seems to resolve the issue, so I suspect the Nvidia drivers are responsible. This line appears many times in /var/log/syslog:

nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000857d:0:0:0x00000033
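
To see how often this error shows up, and whether other nvidia-modeset errors appear alongside it, a small script can tally the messages. This is a minimal sketch. It assumes each error sits on a single syslog line in the form shown above. `tally_modeset_errors` is a hypothetical helper name, not part of any Nvidia or Ubuntu tool.

```python
import re
from collections import Counter

# Captures the text that follows "nvidia-modeset: ERROR:" on a log line.
NVIDIA_MODESET_ERROR = re.compile(r"nvidia-modeset: ERROR: (.*)$")

def tally_modeset_errors(lines):
    """Count each distinct nvidia-modeset error message in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = NVIDIA_MODESET_ERROR.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

For example, open /var/log/syslog with `errors="replace"` (in case it contains stray bytes), pass the file object in, and print `.most_common()` on the result.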

I'm running the nvidia-367.57 driver from the Ubuntu repos, the xserver-xorg-hwe-16.04 stack and the linux-generic-hwe-16.04 kernel (linux-4.8.0.39.10). The nvidia-375 driver behaves the same, and nvidia-378 is even worse. Then again, the crashes aren't really reproducible, so that could just be bad luck.

Here are the last few lines of the syslog before a crash:

Feb 23 10:51:02 ouranos anacron[1277]: Job `cron.weekly' started
Feb 23 10:51:02 ouranos anacron[3472]: Updated timestamp for job `cron.weekly' to 2017-02-23
Feb 23 10:56:02 ouranos systemd[1]: Starting Cleanup of Temporary Directories...
Feb 23 10:56:02 ouranos systemd-tmpfiles[3506]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Feb 23 10:56:04 ouranos systemd[1]: Started Cleanup of Temporary Directories.
Feb 23 10:56:22 ouranos com.canonical.Unity.Scope.Applications[2356]: Error loading package indexes: Couldn't stat '/var/cache/software-center/xapian'
Feb 23 10:56:22 ouranos com.canonical.Unity.Scope.Applications[2356]: (unity-scope-loader:3525): unity-applications-daemon-CRITICAL **: daemon.vala:144: Failed to load Software Center index. 'Apps Available for Download' will not be listed
Feb 23 10:56:25 ouranos gnome-session[2531]: Gtk-Message: GtkDialog mapped without a transient parent. This is discouraged.
Feb 23 11:02:29 ouranos anacron[1277]: Job `cron.weekly' terminated
Feb 23 11:02:29 ouranos anacron[1277]: Normal exit (1 job run)
Feb 23 11:06:25 ouranos thermald[1355]: sysfs write failed trip_point_0_temp
Feb 23 11:06:29 ouranos thermald[1355]: sysfs write failed trip_point_0_temp
Feb 23 11:06:36 ouranos systemd[1]: Started CUPS Scheduler.
Feb 23 11:06:37 ouranos thermald[1355]: sysfs write failed trip_point_0_temp

And another one:

Feb 23 14:05:00 ouranos gnome-session[7432]: Done!
Feb 23 14:05:13 ouranos thermald[1350]: sysfs write failed trip_point_0_temp
Feb 23 14:05:16 ouranos bluetoothd[1317]: Endpoint unregistered: sender=:1.254 path=/MediaEndpoint/A2DPSource
Feb 23 14:05:16 ouranos bluetoothd[1317]: Endpoint unregistered: sender=:1.254 path=/MediaEndpoint/A2DPSink
Feb 23 14:05:19 ouranos org.gnome.zeitgeist.Engine[7259]: ** (zeitgeist-datahub:8084): WARNING **: zeitgeist-datahub.vala:229: Unable to get name "org.gnome.zeitgeist.datahub" on the bus!
Feb 23 14:05:21 ouranos thermald[1350]: sysfs write failed trip_point_0_temp
Feb 23 14:05:29 ouranos gnome-session[7432]: ** (zeitgeist-datahub:8064): WARNING **: zeitgeist-datahub.vala:212: Error during inserting events: GDBus.Error:org.gnome.zeitgeist.EngineError.InvalidArgument: Incomplete event: interpretation, manifestation and actor are required
Feb 23 14:05:29 ouranos gnome-session[7432]: [2017-02-23T19:05:29] [ERR] hddtemp: failed to open the connection.
Feb 23 14:05:29 ouranos gnome-session[7432]: [2017-02-23T19:05:29] [ERR] atasmart: sk_disk_open() failed: /dev/sda.
Feb 23 14:05:29 ouranos gnome-session[7432]: [2017-02-23T19:05:29] [ERR] atasmart: sk_disk_open() failed: /dev/sdb.
\00\00\00\00\00\00\00\00 (the log ends here in a long run of NUL bytes)

(Note: /dev/sda is the local HDD and /dev/sdb is an external USB HDD.)
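
The run of NUL bytes at the end of the second excerpt often appears in a text log when the machine stops before the file's contents reach the disk. The lines just before such a run are then the last entries known to have been written. Below is a rough sketch for pulling those lines out of every such run in a log file. `entries_before_nul_runs` is a hypothetical helper, and the sketch assumes the log is UTF-8 text.

```python
import re

# One or more consecutive NUL bytes, as left behind by an unclean shutdown.
NUL_RUN = re.compile(rb"\x00+")

def entries_before_nul_runs(data, count=5):
    """For every run of NUL bytes in `data`, return up to `count` non-empty lines just before it."""
    chunks = NUL_RUN.split(data)
    results = []
    # Every chunk except the last one is immediately followed by a NUL run.
    for chunk in chunks[:-1]:
        text = chunk.decode("utf-8", errors="replace")
        lines = [line for line in text.splitlines() if line.strip()]
        results.append(lines[max(len(lines) - count, 0):])
    return results
```

You can pass in the bytes of /var/log/syslog directly, or of a rotated copy such as /var/log/syslog.1. Reading the file in binary mode (`"rb"`) keeps the NUL bytes intact so the runs can be found.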

How can I find a trace of what caused the crash? Is the nvidia-modeset error something I should worry about?

Since my CPU is a Sandy Bridge, the Bay Trail bug affecting P-states is most likely not the cause of the problem.

Aurélien Pierre

2 Answers


I had a similar experience. My PC would shut down after I used the web browser or played games for a while. It turned out my graphics card was faulty (it had lasted about 5 years before this started). I replaced it, and my PC doesn't crash anymore. I don't know if this applies to you, but if you have a spare graphics card, try swapping it in to see.


After some research, it appeared to be a duplicate of these errors:

It is also related to this bug: https://bugzilla.kernel.org/show_bug.cgi?id=109051

However, this solution (How to install Kernel 4.8.5 | askubuntu.com) did not solve the problem. Besides, my CPU is a Sandy Bridge, not a Bay Trail.

First, the log suggested a recurrent problem with thermald:

Feb 23 11:06:25 ouranos thermald[1355]: sysfs write failed trip_point_0_temp

I had to update thermald to a fixed version from the Ubuntu -proposed repository. The issue is referenced here: https://answers.launchpad.net/ubuntu/+source/thermald/+question/293480
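
To check how often thermald repeats this message, you can compare the timestamps of the matching lines. Below is a minimal sketch that assumes the classic `Mon DD HH:MM:SS host process[pid]: message` syslog format seen in these excerpts. Syslog omits the year, so the `year` parameter is an assumed value used only to build complete dates. `repeat_intervals` is a hypothetical helper.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Mon DD HH:MM:SS host process[pid]: message".
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

def repeat_intervals(lines, process, text, year=2017):
    """Seconds between consecutive lines from `process` whose message contains `text`.

    `year` is an assumption: syslog timestamps carry no year.
    """
    times = []
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and m.group("proc") == process and text in m.group("msg"):
            # Collapse the space padding syslog uses for single-digit days ("Feb  3").
            stamp = " ".join(m.group("stamp").split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Take the four lines from 11:06:25 to 11:06:37 in the question's first excerpt. `repeat_intervals(lines, "thermald", "sysfs write failed")` gives `[4.0, 8.0]` for them; the systemd line in between is skipped.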

Then something looked wrong with Prime Indicator Plus. It enabled an "Nvidia Power Management" option even when prime-select was set to Intel. I had never heard of this option, but the syslog showed some strange bugs involving power management on the Nvidia GPU.

So I removed Prime Indicator Plus.

I believe a rather aggressive configuration of the TLP power management tool also caused some glitches, so I removed it as well.

Finally, I removed, purged and reinstalled the Nvidia driver. As usual, it has been my main source of strange bugs and crashes since I made the mistake of buying an Optimus dual-GPU laptop.

Now it looks ok.

Aurélien Pierre