7

Ubuntu 16.04 has been hanging on me about once a day. This happens while I am web browsing or using a desktop application, not when booting. When it happens, the mouse pointer still moves freely, but clicks and keystrokes have no effect until I do a hard reboot.

What is the best way for me to debug this?

Here is some information:

selah@selah-Precision-Tower-5810:~$ uname -a
Linux selah-Precision-Tower-5810 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Also, in case it is relevant, I have a "very big" monitor, a Dell 42" at 3840x2160 resolution.

selah@selah-Precision-Tower-5810:~$ lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)

UPDATE:

Following Artyom's advice I found the following message in my error logs:

Apr 27 09:47:25 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 27 09:47:29 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 27 09:47:33 selah-Precision-Tower-5810 kernel: nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]

Which has led me to this bug, which describes similar behavior: https://bugs.freedesktop.org/show_bug.cgi?id=93629
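To see how often these errors show up, a small filter along these lines could tally them. It is only a sketch: the function name `count_nouveau_sched_errors` is my own invention, and it assumes the log lines look like the ones quoted above.

```shell
# Tally nouveau fifo scheduler errors from a kernel log read on stdin.
# Prints one line per distinct error: "<code> <tag> <count>".
# Assumes lines shaped like: "... nouveau 0000:03:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]"
count_nouveau_sched_errors() {
    grep -E 'nouveau .*fifo: SCHED_ERROR' \
        | sed -E 's/.*SCHED_ERROR ([0-9a-f]+) \[([A-Z_]+)\].*/\1 \2/' \
        | sort | uniq -c \
        | awk '{ print $2, $3, $1 }'
}

# Possible usage (not run here):
#   count_nouveau_sched_errors < /var/log/kern.log
```

A steadily growing count right before each hang would point at the driver rather than at the applications that happened to be open.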

kenorb
  • 10,347
Selah
  • 2,905
  • What do you mean 'hang' what comes to mind is a windows 95 computer in a noose. – Ubuntu User Jan 20 '17 at 21:14
  • I did in fact question whether I should use 'hang' or 'freeze'. After checking this out I chose 'hang'. http://www.cyberlink.com/support/faq-content.do?id=14592 – Selah Jan 20 '17 at 21:35
  • You need Nvidia proprietary drivers for better performance. –  Jan 21 '17 at 00:14
  • Trying this using these instructions: http://www.webupd8.org/2016/06/how-to-install-latest-nvidia-drivers-in.html – Selah Feb 15 '17 at 17:01
  • Ahhhh nvidia-375 messed up Ubuntu 16.04 real bad!! Led to much crashing and mysterious blinking. Had to use recovery mode from the previous kernel version in order to uninstall. – Selah Feb 15 '17 at 21:46
  • Tried a few other nvidia drivers, some results. Seems nvidia drivers and 16.04 are not compatible right now? – Selah Feb 17 '17 at 15:41

2 Answers

4

This is a bug in the Nouveau video driver (a kernel module). For details, check the bugs at bugs.freedesktop.org or at GitLab, especially: #93629, #99900 and #100567 (which are related to SCHED_ERROR/CTXSW_TIMEOUT).

To debug the freeze, you can use the Magic SysRq key, for example:

Note: Consider holding ⇧ Shift (depending on your keyboard).

  • Alt-SysRq-9 (no ⇧ Shift) - Set the console log level to 9 to show more kernel messages
  • Alt-SysRq-w - Display list of blocked (D state) tasks
  • Alt-SysRq-l - Shows a stack backtrace for all active CPUs.
  • Alt-SysRq-t - Output a list of current tasks and their information to the console
  • Alt-SysRq-p - Output the current registers and flags to the console
  • Alt-SysRq-q - Display all active high-resolution timers and clock sources.
  • Alt-SysRq-m - Output current memory information to the console
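If the keyboard is dead but the machine is still reachable some other way (for example over SSH, which is not guaranteed during this kind of freeze), the same dumps can be requested by writing the key letter to `/proc/sysrq-trigger` as root. A minimal sketch, where the function name `sysrq_send` and the `SYSRQ_TRIGGER` variable are my own and only the read-only debugging keys from the list above are allowed:

```shell
# Path of the kernel's SysRq trigger file; overridable for dry runs.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Send one SysRq key letter to the kernel. Must be run as root.
# Only the informational keys (w, l, t, p, q, m) are accepted, so a typo
# cannot trigger a reboot or a crash.
sysrq_send() {
    case "$1" in
        w|l|t|p|q|m)
            printf '%s' "$1" > "$SYSRQ_TRIGGER"
            ;;
        *)
            echo "refusing unknown or destructive key: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage from a root shell (not run here):
#   sysrq_send w    # list blocked (D state) tasks in the kernel log
```

The output lands in the kernel log, so it can be read afterwards with `dmesg` or in kern.log.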

Other things to try during freeze:

Note: Consider holding ⇧ Shift (depending on your keyboard).

  • Reset the nice level of all high-priority and real-time tasks by hitting Alt-SysRq-n.
  • Try forcing a switch to a text console by hitting Ctrl-Alt-F1 (or another function key from F1 to F12).
  • Kill all processes on the current virtual console (can kill X) by hitting Alt-SysRq-k.
  • Trigger a system crash by hitting Alt-SysRq-c (a crash dump is taken if one is configured).

If nothing works, you should perform a safe reboot by Alt-SysRq-REISUB, which is:

  • Alt-SysRq-R: UnRaw (take control of keyboard back from X).
  • Alt-SysRq-E: tErminate (send SIGTERM to all processes).
  • Alt-SysRq-I: kIll (send SIGKILL to all processes, forcing them to terminate immediately).
  • Alt-SysRq-S: Sync all mounted filesystems (flush data to disk).
  • Alt-SysRq-U: Unmount (remount all filesystems in read-only mode).
  • Alt-SysRq-B: immediately reBoot the system.

    Note: If the reboot combination above does not work, the freeze could be caused by defective hardware rather than the video driver.

Note: If some SysRq options do not work and you get a "This sysrq operation is disabled" error, enable them with:

echo 1 | sudo tee /proc/sys/kernel/sysrq

See: Configuring SysRq in Linux.
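For background on why only some keys may be disabled: the value in `/proc/sys/kernel/sysrq` is 0 (off), 1 (everything on), or a bitmask of allowed function groups. The sketch below decodes such a value using the bit assignments from the kernel's SysRq documentation; the function name `describe_sysrq` and the short group labels are my own.

```shell
# Print which SysRq function groups a given kernel.sysrq value allows.
describe_sysrq() {
    value="$1"
    if [ "$value" -eq 0 ]; then echo "disabled"; return 0; fi
    if [ "$value" -eq 1 ]; then echo "all"; return 0; fi
    out=""
    # bit:label pairs; labels are informal names for the documented groups
    for pair in 2:loglevel 4:keyboard 8:debug-dumps 16:sync \
                32:remount-ro 64:signals 128:reboot 256:nice-rt; do
        bit="${pair%%:*}"
        name="${pair#*:}"
        if [ $(( value & bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

# Possible usage (not run here):
#   describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
```

With a value such as 176, for example, sync, remount and reboot are allowed but the debugging dumps (w, l, t, p, q, m) are not, which matches the "operation is disabled" message.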


After the reboot, check your kern.log for details, especially the call traces generated by the kernel commands above. These can help you find the matching bug report and a solution. Check the following kern.log example.

You can check the log from the previous boot (the one that froze) with:

journalctl -b -1 # Then hit Shift-G to jump to the end.
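Since the call traces are the interesting part, a filter like this could pull them out of a long log. It is a rough sketch: the function name `extract_call_traces` is made up, and it simply assumes each trace starts with a line containing "Call Trace:" followed by its frames.

```shell
# Print every "Call Trace:" line from a kernel log on stdin, plus the
# N lines that follow it (default 20), which usually hold the stack frames.
extract_call_traces() {
    grep -A "${1:-20}" 'Call Trace:'
}

# Possible usage (not run here):
#   journalctl -k -b -1 | extract_call_traces 30
```

The function names at the top of each trace are usually the best search terms for finding an existing bug report.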

Suggested solution:

  • Upgrade Ubuntu and the kernel to the latest version.
  • If the problem persists, the workaround is to install the proprietary NVIDIA driver, which replaces the Nouveau video driver.
  • If the same thing happens with the NVIDIA driver, it may be a hardware issue or the graphics card overheating (if you overclock, try reducing or disabling it).
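Before and after switching drivers it helps to confirm which one the card is actually using. `lspci -k` prints a "Kernel driver in use:" line under each device; the sketch below just picks that line out for the VGA controller. The function name `gpu_driver_in_use` is hypothetical.

```shell
# Read `lspci -k` output on stdin and print the kernel driver bound to
# each VGA controller (for example "nouveau" or "nvidia").
gpu_driver_in_use() {
    awk '
        /VGA compatible controller/        { in_vga = 1; next }  # start of a VGA device block
        /^[0-9a-f]/                        { in_vga = 0 }        # any other device line ends it
        in_vga && /Kernel driver in use:/  { print $NF }
    '
}

# Possible usage (not run here):
#   lspci -k | gpu_driver_in_use
```

If this still prints nouveau after installing the NVIDIA packages, the switch did not take effect.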
kenorb
  • 10,347
  • Intel NUC laptop with built-in Intel & GeForce video cards: mouse, trackpad, keyboard & all USB stop working soon after login, Ubuntu 18.04. Quite awkward for debugging. dmesg -Hw used to catch error messages before the hang. Flood of nouveau fifo: SCHED_ERROR messages and nouveau "DRM: failed to idle channel". Googling found this question and other info that hinted that switching from the nouveau driver to the NVIDIA driver might help. Solution was to switch to the NVIDIA driver (Ubuntu - Software & Updates - Additional Drivers - choose between several NVIDIA driver versions or the nouveau driver). – gaoithe Dec 22 '20 at 17:10
2

Enable persistent logging

sudo mkdir /var/log/journal

Reboot

Make sure persistent logging is enabled by browsing /var/log/journal and checking that a randomly named directory exists.
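The same check can be scripted. This is just a sketch of the manual step above, with a made-up function name `journal_is_persistent`; it only tests that the journal directory contains at least one subdirectory.

```shell
# Succeed if the journal directory (default /var/log/journal) exists and
# contains at least one subdirectory, as it does once journald has
# started writing persistent logs there.
journal_is_persistent() {
    dir="${1:-/var/log/journal}"
    [ -d "$dir" ] && \
        [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null)" ]
}

# Possible usage (not run here):
#   journal_is_persistent && echo "persistent journal found"
```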

After the incident

List system boots

sudo journalctl --list-boots

Extract the boot with the incident

sudo journalctl -b caf0524a1d394ce0bdbcff75b94444fe > /tmp/errorlog

or just

sudo journalctl -b caf0524a1d394ce0bdbcff75b94444fe

Inspect the log.
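To avoid copying the boot ID by hand, the ID can be picked out of the `--list-boots` output by its offset in the first column (0 is the current boot, -1 the one before it). A small sketch, assuming the offset is the first column and the boot ID the second; the function name `boot_id_for_offset` is my own.

```shell
# Read `journalctl --list-boots` output on stdin and print the boot ID
# whose offset (first column) matches the argument, e.g. -1.
boot_id_for_offset() {
    awk -v off="$1" '$1 == off { print $2; exit }'
}

# Possible usage (not run here):
#   id=$(sudo journalctl --list-boots | boot_id_for_offset -1)
#   sudo journalctl -b "$id" > /tmp/errorlog
```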

Zanna
  • 70,465
Artyom
  • 1,723