
We ordered a pre-built machine with the specs below and had Ubuntu 17.04 installed on the SSD. Ubuntu froze at random on four separate occasions (including once while installing Anaconda for Python). We then moved to 16.04 on the same SSD (we were more familiar with that release from our day jobs anyway), and the problem persisted.

This is the syslog output from around the time of one of the freezes.

Jun  5 02:22:08 PsertainTech org.gnome.evolution.dataserver.Sources5[1648]: ** (evolution-source-registry:1869): WARNING **: secret_service_search_sync: must specify at least one attribute to match
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736597] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736609]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=7188 
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736613]     (detected by 3, t=15002 jiffies, g=241027, c=241026, q=3366)
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736620] Task dump for CPU 7:
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736622] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736628]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736632]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736636]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736640] Call Trace:
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736650]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736655]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736658]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:22:18 PsertainTech kernel: [ 3648.736662]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752786] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752796]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=27789 
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752799]     (detected by 4, t=60007 jiffies, g=241027, c=241026, q=11266)
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752803] Task dump for CPU 7:
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752805] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752810]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752815]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752819]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752823] Call Trace:
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752833]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752839]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752843]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:25:18 PsertainTech kernel: [ 3828.752848]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:25:28 PsertainTech org.gnome.evolution.dataserver.Sources5[2720]: ** (evolution-source-registry:2988): WARNING **: secret_service_search_sync: must specify at least one attribute to match
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773198] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773210]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=48323 
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773214]     (detected by 0, t=105012 jiffies, g=241027, c=241026, q=19082)
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773221] Task dump for CPU 7:
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773223] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773228]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773232]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773236]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773240] Call Trace:
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773251]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773256]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773259]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:28:18 PsertainTech kernel: [ 4008.773264]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789674] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789686]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=68966 
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789690]     (detected by 0, t=150017 jiffies, g=241027, c=241026, q=26836)
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789696] Task dump for CPU 7:
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789698] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789703]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789707]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789710]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789715] Call Trace:
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789725]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789730]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789733]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:31:18 PsertainTech kernel: [ 4188.789738]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808848] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808861]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=89546 
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808865]     (detected by 3, t=195022 jiffies, g=241027, c=241026, q=34852)
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808871] Task dump for CPU 7:
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808873] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808879]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808883]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808887]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808891] Call Trace:
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808901]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808906]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808909]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:34:18 PsertainTech kernel: [ 4368.808914]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828850] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828862]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=110118 
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828865]     (detected by 9, t=240027 jiffies, g=241027, c=241026, q=42598)
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828871] Task dump for CPU 7:
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828873] swapper/7       R  running task        0     0      1 0x00000008
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828878]  0000000000000010 0000000000000246 ffff8c86585b7e70 0000000000000018
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828882]  7735940000000000 00000343907a93bf 0000000000000007 ffff8c86585b8000
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828886]  ffff8c865f3e2900 ffffffffac4b92e0 ffff8c86585b4000 ffff8c86585b7eb8
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828889] Call Trace:
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828900]  [<ffffffffabd157f7>] ? cpuidle_enter+0x17/0x20
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828904]  [<ffffffffab6c7a5a>] ? call_cpuidle+0x2a/0x50
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828907]  [<ffffffffab6c7e3e>] ? cpu_startup_entry+0x29e/0x350
Jun  5 02:37:18 PsertainTech kernel: [ 4548.828912]  [<ffffffffab651891>] ? start_secondary+0x151/0x190
Jun  5 02:40:18 PsertainTech kernel: [ 4728.843394] INFO: rcu_sched detected stalls on CPUs/tasks:
Jun  5 02:40:18 PsertainTech kernel: [ 4728.843406]     7-...: (433 GPs behind) idle=07d/1/0 softirq=176166/176166 fqs=130672 
Jun  5 02:40:18 PsertainTech kernel: [ 4728.843410]     (detected by 3, t=285032 jiffies, g=241027, c=241026, q=50334)
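
A minimal sketch for condensing the stall reports above into one line per CPU. The function name and the regular expressions are our own assumptions, written against the exact line shapes in this log; other kernel versions may format these lines differently.

```python
import re
from collections import defaultdict

# Stalled-CPU line, e.g. "[ 3648.736609]     7-...: (433 GPs behind) idle=..."
STALLED_RE = re.compile(r"\]\s+(\d+)-\S*: \((\d+) GPs behind\)")
# Detail line, e.g. "(detected by 3, t=15002 jiffies, g=241027, ...)"
DETECT_RE = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")


def summarize_rcu_stalls(lines):
    """Return {stalled_cpu: [t_jiffies, ...]} for rcu_sched stall reports.

    Each report is assumed to consist of a stalled-CPU line followed by a
    "(detected by ...)" line, as in the syslog excerpt above.
    """
    stalls = defaultdict(list)
    current_cpu = None
    for line in lines:
        stalled = STALLED_RE.search(line)
        if stalled:
            current_cpu = int(stalled.group(1))
            continue
        detected = DETECT_RE.search(line)
        if detected and current_cpu is not None:
            stalls[current_cpu].append(int(detected.group(2)))
            current_cpu = None
    return dict(stalls)


# Example (path is the Ubuntu default; adjust as needed):
# with open("/var/log/syslog") as f:
#     print(summarize_rcu_stalls(f))
```

Run against the excerpt above, every report points at CPU 7 with the same g/c values and a steadily growing t, which suggests one continuous stall rather than several separate ones. The reports are 180 seconds and 45005 jiffies apart, so this kernel appears to tick at about 250 jiffies per second, and the first report (t=15002) came roughly 60 seconds into the stall.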

We tried the C-state fix, without success: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1"
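
A change to GRUB_CMDLINE_LINUX_DEFAULT only takes effect after running sudo update-grub and rebooting, so it is worth confirming that the parameter actually reached the running kernel. A minimal sketch, assuming a Linux system where /proc/cmdline is readable; the helper name kernel_param is hypothetical.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line string.

    Returns None if the parameter is absent, and an empty string for a
    bare flag such as "quiet" that carries no "=value" part.
    """
    for token in cmdline.split():
        key, _sep, value = token.partition("=")
        if key == name:
            return value
    return None


# Example on the running system:
# with open("/proc/cmdline") as f:
#     print(kernel_param(f.read(), "intel_idle.max_cstate"))
```

If this returns None after a reboot, the GRUB edit was not applied and the test of the C-state fix was inconclusive.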

  • Mobo: Asus Intel X99-A II USB 3.1 Series DDR4/ Quad CrossFireX&Quad SLI/ SATA3 & USB3.1
  • RAM: 64GB DDR4-2133/2400 PC4-17000/19200 4X16GB
  • CPU: Core i7-6850K Broadwell-E 6xCore 3.6GHz 2011 v3
  • SSD: Samsung 850 EVO Series 500GB Solid State Drive
  • HDD: 2TB Western Digital Black 7200RPM SATA-3 6Gb/s 64MB Cache
  • GPU: Nvidia GeForce GT 730 2GB DDR PCI-Express

free -h output:

              total        used        free      shared  buff/cache   available
Mem:            62G        2.1G         59G         55M        1.4G         60G
Swap:           63G          0B         63G

swapon output:

NAME      TYPE       SIZE USED PRIO
/dev/sdb5 partition 63.9G   0B   -1

sudo blkid output:

/dev/sda1: UUID="a8fb3b82-1a85-4377-99e3-20d22e63a451" TYPE="swap" PARTUUID="7f08bb24-9185-47bf-a949-a316a0b63f5b"
/dev/sdb1: UUID="1c22f4a6-4c23-46c1-bdcc-7644ed39e193" TYPE="ext4" PARTUUID="43ee50b9-01"
/dev/sdb5: UUID="c7a7c6b7-e3a2-4942-b4a1-7d66eccb0915" TYPE="swap" PARTUUID="43ee50b9-05"
/dev/sdb6: UUID="8668e4db-4dd9-4b76-8d06-b6241d521800" TYPE="ext4" PARTUUID="43ee50b9-06"

cat /etc/fstab output:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sdb1 during installation
UUID=1c22f4a6-4c23-46c1-bdcc-7644ed39e193 /               ext4    errors=remount-ro 0       1
# /home was on /dev/sdb6 during installation
UUID=8668e4db-4dd9-4b76-8d06-b6241d521800 /home           ext4    defaults        0       2
# swap was on /dev/sdb5 during installation
UUID=c7a7c6b7-e3a2-4942-b4a1-7d66eccb0915 none            swap    sw              0       0

cat /etc/crypt* output: cat: '/etc/crypt*': No such file or directory

ls -alh /swapfile output: ls: cannot access '/swapfile': No such file or directory

Syslog from a later freeze (Jun 8), with errors from the nouveau driver:

Jun  8 08:29:22 psertain kernel: [37593.535883] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Jun  8 08:29:22 psertain kernel: [37593.535898] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Jun  8 08:29:22 psertain kernel: [37593.535911] nouveau 0000:01:00.0: fifo: channel 3: killed
Jun  8 08:29:22 psertain kernel: [37593.535918] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Jun  8 08:29:22 psertain kernel: [37593.535998] nouveau 0000:01:00.0: Xorg[1105]: channel 3 killed!
Jun  8 08:29:33 psertain kernel: [37604.030229] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:37:head-0] hw_done timed out
Jun  8 08:29:43 psertain kernel: [37614.269426] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:head-0] hw_done timed out
Jun  8 08:29:53 psertain kernel: [37624.508634] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:head-0] flip_done timed out
Jun  8 08:30:59 psertain rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="909" x-info="http://www.rsyslog.com"] start
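
The errors above come from the nouveau kernel module, so it helps to check which GPU module is really loaded after switching drivers in "Additional Drivers". A minimal sketch, assuming the standard /proc/modules layout in which the module name is the first field on each line; the function name and the default candidate list are our own choices.

```python
def loaded_gpu_modules(proc_modules_text, candidates=("nouveau", "nvidia")):
    """Return which of the candidate kernel modules appear in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in candidates if name in loaded]


# Example on the running system:
# with open("/proc/modules") as f:
#     print(loaded_gpu_modules(f.read()))
```

If nouveau is still listed after the proprietary driver was selected, the switch did not take effect.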
  • Nothing specific comes to mind, but I'd try latest kernel – Hi-Angel Jun 06 '17 at 18:32
  • Please edit your question to include the terminal output of free -h and swapon and sudo blkid and cat /etc/fstab and cat /etc/crypt* and ls -alh /swapfile and I'll take a look. Start comments directed to me with @heynnema or I may miss them. – heynnema Jun 06 '17 at 21:13
  • Also go to the Samsung web site and see if there's a firmware update available for your SSD. Download the Samsung Magician. Lastly, what version of Nvidia drivers are you running? From the Ubuntu repo or the Nvidia web site? – heynnema Jun 06 '17 at 21:17
  • Also, is intel-microcode installed? dpkg -l intel-microcode | grep ii. Do you have the latest ASUS BIOS installed? Sorry for all of the questions, but I'm trying to narrow it all down for you. – heynnema Jun 06 '17 at 21:45
  • Update: After updating the kernel (which I believe may have reduced the random crashes, so thanks) and messing around with a few other settings, I've narrowed the issue down to one commonality: when the monitor goes into power-saving mode (i.e. via suspend), lightdm crashes. I can press Ctrl+Alt+F1 and restart lightdm to recover (so I no longer need to reboot), but graphical applications are killed off, which interrupts my work. – sousuffer Jun 08 '17 at 00:36
  • I have disabled suspend and the monitor turning off (which solved the issue), but I'd like to restore that functionality (I'd rather not have the screen on 24 hours a day if possible). My gut says it is something with the video card. When I go to "Software and Updates" -> "Additional Drivers", it is using X.Org X Server - Nouveau. When I tried selecting an Nvidia driver (I don't recall which version, but it was proprietary, tested), it put me in a login loop (log in, press Enter, back to the login screen) at ultra-low resolution, so I had to switch back to the original driver. – sousuffer Jun 08 '17 at 00:36
  • output for dpkg -l intel-microcode | grep ii: ii intel-microcode 3.20151106.1 amd64 Processor microcode firmware for Intel CPUs – sousuffer Jun 08 '17 at 00:36
  • I've reviewed some of your updates. You've got a way too large swap file!!! Bring that down to around 8G. Is swap on the HDD? It should be. Also, try the Nvidia driver from the Ubuntu repositories. Did you check for ASUS BIOS updates, and SSD firmware updates? Remember, to send a comment that "pings" somebody, put @heynnema or whatever username, otherwise that person might miss the updates. – heynnema Jun 08 '17 at 01:24
  • ps: I just noticed that you've got TWO swaps! Your swap should be 8G and on the HDD, not the SSD. – heynnema Jun 08 '17 at 01:29
  • Thanks for all the comments. I cannot find any SSD firmware updates, but I tried switching to a different NVidia binary driver (version 370.28 (open source)). I still cannot resume from suspend and am still getting the occasional freeze (which still seems related to the nouveau driver). I will edit my original post with the NEW log. – sousuffer Jun 08 '17 at 15:37
  • I don't understand why the computer is still using nouveau if I switched off of it – sousuffer Jun 08 '17 at 15:38
  • So it turns out I had to reinstall Linux and enable the 340 version of the Nvidia driver rather than the newer tested version, and so far there have been no freezes for 24 hours. Coming out of standby also works. Hopefully this continues. – sousuffer Jun 09 '17 at 14:41

1 Answer

So it turns out I had to reinstall Linux and enable the 340 version of the Nvidia driver rather than the newer tested version, and so far there have been no freezes for 24 hours. Coming out of standby also works. Hopefully this continues. – @sousuffer
