I am running Ubuntu 20.04 and my system hangs about once every week or two. It crashed today and I saved my entire /var/log directory.

What information should I provide to support resolving this crash issue?

Is there any summary of what kinds of issues can cause a linux system to hang?

I have not touched the bios.

sudo snap list:
anna_user2@anna-XPS-8930:~$ sudo snap list
Name                  Version                     Rev    Tracking         Publisher     Notes
atom                  1.57.0                      282    latest/stable    snapcrafters  classic
bare                  1.0                         5      latest/stable    canonical✓    base
canonical-livepatch   10.1.2                      126    latest/stable    canonical✓    -
coq-prover            2021-09-0                   27     latest/stable    coq-team      -
core                  16-2.54.2                   12603  latest/stable    canonical✓    core
core18                20211215                    2284   latest/stable    canonical✓    base
core20                20220114                    1328   latest/stable    canonical✓    base
gnome-3-28-1804       3.28.0-19-g98f9e67.98f9e67  161    latest/stable    canonical✓    -
gnome-3-34-1804       0+git.3556cb3               77     latest/stable/…  canonical✓    -
gnome-3-38-2004       0+git.1f9014a               99     latest/stable    canonical✓    -
gtk-common-themes     0.1-59-g7bca6ae             1519   latest/stable/…  canonical✓    -
jq                    1.5+dfsg-1                  6      latest/stable    mvo           -
postman               7.36.5                      133    latest/stable    postman-inc✓  -
ruby                  3.1.0                       247    latest/stable    rubylang✓     classic
simplescreenrecorder  0.1                         1      latest/stable    xiaoguo       -
smplayer              21.10.0                     43     latest/stable    rvm           -
snap-store            3.38.0-66-gbd5b8f7          558    latest/stable/…  canonical✓    -
snapd                 2.54.2                      14549  latest/stable    canonical✓    snapd
anna_user2@anna-XPS-8930:~$ 
  • There are many reasons for crashes, so it is a process of diagnosing. If you know the crash time, have a look at the /var/log/syslog file for the crash time to see if there are any notifications at that time. – Jaydin Feb 08 '22 at 03:23
  • What were you doing before your system hung? – Jon Feb 09 '22 at 13:17
  • There are many reasons for systems to hang. Could be a virus for example. – Jon Feb 09 '22 at 13:19
  • Could you add output from sudo snap list to the question? – ExploitFate Feb 09 '22 at 16:43
  • What kind of crash occurs? A completely black screen, or does the last displayed image freeze? What does the PC do after crashing? Does it reboot automatically? Did you try pressing Alt + right arrow to switch terminal screens (in case of a black screen)? – netbat Feb 09 '22 at 18:27
  • If you saved /var/log before a reboot (that is, from another OS), you can try to evaluate /var/log/dmesg. But the file is rebuilt on every regular reboot, so if you rebooted to save the log files, the file is from that reboot, not from the crash. (If your system crashes regularly, you can keep this in mind for next time ;-)

    Oh and - the summary of what kinds of issues can cause this: In the current state of information, "all" kinds of issues could be the cause - sorry...

    – cyberbrain Feb 11 '22 at 19:53
  • I don't understand the comment about saving /var/log before reboot. Once it freezes, there is no opportunity to save /var/log until after reboot. – Anna Naden Feb 14 '22 at 19:59
  • It doesn't automatically reboot. – Anna Naden Feb 14 '22 at 19:59

2 Answers


There can be various reasons for the problem you are facing. Can you share the logs from /var/log/syslog or /var/log/kern.log, and also check with dmesg -T whether the kernel reports any errors while the machine is running?

Another possibility is your hardware. A hardware fault can cause system crashes, sudden hangs, and similar symptoms.

Sharing more of the logs will help others understand exactly what the issue is.
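As a rough sketch of how to narrow the logs down before sharing them, the helper below prints only the lines of a syslog-style file whose timestamp starts with a given prefix. The function name lines_at and its arguments are made up for illustration, and it assumes the traditional syslog timestamp format (for example "Feb  7 14:3") that /var/log/syslog uses by default on Ubuntu 20.04.

```shell
# lines_at PREFIX [FILE]
# Print the lines of FILE that begin with the timestamp prefix PREFIX.
# FILE defaults to /var/log/syslog; point it at a saved copy if needed.
# (Hypothetical helper, not part of any package.)
lines_at() {
    # index($0, s) == 1 is true only when the line starts with the prefix
    awk -v s="$1" 'index($0, s) == 1' "${2:-/var/log/syslog}"
}
```

For example, lines_at 'Feb  7 14:3' syslog prints everything logged from 14:30 to 14:39 on that day. If the crash is older than the current file, the entries may be in a rotated copy such as syslog.1.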

Option 1

Service Failure

A service that keeps crashing or failing over and over can be the reason for a system crash. Check the state of your services to rule this out.

Example:

systemctl list-units --type=service

This lists the services together with their state, so you can see which ones are active and which have failed.
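The full list can be long, so here is a small sketch that keeps only the failed services. The function name failed_units is an assumption for illustration. It expects the output of systemctl list-units with the --plain and --no-legend options on standard input, where the columns are UNIT, LOAD, ACTIVE, SUB and DESCRIPTION.

```shell
# failed_units
# Read `systemctl list-units --type=service --all --plain --no-legend`
# output on stdin and print the names of units whose ACTIVE column is "failed".
# (Hypothetical helper; the column layout is that of list-units with --plain.)
failed_units() {
    awk '$3 == "failed" { print $1 }'
}
```

Usage: systemctl list-units --type=service --all --plain --no-legend | failed_units. If you only want a quick look, systemctl --failed gives a similar list without any filtering.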

Option 2

Hardware Failure

Some hardware may not be working properly, for example the CPU fan, the graphics hardware, or the RAM. Check your dmesg logs: hardware errors show up there, as do errors such as a GPU hang or a segmentation fault.

Example: dmesg -T

[Fri Jan 21 18:02:27 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=20339577 end=20339578) time 439 us, min 2146, max 2159, scanline start 2136, end 2196
[Fri Jan 21 18:03:29 2022] [drm] Got external EDID base block and 1 extension from "edid/edid.bin" for connector "DP-1"
[Fri Jan 21 18:03:29 2022] [drm] Got external EDID base block and 1 extension from "edid/edid.bin" for connector "DP-1"
[Fri Jan 21 18:37:28 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=20465675 end=20465676) time 417 us, min 2146, max 2159, scanline start 2144, end 2200
[Fri Jan 21 19:03:59 2022] [drm] Got external EDID base block and 1 extension from "edid/edid.bin" for connector "DP-1"
[Fri Jan 21 19:04:00 2022] [drm] Got external EDID base block and 1 extension from "edid/edid.bin" for connector "DP-1"
[Fri Jan 21 20:41:15 2022] i915 0000:00:02.0: GPU HANG: ecode 9:4:0xc86dffef, in vlc [1478503], hang on vcs0
[Fri Jan 21 20:41:15 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Fri Jan 21 21:03:04 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=20989841 end=20989842) time 412 us, min 2146, max 2159, scanline start 2142, end 2198
[Fri Jan 21 22:13:11 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Fri Jan 21 22:43:59 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Fri Jan 21 23:49:47 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 01:00:43 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 03:39:16 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=22416118 end=22416119) time 413 us, min 2146, max 2159, scanline start 2117, end 2173
[Sat Jan 22 06:05:57 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 06:51:33 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=23108341 end=23108342) time 408 us, min 2146, max 2159, scanline start 2141, end 2196
[Sat Jan 22 08:01:12 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 08:36:01 2022] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=23484460 end=23484461) time 433 us, min 2146, max 2159, scanline start 2102, end 2160
[Sat Jan 22 10:06:46 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 10:10:56 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0
[Sat Jan 22 11:44:18 2022] i915 0000:00:02.0: Resetting vcs0 for hang on vcs0

or a hardware error like:

[Mon Feb 14 13:41:45 2022] evm: security.ima
[Mon Feb 14 13:41:45 2022] evm: security.capability
[Mon Feb 14 13:41:45 2022] evm: HMAC attrs: 0x1
[Mon Feb 14 13:41:45 2022] BERT: Error records from previous boot:
[Mon Feb 14 13:41:45 2022] [Hardware Error]: event severity: fatal
[Mon Feb 14 13:41:45 2022] [Hardware Error]:  Error 0, type: fatal
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   section type: unknown, 81212a96-09ed-4996-9471-8d729c8e69ed
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   section length: 0x290
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000000: 00000001 00000000 00000000 00020002  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000010: 00020002 00000001 0000031d 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000020: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000030: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000040: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000070: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000080: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000090: 0012cf23 00000000 00000002 00000001  #...............
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000a0: 0000031d 00000000 00040000 000ffff8  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000b0: 000014a8 00000880 00000880 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000c0: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000d0: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000e0: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   000000f0: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000100: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000110: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000120: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000130: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000140: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000150: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000160: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000170: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000180: 00000000 00000000 00000000 00000000  ................
[Mon Feb 14 13:41:45 2022] [Hardware Error]:   00000190: 00000000 00000000 00000000 00000000  ................
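Since dmesg -T can produce several screenfuls, a filter like the one below may help to pull out lines of the kind shown above. It is only a sketch: the function name scan_kernel_log is made up, and the pattern list is an illustrative starting point (GPU hangs, [Hardware Error] records, segfaults, soft lockups), not a complete list of what can go wrong.

```shell
# scan_kernel_log
# Read kernel log text on stdin and print only the lines that match a few
# common trouble markers. The pattern list is an assumption; extend as needed.
scan_kernel_log() {
    grep -E -i 'GPU HANG|Hardware Error|segfault|soft lockup|Out of memory'
}
```

Usage: dmesg -T | scan_kernel_log for the running system, or scan_kernel_log < kern.log for a saved copy of /var/log/kern.log. No output means none of these patterns matched, not that the hardware is fine.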

Option 3

Reinstalling the OS

You can try reinstalling the OS and check whether you get the same issue. If you do, there is a good chance that the cause is a hardware issue.

  • The screen is not black - it shows everything - but it doesn't respond to keyboard or mouse. What is the best way to make the logs available to you folks? – Anna Naden Feb 14 '22 at 19:56
  • dmesg -T gives several screenfuls of information. How can I make that available to you folks? – Anna Naden Feb 14 '22 at 19:57
  • You can use pastebin to share a large file here: create a sharing link and paste it into the original post or a new answer. – Spark_TheCat Feb 14 '22 at 20:49
  • After a crash, would it be helpful to boot from a flash drive and capture the logs before rebooting the OS? – Anna Naden Feb 14 '22 at 23:06
  • I wouldn't recommend that. You can boot your PC and upload all the logs from /var/log/kern.log*. You can use pastebin if needed. Otherwise, try reinstalling the OS. In most cases, that fixes the issue. – Amogh Saxena - REXTER Feb 15 '22 at 05:31

I used a command - I think it was systemd - to find out why it takes 10 minutes to reboot after the crash. It indicated the AI coding aid "kite." I removed kite and not only does my system boot promptly, it doesn't crash anymore.
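The exact command is not recorded above. One systemd tool that reports how long each unit took to start is systemd-analyze blame, so the sketch below assumes that was the kind of command used. The function name slow_units is made up for illustration.

```shell
# slow_units [N]
# Print the N slowest-starting units (default 10) as reported by
# `systemd-analyze blame`, which lists units ordered by startup time.
# (Hypothetical wrapper; assumes systemd-analyze is installed.)
slow_units() {
    if ! command -v systemd-analyze >/dev/null 2>&1; then
        echo "systemd-analyze not found" >&2
        return 1
    fi
    systemd-analyze blame | head -n "${1:-10}"
}
```

Running slow_units 5 shows the five units that took longest during the last boot. A unit at the top with a time of several minutes is a good candidate for closer inspection.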