
For a while now I've been experiencing unwanted reboots whose cause is unclear. When the computer was purchased, about a year ago, everything was stable.

According to the reboot log, the previous session is still reported as running:

$ ~ ❯ last reboot
reboot   system boot  5.15.0-50-generi Sat Oct 15 05:11   still running
reboot   system boot  5.15.0-50-generi Wed Oct 12 13:21   still running
reboot   system boot  5.15.0-48-generi Mon Oct 10 19:53 - 13:21 (1+17:27)
reboot   system boot  5.15.0-48-generi Thu Oct  6 08:08 - 13:21 (6+05:13)
reboot   system boot  5.15.0-47-generi Thu Sep 29 10:13 - 13:21 (13+03:07)

A closer look at the wtmp file, however, records several crashes:

$ /var/log ❯ last -f wtmp
andrea   tty2         tty2             Sat Oct 15 05:11   still logged in
reboot   system boot  5.15.0-50-generi Sat Oct 15 05:11   still running
andrea   tty2         tty2             Wed Oct 12 13:22 - crash (2+15:49)
andrea   tty2         tty2             Wed Oct 12 13:21 - 13:21  (00:00)
reboot   system boot  5.15.0-50-generi Wed Oct 12 13:21   still running
andrea   tty2         tty2             Mon Oct 10 19:53 - down  (1+17:27)
reboot   system boot  5.15.0-48-generi Mon Oct 10 19:53 - 13:21 (1+17:27)
andrea   tty2         tty2             Thu Oct  6 08:08 - crash (4+11:45)
reboot   system boot  5.15.0-48-generi Thu Oct  6 08:08 - 13:21 (6+05:13)
andrea   tty2         tty2             Thu Sep 29 10:19 - crash (6+21:48)
andrea   tty2         tty2             Thu Sep 29 10:17 - 10:19  (00:01)
reboot   system boot  5.15.0-47-generi Thu Sep 29 10:13 - 13:21 (13+03:07)
andrea   tty2         tty2             Thu Sep 29 09:33 - down   (00:40)
reboot   system boot  5.15.0-47-generi Thu Sep 29 09:29 - 10:13  (00:44)
andrea   tty2         tty2             Thu Aug 25 09:40 - down  (34+23:48)
reboot   system boot  5.15.0-46-generi Thu Aug 25 09:37 - 09:29 (34+23:51)
andrea   tty3                          Sat Jul  2 15:13 - 15:13  (00:00)
andrea   tty2         tty2             Sat Jul  2 13:57 - down  (53+19:40)
reboot   system boot  5.15.0-40-generi Sat Jul  2 13:56 - 09:37 (53+19:40)
andrea   tty2         tty2             Sat Jul  2 10:52 - down   (03:04)
reboot   system boot  5.15.0-40-generi Sat Jul  2 10:51 - 13:56  (03:04)
andrea   tty2         tty2             Sat Jul  2 11:17 - down   (-00:28)
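To tally how the sessions in a listing like the one above ended, a small script can help. The following is a minimal sketch in Python, assuming the column layout shown above; the function name `classify_sessions` and the `SAMPLE` text (four lines copied from the listing) are illustrative, not part of any tool.

```python
import re

# Matches the session-end field of a `last` line, e.g. "- crash (2+15:49)".
END_RE = re.compile(r"\s-\s+(crash|down|\d{2}:\d{2})\s+\(")


def classify_sessions(last_output: str) -> dict:
    """Count how user sessions in `last` output ended."""
    counts = {"crash": 0, "down": 0, "logout": 0, "open": 0}
    for line in last_output.splitlines():
        fields = line.split()
        # Skip blank lines and the pseudo-user "reboot" (boot records).
        if not fields or fields[0] == "reboot":
            continue
        match = END_RE.search(line)
        if match is None:
            # Only count sessions that are explicitly still open;
            # anything else (e.g. the "wtmp begins" footer) is ignored.
            if "still logged in" in line:
                counts["open"] += 1
            continue
        end = match.group(1)
        if end in ("crash", "down"):
            counts[end] += 1
        else:
            counts["logout"] += 1  # ended at a clock time, i.e. a normal logout
    return counts


# Four lines taken from the listing above, used only as sample input.
SAMPLE = """\
andrea   tty2         tty2             Sat Oct 15 05:11   still logged in
andrea   tty2         tty2             Wed Oct 12 13:22 - crash (2+15:49)
andrea   tty2         tty2             Wed Oct 12 13:21 - 13:21  (00:00)
andrea   tty2         tty2             Mon Oct 10 19:53 - down  (1+17:27)
"""

print(classify_sessions(SAMPLE))
```

A session marked `crash` means that no clean logout was recorded before the next boot, which fits an abrupt reset or power loss but does not by itself identify the cause.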

I'm running the 5.15.0-50-generic kernel. Below is a snapshot of the hardware; full details are available here.

H/W path         Device          Class          Description
===========================================================
                                 system         MACHD-WXX9 (C100)
/0                               bus            MACHD-WXX9-PCB
/0/0                             memory         128KiB BIOS
/0/4                             processor      11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
/0/4/6                           memory         128KiB L1 cache
/0/4/7                           memory         5MiB L2 cache
/0/4/8                           memory         8MiB L3 cache
/0/5                             memory         192KiB L1 cache
/0/d                             memory         16GiB System Memory
/0/d/0                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/1                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/2                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/3                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/4                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/5                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/6                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/7                           memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/100                           bridge         11th Gen Core Processor Host Bridge/DRAM Registers
/0/100/2         /dev/fb0        display        TigerLake-LP GT2 [Iris Xe Graphics]
/0/100/2/0       input15         input          DP-3
/0/100/4                         generic        TigerLake-LP Dynamic Tuning Processor Participant
/0/100/7                         bridge         Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #0
/0/100/7.2                       bridge         Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2
/0/100/d                         bus            Tiger Lake-LP Thunderbolt 4 USB Controller
/0/100/d/0       usb1            bus            xHCI Host Controller
/0/100/d/1       usb2            bus            xHCI Host Controller
/0/100/d/1/3                     bus            USB3.0 Hub
/0/100/d.2                       bus            Tiger Lake-LP Thunderbolt 4 NHI #0
/0/100/d.3                       bus            Tiger Lake-LP Thunderbolt 4 NHI #1
/0/100/14                        bus            Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
/0/100/14/0      usb3            bus            xHCI Host Controller
/0/100/14/0/4                    bus            USB2.0 Hub
/0/100/14/0/7    input16         multimedia     HD Camera: HD Camera
/0/100/14/0/a                    communication  AX201 Bluetooth
/0/100/14/1      usb4            bus            xHCI Host Controller
/0/100/14.2                      memory         RAM memory
/0/100/14.3      wlp0s20f3       network        Wi-Fi 6 AX201
/0/100/15                        bus            Tiger Lake-LP Serial IO I2C Controller #0
/0/100/15.1                      bus            Tiger Lake-LP Serial IO I2C Controller #1
/0/100/16                        communication  Tiger Lake-LP Management Engine Interface
/0/100/1d                        bridge         Tiger Lake-LP PCI Express Root Port #9
/0/100/1d/0      /dev/nvme0      storage        SAMSUNG MZVLB512HBJQ-00000
/0/100/1d/0/0    hwmon3          disk           NVMe disk
/0/100/1d/0/2    /dev/ng0n1      disk           NVMe disk
/0/100/1d/0/1    /dev/nvme0n1    disk           512GB NVMe disk
/0/100/1d/0/1/1  /dev/nvme0n1p1  volume         199MiB Windows FAT volume
/0/100/1d/0/1/2  /dev/nvme0n1p2  volume         15MiB reserved partition
/0/100/1d/0/1/3  /dev/nvme0n1p3  volume         79GiB Windows NTFS volume
/0/100/1d/0/1/4  /dev/nvme0n1p4  volume         511MiB Windows FAT volume
/0/100/1d/0/1/5  /dev/nvme0n1p5  volume         17GiB Windows NTFS volume
/0/100/1d/0/1/6  /dev/nvme0n1p6  volume         1023MiB Windows NTFS volume
/0/100/1d/0/1/7  /dev/nvme0n1p7  volume         347GiB EXT4 volume
/0/100/1d/0/1/8  /dev/nvme0n1p8  volume         29GiB Linux swap volume
/0/100/1e                        communication  Tiger Lake-LP Serial IO UART Controller #0
/0/100/1e.3                      bus            Tiger Lake-LP Serial IO SPI Controller #1
/0/100/1f                        bridge         Tiger Lake-LP LPC Controller
/0/100/1f/0                      system         PnP device PNP0c02
/0/100/1f/1                      generic        PnP device INT3f0d
/0/100/1f/2                      input          PnP device PNP0303
/0/100/1f/3                      system         PnP device PNP0c02
/0/100/1f/4                      system         PnP device PNP0c02
/0/100/1f/5                      system         PnP device PNP0c02
/0/100/1f/6                      system         PnP device PNP0c02
/0/100/1f.3      card0           multimedia     Tiger Lake-LP Smart Sound Technology Audio Controller
/0/100/1f.4                      bus            Tiger Lake-LP SMBus Controller
/0/100/1f.5                      bus            Tiger Lake-LP SPI Controller
/1                               power          HB4593R1ECW-22T0
/2               input0          input          Lid Switch
/3               input1          input          Power Button
/4               input10         input          GXTP7863:00 27C6:01E0 Touchpad
/5               input12         input          SYNA2393:00 06CB:19AC
/6               input14         input          Video Bus
/7               input17         input          sof-hda-dsp Headphone
/8               input18         input          sof-hda-dsp HDMI/DP,pcm=3
/9               input19         input          sof-hda-dsp HDMI/DP,pcm=4
/a               input2          input          AT Translated Set 2 keyboard
/b               input20         input          sof-hda-dsp HDMI/DP,pcm=5
/c               input23         input          Paris Keyboard
/d               input26         input          Paris Mouse
/e               input8          input          Huawei WMI hotkeys
/f               input9          input          GXTP7863:00 27C6:01E0 Mouse

The full HTML report shows two sections in red: memory and serial bus controller.

I'm not sure whether the red indicates a problem, though I remember once repairing a couple of computers that crashed suddenly because of 1) faulty RAM or 2) an incompatible CPU and RAM. I don't know whether cause 1) applies here.

The /proc/sys/kernel/panic file contains 0; I guess this means a faulty driver could not have been the cause of the reboots?
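For reference, the kernel's sysctl documentation describes this value as a timeout rather than an event counter: 0 means the kernel loops forever after a panic, a negative value means it reboots immediately, and a positive value means it reboots after that many seconds. A 0 therefore does not rule out a driver fault; it only suggests that a panic would leave the machine frozen instead of rebooting it. The sketch below, in Python, spells out that reading; the function names are illustrative.

```python
from pathlib import Path


def describe_panic_timeout(value: int) -> str:
    """Explain a kernel.panic value as documented for /proc/sys/kernel/panic."""
    if value == 0:
        return "on panic the kernel loops forever; it does not reboot by itself"
    if value < 0:
        return "on panic the kernel reboots immediately"
    return f"on panic the kernel reboots after {value} second(s)"


def read_panic_timeout(path: str = "/proc/sys/kernel/panic") -> int:
    """Read the current setting from procfs (Linux only)."""
    return int(Path(path).read_text().strip())


# Usage on a Linux machine:
#   print(describe_panic_timeout(read_panic_timeout()))
```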

Any help in continuing the investigation and solving the issue is appreciated.

Hannu

1 Answer


If you see RANDOM software issues, i.e. issues that seem impossible to diagnose by reading logs, then consider looking for hardware issues instead.

In my experience, irrational behavior has "always" turned out to be caused by "icky" connections between hardware components, e.g. RAM modules with grimy connectors (use a pencil eraser to clean them!) and glitchy old Molex-type PSU feeds (replace them!).

The smaller the computer enclosure, the more important it is to ensure ample cooling.

Clean out any dust buildup periodically and ensure that cooling air can flow freely while the computer is in use; avoid blockage "at all cost".

If the issues remain after cleaning out grime and dirt, you might have electronics with permanent damage, e.g. from overheating too many times. Then there is no remedy other than replacement.
(Such problems are hard to find; they may lie in "lost timing" between circuits and are most often not repairable by simple means.)
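Before opening the case, overheating can be checked from software. The following is a minimal sketch in Python that reads the kernel's thermal zones under `/sys/class/thermal`, where each `temp` file reports millidegrees Celsius. The function name is illustrative, and which zones exist (and what they measure) depends on the machine, so treat the zone names as something to inspect rather than rely on.

```python
from pathlib import Path


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {"thermal_zoneN (type)": degrees_celsius} for each readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or report no number
        readings[f"{zone.name} ({kind})"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    # Print one snapshot; an empty result means no thermal zones were found.
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} °C")
```

Running it periodically (for example from cron, appending the output to a file) gives a temperature log to compare against the timestamps of the crashes. Readings climbing close to the CPU's limit shortly before a reboot would support the overheating theory; unremarkable readings would point elsewhere.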

Hannu
  • Every time I've ever seen an unexpected and non-obvious-cause reboot, it's been due to overheating, as you say in your last point. Usually dust buildup, but in one case the CPU heat sink let go. – MDeBusk Oct 15 '22 at 06:54
  • There is no apparent way to remove dust other than unscrewing the bottom panel, and I'm not even sure whether the DRAM modules are soldered to the motherboard. Is there a way to start with something less invasive? Where can I check a log of the temperature to see if that might be the cause? Could the culprit be a faulty USB-C hub I use to connect a secondary monitor and power the laptop? – Andrea Moro Oct 15 '22 at 09:10