3

I have a strange problem. On a freshly installed Ubuntu 18.04, the system seems to work fine. Then suddenly, for no apparent reason, it hangs for anywhere from 10 seconds to a couple of minutes, and I am unable to do anything.

I tried leaving a top instance open, and the RAM/CPU usage seems to be fine. I am on an i5 machine with 6GB of RAM and 12GB of swap. I just tested the RAM and the disk, and they are error-free.

EDIT Some additional information. I set the CPU frequency governor to performance, so it always works at maximum frequency.

The problem appears more often when performing a CPU intensive operation, such as data analysis. After it finishes, the GUI becomes totally unresponsive, and it's hard or impossible to get it back to work.

EDIT Output of grep . -r /sys/firmware/acpi/interrupts

/sys/firmware/acpi/interrupts/gpe2F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe23:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe13:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe0F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe03:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3D:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe31:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2D:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe21:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1D:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/ff_pwr_btn:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe11:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0D:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe01:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe3B:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2B:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/ff_rt_clk:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/ff_pmtimer:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1B:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe38:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0B:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe28:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe18:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe08:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe36:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe26:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe16:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/sci:       4
/sys/firmware/acpi/interrupts/gpe06:       4  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe34:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe24:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe14:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe04:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe32:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe22:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe12:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe0E:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe02:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe3C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe30:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe20:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe1C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe10:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe39:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0C:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe00:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe3A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe_all:       4
/sys/firmware/acpi/interrupts/gpe29:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe19:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1A:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe09:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe37:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe27:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe17:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/ff_gbl_lock:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe07:       0         enabled      unmasked
/sys/firmware/acpi/interrupts/sci_not:       0
/sys/firmware/acpi/interrupts/gpe35:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe25:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe15:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe05:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe33:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/ff_slp_btn:       0         invalid      unmasked
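
Most of the counters in this listing are zero, so the interesting lines are easy to miss. As a rough sketch (the function name nonzero_acpi_counters is made up for illustration), the output can be filtered down to the counters that have actually fired:

```shell
# Hypothetical helper: keep only the lines whose counter (second field) is above zero.
# Reads the output of `grep . -r /sys/firmware/acpi/interrupts` on stdin.
nonzero_acpi_counters() {
    awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 0'
}
```

Usage: grep . -r /sys/firmware/acpi/interrupts | nonzero_acpi_counters. For the listing above this leaves only sci, gpe06 and gpe_all, each at 4.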

EDIT 04/03/2019 I ran a complete SMART test, and the results no longer look so good, at least in my opinion.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
3 Spin_Up_Time            0x0027   179   176   021    Pre-fail  Always       -       4025
4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       218
5 Reallocated_Sector_Ct   0x0033   154   154   140    Pre-fail  Always       -       364
7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   034   034   000    Old_age   Always       -       48741
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       217
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       100
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       117
194 Temperature_Celsius     0x0022   089   080   000    Old_age   Always       -       58
196 Reallocated_Event_Count 0x0032   022   022   000    Old_age   Always       -       178
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       234
198 Offline_Uncorrectable   0x0030   199   199   000    Old_age   Offline      -       245
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   188   188   000    Old_age   Offline      -       2436
240 Head_Flying_Hours       0x0032   038   038   000    Old_age   Always       -       45709
241 Total_LBAs_Written      0x0032   200   200   000    Old_age   Always       -       81196791754
242 Total_LBAs_Read         0x0032   200   200   000    Old_age   Always       -       75991010629
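
The attributes that usually matter most in a table like this are the sector counters: 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). A minimal sketch, assuming the table comes from smartctl -A and using a made-up function name, that prints those counters when their raw value is above zero:

```shell
# Hypothetical helper: read a SMART attribute table on stdin and report
# attributes 5, 197 and 198 when RAW_VALUE (10th column) is above zero.
failing_sector_counters() {
    awk '$1 ~ /^(5|197|198)$/ && $10 + 0 > 0 { printf "%s raw=%s\n", $2, $10 }'
}
```

Usage (the device name /dev/sda is an assumption): sudo smartctl -A /dev/sda | failing_sector_counters. For the table above it reports all three counters: 364, 234 and 245.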
LucaB
  • 133
  • I am already doing it, but nothing interesting appears. For instance, right now clicking on an email in Thunderbird gets no response at all, and /var/log/syslog and dmesg do not say anything. – LucaB Feb 18 '19 at 15:15
  • Please add additional information to your Question instead of burying it in (often unread) comments. – user535733 Feb 18 '19 at 15:37
  • classical overheating, I'd say – s1mmel Feb 25 '19 at 14:06
  • Have you checked swap usage? Instead of plain top, I’d recommend htop (and sudo iotop for hard disk activity). Also what’s your hard disk type (mechanical, SSD)? – Melebius Feb 25 '19 at 14:58
  • @s1mmel Temperatures seem to be fine, as I checked them. Also, the system doesn't recover when the CPU-intensive task ends. – LucaB Feb 25 '19 at 15:02
  • @Melebius yes, SWAP seems to be fine. I am also using htop, which basically shows the 4 cores with no usage, and RAM at 40%. The disk is mechanical. – LucaB Feb 25 '19 at 15:03
  • Reduce swap to 3GB. Normally you would make swap the same size as memory, but nowadays that is no longer necessary. This is the first thing I'd change. Did you check /var/log/messages and/or syslog for any entries? – s1mmel Feb 25 '19 at 15:19
  • You might consider turning on core dumps. Maybe they will give you an indication of what caused this.

    see https://stackoverflow.com/questions/6152232/how-to-generate-core-dump-file-in-ubuntu

    – s1mmel Feb 25 '19 at 15:22
  • Output of grep . -r /sys/firmware/acpi/interrupts and dmesg when it happens – j-money Feb 27 '19 at 09:02
  • I added the details in the question – LucaB Mar 01 '19 at 19:05
  • I added the SMART details to the question; they do not look so good, at least in my opinion. – LucaB Mar 04 '19 at 10:02

4 Answers

2

I would also check CPU temps and make sure your cooling fan is ok. If the cooling fan is ok, you might want to check for malware / viruses.

Also, sometimes you may need to update your BIOS to fully accommodate new features in newer operating systems (depending on the system).

Another thing I have found that can cause system freezes is a dropped Internet connection, especially during updates, so also check your connection and make sure it isn't dropping.

Somewhat "shots in the dark", but maybe one suggestion will help. More information on your system such as the MainBoard brand, model and version could be helpful.

User6655
  • 106
  • It is a Dell Inspiron all-in-one system from 2012. Intel i5, Radeon graphics card, 6GB of RAM, 1 TB hard disk.

    Everything you mentioned seems to be fine. I also tested HDD with SMART and got no errors.

    – LucaB Mar 01 '19 at 18:56
  • Actually, I don't know what went wrong with SMART, which was showing a fine HDD. I re-ran the test, and I got (I believe) very bad results, which I added to the question above. – LucaB Mar 04 '19 at 10:03
1

This is just from personal experience, but if the other suggestions aren't helping and your CPU is at a good temperature, you may want to consider finding a similar CPU that's compatible with your motherboard and seeing whether swapping it in fixes the issue. I had a CPU die recently, and before it died completely it was behaving almost exactly as you describe. It could also be a motherboard issue of some kind, but I'd check the CPU first. I understand that getting and testing other parts may not be entirely practical, but in my experience this kind of issue tends to be a hardware problem of some kind.

If neither of those is the issue, I would run a SMART test on the hard drive with the Disks utility; details here: How can I check the SMART status of a SSD or HDD on current versions of Ubuntu 14.04 through 18.10?

  • "Trying" a new CPU isn't that easy, and SMART looks fine. I would try other possibilities first :) – LucaB Mar 04 '19 at 09:26
  • Actually, I don't know what went wrong with SMART, which was showing a fine HDD. I re-ran the test, and I got (I believe) very bad results, which I added to the question above. – LucaB Mar 04 '19 at 10:03
  • oof, those aren't the worst, but not good, definitely back up anything important on the machine right now – tommy61157 Mar 04 '19 at 12:11
0

Try tweaking your swap settings, for example by running sudo sysctl vm.swappiness=20 (this is reverted after a reboot). Even if your memory is not yet completely used, the kernel starts swapping parts of it to disk to keep some headroom. Choosing a rather low value leads to less free headroom, but also to less swapping. The optimal value depends on your memory size and on the workload you are running.

When you have found a value you are happy with, you can set it permanently by adding a line like this to /etc/sysctl.conf:

vm.swappiness=20

For more background information see: What is swappiness and how do I change it?
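
Appending the line by hand works, but repeated edits can leave several conflicting vm.swappiness entries in the file. A small sketch (the helper name set_swappiness_line is hypothetical) that replaces an existing entry or appends one if none exists:

```shell
# Hypothetical helper: make sure FILE contains exactly one vm.swappiness line.
# Usage: set_swappiness_line FILE VALUE
set_swappiness_line() {
    conf_file=$1
    value=$2
    tmp=$(mktemp) || return 1
    # Copy every line except an existing vm.swappiness setting.
    if [ -f "$conf_file" ]; then
        grep -v '^vm\.swappiness[[:space:]]*=' "$conf_file" > "$tmp" || true
    fi
    # Append the new setting, then write the result back in place.
    printf 'vm.swappiness=%s\n' "$value" >> "$tmp"
    cat "$tmp" > "$conf_file" && rm -f "$tmp"
}
```

From a root shell this could be called as set_swappiness_line /etc/sysctl.conf 20, followed by sysctl -p to load the file without rebooting.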

webwurst
  • 2,385
  • Actually, the system is very slow right now with swap at 0%, so I think this may not be the issue (or at least not the main one). – LucaB Mar 04 '19 at 09:27
0

Collect data from system monitors (e.g., sensors; its GUI front end, psensor, is likely not useful for you) and dump it to a file so you can do a post-mortem analysis. RRDTool may come in handy.

You can output info with time and date, select the interval for dumping data, get hard disk temperature, etc.
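
For a post-mortem it helps if every reading carries a timestamp. A minimal sketch, assuming lm-sensors provides the sensors command; the helper name stamp_lines and the log path in the usage note are made up:

```shell
# Hypothetical helper: prefix every line read from stdin with the current
# date and time, so readings can later be matched against a freeze.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

One way to use it is a loop such as: while true; do sensors | stamp_lines >> "$HOME/sensors.log"; sleep 60; done. The 60-second interval is arbitrary; a shorter one gives finer detail at the cost of a larger log.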

See

How to monitor & log server hardware temperatures & load

Temperature monitoring help

https://ubuntuforums.org/showthread.php?t=1998005

https://ubuntuforums.org/showthread.php?t=2364408

http://manpages.ubuntu.com/manpages/bionic/man8/turbostat.8.html

http://manpages.ubuntu.com/manpages/trusty/man8/hddtemp.8.html