For a couple of months now, my Ubuntu server has been locking up every few days and becoming completely unresponsive. The only thing I see on the tty is the following message, repeated over and over, usually naming two of a handful of processes (Plex Media Server, sshd, rtorrent, tmux, etc.):
Mar 31 22:11:43 yggdrasil kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [Plex DLNA Serve:23621]
I sometimes see other information as well, but I can never find it in any log. journalctl only ever has one instance of the log line, even though there are dozens on the tty when I restart.
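To see which CPUs and processes the lockups name, the message lines can be tallied with a small script. This is a minimal sketch that assumes every line follows the format of the message quoted above; the function names and the script name `soft_lockups.py` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the part of the kernel message that carries the details, e.g.:
#   ... soft lockup - CPU#2 stuck for 22s! [Plex DLNA Serve:23621]
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft lockup message found in the given lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match["cpu"]),
                "seconds": int(match["secs"]),
                "comm": match["comm"],
                "pid": int(match["pid"]),
            })
    return events


def summarize(events):
    """Count lockup messages per (CPU, process name) pair."""
    return Counter((event["cpu"], event["comm"]) for event in events)


if __name__ == "__main__":
    # Read log lines from stdin and print the most frequent pairs first.
    for (cpu, comm), count in summarize(parse_soft_lockups(sys.stdin)).most_common():
        print(f"CPU#{cpu} {comm}: {count}")
```

It could be fed the kernel messages of the previous boot with `journalctl -k -b -1 | python3 soft_lockups.py`. That only helps if the journal is stored persistently and the messages were written to disk before the machine hung, which may not be the case here.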
I have replaced the motherboard, the GPU, and the power supply and the problem persists.
Specs are as follows:
- CPU: AMD Ryzen 5 1600X
- Mobo: ASUS ROG STRIX X370-F Gaming
- GPU: NVIDIA GT 210
Are there any other steps I can take to get to the bottom of this? Should I try to force a kernel panic and capture a kernel memory dump when this occurs? How would I do that?
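One possible starting point for the panic-and-dump idea, offered as a sketch rather than a confirmed fix: the kernel has a `kernel.softlockup_panic` sysctl that turns a soft lockup into a panic when set to 1, and a crash dump mechanism such as kdump (on Ubuntu I believe this is packaged as `linux-crashdump`, which is an assumption to verify) can then capture memory at panic time. The script below only reads the current values from `/proc/sys/kernel`. The helper names are made up for illustration, and a file that is missing on a given kernel is reported as `None`.

```python
from pathlib import Path

# Files under /proc/sys/kernel that are relevant to soft lockup handling.
SETTINGS = ("softlockup_panic", "panic", "watchdog_thresh")


def read_kernel_settings(base="/proc/sys/kernel"):
    """Read each setting as an int; use None if it is missing or unreadable."""
    result = {}
    for name in SETTINGS:
        try:
            result[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


def will_panic_on_soft_lockup(settings):
    """True only if softlockup_panic is set to 1."""
    return settings.get("softlockup_panic") == 1


def soft_lockup_threshold_seconds(settings):
    """The soft lockup threshold is twice watchdog_thresh, if that is known."""
    thresh = settings.get("watchdog_thresh")
    return None if thresh is None else 2 * thresh


if __name__ == "__main__":
    current = read_kernel_settings()
    for name, value in current.items():
        print(f"kernel.{name} = {value}")
    print("panic on soft lockup:", will_panic_on_soft_lockup(current))
    print("soft lockup threshold (s):", soft_lockup_threshold_seconds(current))
```

If `kernel.softlockup_panic` reads 0, it can be changed for the running system with `sudo sysctl -w kernel.softlockup_panic=1`. Without a working crash dump setup, the panic alone will not produce a memory dump.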
Update: I caught the crash earlier this time and saw a call trace and some more info: https://i.stack.imgur.com/pxTwc.jpg
For now I have switched to an AMD dGPU and will wait for the issue to come up again before blaming the CPU.
– Evan C Apr 09 '19 at 14:03