
Ubuntu almost always freezes within roughly the first 15 minutes after booting on my machine. Sometimes it happens in the first 5 minutes, sometimes it takes 30 minutes, and occasionally it never happens at all.

I can't reproduce it deterministically, but it happens often enough that I can probably just wait for it to happen again.

How can I diagnose the freeze to figure out the cause?

Note to close-voters:
No, this is not a duplicate of this question. This question is about diagnosis, not a temporary recovery. The answers on that question only tell me how to kill the X Server, use the Magic Combo to reset the kernel, etc., which doesn't help me figure out the cause.

Some information:

  1. Ubuntu 11.04: 2.6.38-15-generic #66-Ubuntu SMP x86_64 GNU/Linux

  2. The mouse sometimes moves around, but the UI never responds.

  3. Pressing Ctrl+Alt+F1 to get into a terminal doesn't work.

  4. The Alt+SysRq combos do work... and seem to be the only things that work, aside from the mouse (which sometimes also can move around).

  5. I'm not running out of any resources (many gigabytes of RAM and file system space are free)

  6. Possibly relevant hardware (from the Hardware Lister application):

    • AR9285 Wireless Network Adapter (PCI-Express)

    • GT216 [GeForce GT 330M] (I'm using the Nouveau driver, which seems to work well)

user541686
  • saw a comment elsewhere about trying to ssh into your machine from another machine - success/failure at least indicate the severity of the freeze - great question! – lofidevops Apr 25 '13 at 09:42
  • see also http://askubuntu.com/questions/75325/tools-to-diagnose-ubuntu-problems – lofidevops Apr 25 '13 at 09:46

1 Answer

The logs should always be your first port of call. Check syslog for anything untoward:

less /var/log/syslog

Also check the Xserver logs in case there's any indication of a graphics driver problem (although that sounds less likely given your description):

less /var/log/Xorg.0.log
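
Paging through whole logs can be slow, so here is a minimal sketch of a filter for lines that often accompany driver or kernel trouble. The helper name scan_log and the pattern list are my own assumptions, not a standard tool: (EE) and (WW) are the error and warning markers Xorg uses in its log, and nouveau and ath9k are the kernel drivers I'd expect for the GeForce and AR9285 hardware listed in the question.

```shell
#!/bin/sh
# scan_log FILE: print (with line numbers) lines that often accompany
# driver or kernel trouble. The pattern list is illustrative, not exhaustive.
scan_log() {
    grep -inE '\(EE\)|\(WW\)|error|fail|oops|panic|nouveau|ath9k' "$1"
}

# Example usage:
# scan_log /var/log/syslog | tail -n 50
# scan_log /var/log/Xorg.0.log
```

An empty result doesn't prove the logs are clean; it only means none of these particular patterns matched.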

In your particular case, these steps might not turn up anything interesting. If so, I'd want to see what is happening on your system as the problem develops. To that end, I'd set up a temporary log of top output at short intervals, say every 5 or 10 seconds. This should reveal whether a process is running wild with resources when the freeze occurs.

Note that alternatives exist, such as switching to another tty with Ctrl+Alt+F1..F6 (to get back to the GUI, it's Ctrl+Alt+F7) and running commands interactively, or configuring an SSH server and logging in remotely. Both of these might be awkward if your machine is more or less unresponsive, hence my clumsier suggestion to write a logfile (which could run into the same problem, but is more likely to succeed).

It would involve something like this:

while true; do top -b -n 1 >> ~/top.log; sleep 10; done

This would write top output to a logfile at ~/top.log every 10 seconds or so. Note that this log would grow quite large if this command is left running for a prolonged period, so keep an eye on it if your machine suddenly starts behaving itself! And remove the log with rm ~/top.log when you're done with it. Note also that executing the above command is a one-time thing; it won't restart itself after a reboot.
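
If you want each snapshot to be easier to find afterwards, a slightly more elaborate variant could look like the sketch below. The function name log_snapshot is hypothetical. It stamps each snapshot with the time, keeps only the first 25 lines of top output (the summary plus the first rows of the process list, which top sorts by CPU usage unless configured otherwise) so the log grows more slowly, and calls sync so that the last snapshot is more likely to be on disk if you have to hard-reset.

```shell
#!/bin/sh
# log_snapshot FILE: append one timestamped, truncated snapshot of top to FILE.
log_snapshot() {
    {
        printf '===== %s =====\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        top -b -n 1 | head -n 25
    } >> "$1"
    sync   # flush buffers so the latest snapshot survives a hard reset
}

# Example usage: one snapshot every 10 seconds until interrupted with Ctrl+C.
# while true; do log_snapshot "$HOME/top.log"; sleep 10; done
```

The timestamp of the final snapshot also tells you roughly when the machine stopped responding, which you can compare against the times in syslog.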

To read the logs generated after a crash, you'd use

less ~/top.log

and hit the End key to get to the bottom. You'd be looking for processes with an unusually high %CPU value, or an unusually high RES value.
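
If the log is long, scanning it by eye gets tedious. As a rough sketch, the hypothetical helper busy_processes below pulls out only the process lines whose %CPU is above a threshold. It assumes top's default column layout (PID in column 1, RES in column 6, %CPU in column 9, COMMAND in column 12) and a locale that uses a dot as the decimal separator; adjust the field numbers if your top is configured differently.

```shell
#!/bin/sh
# busy_processes FILE [THRESHOLD]: print PID, %CPU, RES and command for every
# process line in FILE whose %CPU exceeds THRESHOLD (default 50).
# Assumes top's default columns: PID=1, RES=6, %CPU=9, COMMAND=12.
busy_processes() {
    awk -v limit="${2:-50}" '
        # Process lines start with a numeric PID; summary and header lines do not.
        $1 ~ /^[0-9]+$/ && NF >= 12 && $9 + 0 > limit + 0 {
            print $1, $9, $6, $12
        }
    ' "$1"
}

# Example usage:
# busy_processes ~/top.log 80
```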

It may or may not help, but it's handy information to have.

  • Ah thanks. I just looked for /var/log/messages but it doesn't exist... is that an actual file or did you intend I should look at logs inside that directory? (If so, which logs?) – user541686 Jan 17 '13 at 20:57
  • My apologies, I'm thinking of other distributions. They're all subtly different! On Ubuntu the equivalent is /var/log/syslog. I'll update the answer. – IlluminAce Jan 17 '13 at 20:58
  • Ah no worries, thanks. :) I just looked at syslog and kern.log, and in both of those, I looked for SysRq (since the first thing I do is flush the file system)... but nothing relevant seems to have happened before the flush, according to the logs. Let me check the xorg log... – user541686 Jan 17 '13 at 21:01
  • Hmm, nothing seems to stand out in the Xorg logs either. Also it's definitely not a CPU issue (that's why I mentioned it's not a lack of resources), my CPU is barely being used at all during the freeze. The freeze is completely random... sometimes I'm dragging a window and it freezes, sometimes I've just left the computer there for five minutes and when I come back it's frozen. But it's completely unrelated to the lack of resources. Still, good info, thanks. – user541686 Jan 17 '13 at 21:03
  • I should have mentioned that there's a slight caveat to the Xorg log, in that you may find the data from the previous session has been overwritten by the new session you're in now. If that's the case, you can see the original data by booting straight to a tty (after a crash...) and checking the log there. You can boot directly to terminal by adding "text" to the end of the kernel boot line - as described here: http://askubuntu.com/questions/158382/how-to-enter-the-terminal-from-the-boot-manager When you're done, either reboot or enter X with sudo service lightdm start, or startx directly – IlluminAce Jan 17 '13 at 21:09
  • Ahhh that's great info, thanks! I'll try it when it happens again. :) – user541686 Jan 17 '13 at 21:13