I am a somewhat nontechnical user of a desktop running Ubuntu 14.04 LTS. I've been working in Ubuntu for several years. The hardware is somewhat old.

Occasionally I experience technical issues with Ubuntu -- usually a slowdown or freeze, but recently (i.e., in the last week) I've been having frequent restarts of Unity (which wipe out all running apps and require a login).

It occurred to me that I have no idea how to diagnose an issue like this -- or any system crash. I don't know the tools or even the method for diagnosing things.

The only thing I have been using to monitor system issues is htop. From that I see periodic spikes in CPU and memory -- usually for Firefox, Amarok, and Chromium, but sometimes for compiz or some cryptic system command (like "X core :0 -seat ...."; sorry, I don't know how to copy output from htop).

The problem tends to happen when I'm downloading things from browsers, although I don't want to say it's the only time....

I have opened up dmesg and /var/log/syslog, but I confess I don't know how to interpret the data.

dmesg might have interesting data, but I don't know how to read its timestamps. I sort of understand syslog, but I don't have enough experience to know which error reports are significant and what to do about them:

Here for example is the syslog for the last crash of the window manager:

  $(/usr/lib/php5/maxlifetime))
Jan 30 19:17:01 robert-KJ379AA-ABA-a6400f CRON[4048]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 30 19:33:29 robert-KJ379AA-ABA-a6400f wpa_supplicant[992]: message repeated 29 times: [ wlan1: CTRL-EVENT-SCAN-STARTED ]
Jan 30 19:34:21 robert-KJ379AA-ABA-a6400f wpa_supplicant[992]: wlan1: WPA: Group rekeying completed with 74:9d:dc:5f:32:b1 [GTK=TKIP]
Jan 30 19:35:29 robert-KJ379AA-ABA-a6400f wpa_supplicant[992]: wlan1: CTRL-EVENT-SCAN-STARTED 
Jan 30 19:39:01 robert-KJ379AA-ABA-a6400f CRON[4123]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Jan 30 19:59:22 robert-KJ379AA-ABA-a6400f kernel: [ 7911.658443] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
Jan 30 19:59:29 robert-KJ379AA-ABA-a6400f kernel: [ 7918.797835] chrome invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=200
Jan 30 19:59:29 robert-KJ379AA-ABA-a6400f kernel: [ 7918.797842] chrome cpuset=/ mems_allowed=0
Jan 30 19:59:29 robert-KJ379AA-ABA-a6400f kernel: [ 7918.797846] CPU: 1 PID: 2837 Comm: chrome Not tainted 3.13.0-76-generic #120-Ubuntu

I suspect the crash occurred shortly after Jan 30 19:39:01, because that's where the biggest time gap starts. The first message after the gap is a radeon (video card) error, and that seems the likely culprit, but on the other hand, I'm guessing that memory/CPU use also plays a part. Also, would you expect crash data to show up AFTER the crash?

Are these the only tools for figuring out the problem? Are there any methods for narrowing the problem down to the hardware/app/system space?

UPDATE: Another crash, with more error messages pointing to a compiz/window manager failure. (I have no idea how to solve that.) Here's some output from syslog:

Jan 31 11:39:28 robert-KJ379AA-ABA-a6400f kernel: [64317.672548] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (1048576, 2, 4096, -23 
Jan 31 11:39:28 robert-KJ379AA-ABA-a6400f kernel: [64317.672591] compiz[15437]: segfault at 0 ip 00007f5e027bd7b6 sp 00007ffe329bf9c0 error 6 in r600_dri.so[7f5e0254d000+399000]
Jan 31 11:39:39 robert-KJ379AA-ABA-a6400f gnome-session[15215]: WARNING: Child process 15437 was already dead.
Jan 31 11:39:39 robert-KJ379AA-ABA-a6400f gnome-session[15215]: WARNING: Application 'compiz.desktop' killed by signal 11
Jan 31 11:39:39 robert-KJ379AA-ABA-a6400f gnome-session[15215]: WARNING: App 'compiz.desktop' respawning too quickly
Jan 31 11:39:40 robert-KJ379AA-ABA-a6400f gnome-session[15215]: CRITICAL: We failed, but the fail whale is dead. Sorry....

UPDATE 2: I see that the same error messages happen every time. Something seems to be killing compiz.desktop/gnome-session. I just don't know what to do about it.

UPDATE 3: Apparently the problem has become more serious. Unity doesn't load, and all I get is a blank desktop. I'm trying the troubleshooting steps in the thread "Unity doesn't load, no Launcher, no Dash appears", without success so far. I'm reaching the conclusion that the problem is primarily on the software/OS side rather than the hardware side, though I really don't know for sure!

1 Answer

You can use log analysis and visualization software. One option is Splunk Enterprise, which is also available free for personal use (up to around 500 MB of data analysis). I have used it to look through the /var/log directories recursively. Give Splunk a try.
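Before installing anything, a rough first pass over the same directory can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real log tool: the marker strings are taken from the syslog excerpts in the question, and the function names `flag_lines` and `scan_log_dir` are made up for this example.

```python
import os

# Substrings that appear in the question's syslog excerpts around each crash.
# This list is an assumption for illustration; extend it for your own system.
MARKERS = ("oom-killer", "segfault", "*ERROR*", "killed by signal")


def flag_lines(lines, markers=MARKERS):
    """Yield (line_number, marker, text) for each line containing a marker."""
    for number, line in enumerate(lines, start=1):
        for marker in markers:
            if marker in line:
                yield number, marker, line.rstrip("\n")
                break  # report each line once, under the first marker found


def scan_log_dir(root="/var/log", markers=MARKERS):
    """Walk root recursively and collect flagged lines from readable files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gz"):
                continue  # rotated, compressed logs are skipped in this sketch
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as handle:
                    for number, marker, text in flag_lines(handle, markers):
                        hits.append((path, number, marker, text))
            except OSError:
                continue  # unreadable file (e.g. permissions); move on
    return hits
```

Calling `scan_log_dir()` and printing each tuple gives a short list of suspicious lines with their file and line number. Some files under /var/log are readable only by root, so running it as a normal user may miss entries.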

If you are looking for an open-source alternative, you can refer to this end-to-end tutorial.

Splunk is easy to work with out of the box, though ELK is fun to set up. Hope this helps.
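On the dmesg timestamp question: the bracketed number in kernel lines is seconds since boot, and syslog kernel lines carry both that number and a wall-clock time. A rough sketch for lining the two up follows. The helper names are invented for this example, the year has to be supplied because syslog lines omit it, and month names are assumed to be English. Time spent suspended can skew the result.

```python
import re
from datetime import datetime, timedelta

# Matches syslog kernel lines such as:
#   Jan 30 19:59:22 host kernel: [ 7911.658443] message...
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]"
)


def estimate_boot_time(syslog_line, year):
    """Return wall-clock time minus kernel uptime, or None if no match."""
    match = KERNEL_LINE.match(syslog_line.strip())
    if match is None:
        return None
    wall = datetime.strptime(
        f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S"
    )
    return wall - timedelta(seconds=float(match.group("uptime")))


def dmesg_to_wall_clock(boot_time, uptime_seconds):
    """Convert a dmesg seconds-since-boot value to an approximate date."""
    return boot_time + timedelta(seconds=uptime_seconds)
```

Fed the two radeon lines from the question, `estimate_boot_time` gives about 17:47:30 on Jan 30 for both. That suggests the kernel kept running between the two crashes and only the desktop session went down.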

Ashu