I have some swap configured and I believe activated...

Filename                Type        Size        Used     Priority
/dev/sda5               partition   7811068     1124912  200
/mnt/data02/swapfile    file        134217724   37032    100
/home/swapfile          file        134217724   36600    -1

But when the system monitor shows memory reaching 100%, the system tends to respond by closing or crashing programs. Xorg has crashed this way, as has the wireless driver. While this happens, the system monitor shows little use of swap (under 5 GiB). I have confirmed that swappiness is not set to an extreme value, and changing this parameter has appeared to have no effect on the issue.

~$  cat /proc/sys/vm/swappiness
70

The system has a huge amount of RAM...

~$ free -h
             total       used       free     shared    buffers     cached
Mem:          125G        20G       105G       161M        54M       1.1G
-/+ buffers/cache:        19G       106G
Swap:         263G       1.1G       262G

... but sometimes I run over budget on memory, and it would be nice if the system failed more gracefully than by crashing things.

What can I do to resolve this issue?

EDIT

~$ cat /etc/fstab
    # <file system> <mount point>   <type>  <options>       <dump>  <pass>
    # / was on /dev/sda3 during installation
    UUID=8dfbed62-9957-4f06-b4e1-a42020adec91 /               ext4    errors=remount-ro 0       1
    # /home was on /dev/sda6 during installation
    UUID=b6f33408-1d8b-4302-9983-5c778ef64f47 /home           ext4    defaults        0       2
    # swap was on /dev/sda5 during installation
    # ae0304dd-e63e-4d3a-99da-9c9d7a034c6e is the swap file
    UUID=fd4c00c9-49bf-4562-adea-1c817fc57ce9 none            swap    sw,pri=200              0       0
    UUID=3A323DCA323D8BBF /mnt/data01 ntfs-3g defaults,windows_names,locale=en_US.utf8  0 0
    UUID=4cc8a19d-5991-4186-8f65-7062805b66a6 /mnt/data02 ext4 defaults 0 0
    /mnt/data02/swapfile   none    swap    sw,pri=100    0   0
    /home/swapfile  none  swap  sw  0,pri=150 0

EDIT 2 In response to a comment below, I watched my system perform an operation I knew would use all available RAM but not all available swap, and checked dmesg during and after the failure. The system swapped and became intermittently non-responsive (acceptable behavior). Then, while swap was at less than 10% of capacity, Chrome crashed ("Sorry, the program "chrome" closed unexpectedly. Your computer does not have enough free memory to automatically analyze the problem and send a report to the developers"). When I tried to get back to the dmesg output I was watching, I got an error message that stated "This window is not responding. Do you want to force the application to exit, or wait for it to respond." I selected 'wait'. The desktop reappeared and the GNOME system monitor went in and out of being greyed out several times while the system swapped. When I checked back in, I was at the Ubuntu login screen. I logged in as normal; all of my earlier running processes were gone, and I was greeted with an error message about Xorg identical to the one about Chrome. Checking dmesg shows only the following two messages:

[131267.206774] Watchdog[3433]: segfault at 0 ip 00007fe38faf9756 sp 00007fe37f393770 error 6 in chrome[7fe38be0a000+510c000]
[133329.875212] nvidia 0000:03:00.0: irq 106 for MSI/MSI-X

EDIT 3 Other possible relevant topics:

EDIT 4 Additional error messages:

[92315.165728] Watchdog[1319]: segfault at 0 ip 00007f7d0a417756 sp 00007f7cf9cb1770 error 6 in chrome[7f7d06728000+510c000]
[92656.478271] INFO: task Chrome_IOThread:1292 blocked for more than 120 seconds.
[92656.478275]       Tainted: P           OX 3.13.0-45-generic #74-Ubuntu
[92656.478276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[92656.478278] Chrome_IOThread D ffff88207fc534c0     0  1292  32756 0x00000000
[92656.478282]  ffff881fa9d15dd8 0000000000000086 ffff881fad2ab000 ffff881fa9d15fd8
[92656.478285]  00000000000134c0 00000000000134c0 ffff881fad2ab000 ffff881fad2ab000
[92656.478288]  ffff881f5fce6260 ffff881f5fce6268 ffffffff00000000 ffff881f5fce6270
[92656.478290] Call Trace:
[92656.478299]  [<ffffffff817252d9>] schedule+0x29/0x70
[92656.478302]  [<ffffffff81727f55>] rwsem_down_write_failed+0x115/0x230
[92656.478307]  [<ffffffff81371d63>] call_rwsem_down_write_failed+0x13/0x20
[92656.478311]  [<ffffffff81314c90>] ? apparmor_file_mprotect+0x30/0x30
[92656.478313]  [<ffffffff8172796d>] ? down_write+0x2d/0x30
[92656.478318]  [<ffffffff8116ba7c>] vm_mmap_pgoff+0x6c/0xc0
[92656.478322]  [<ffffffff8117f916>] SyS_mmap_pgoff+0x116/0x270
[92656.478325]  [<ffffffff81018802>] SyS_mmap+0x22/0x30
[92656.478328]  [<ffffffff8173196d>] system_call_fastpath+0x1a/0x1f
  • Er... s/complaint/compliant. – russellpierce Feb 12 '15 at 18:53
  • Try increasing swappiness. – XperianX Feb 12 '15 at 20:22
  • Already tried swappiness @ 90 and there was no improvement. – russellpierce Feb 12 '15 at 20:27
  • Do the programs really crash due to RAM? What does the OOM killer say about it in dmesg? – StenSoft Feb 14 '15 at 00:46
  • @StenSoft: I see no out of memory warnings in dmesg. I've added details in the original question. – russellpierce Feb 14 '15 at 04:09
  • Maybe make swappiness 100? –  Feb 14 '15 at 05:33
  • @ethanbmnz: Now configured at 100. Same issue. My expected behavior when running out of RAM is that swap should be almost entirely used before the system starts killing processes. Is that an incorrect expectation? – russellpierce Feb 14 '15 at 18:31
  • Mentioned here: https://productforums.google.com/forum/#!topic/chrome/zjjitIYfuxw. You can check whether it is the graphics card with "/usr/bin/google-chrome-stable --disable-gpu %U" – Rinzwind Feb 15 '15 at 16:00
  • That you are running a Tainted kernel is a concern. What I/O scheduler are you using? "cat /sys/block/sda/queue/scheduler" (or sd whatever) – Doug Smythies Feb 15 '15 at 16:25
  • @doug both sda and sdb respond noop [deadline] cfq at the moment. Is that informative? – russellpierce Feb 15 '15 at 16:39
  • I just read a little about tainted kernels. I know for sure the Nvidia driver is proprietary. I can try to go back to the community driver if it is thought to make a difference. – russellpierce Feb 15 '15 at 16:44
  • scheduler: Just wanted to check. That is the correct default. Tainted: Untainted would be worth a test, just my opinion. Question: Did the edit 4 error occur with increased min_free? – Doug Smythies Feb 15 '15 at 16:52
  • Yes Doug. Sorry I didn't specify that. Your min_free fix left the system alive long enough to see that. But Xorg still eventually crashed. I'm running it now without Chrome to see what happens. – russellpierce Feb 15 '15 at 17:11
  • Let's admit that the root problem is not directly linked to swap/ram, but might be due to any resource leak (in analysis tool). The message "Your computer does not have enough free memory to automatically analyze the problem and send a report to the developers" could be easily explained with the comment #4 of this old bug report https://bugs.launchpad.net/ubuntu/+source/apport/+bug/718635 and would be an indirect consequence of the root problem which caused crashes of Xorg, Chrome, etc. Identifying the exact resource which is leaked would explain why programs close even if swap is near empty. – Golboth Feb 18 '15 at 22:43
  • @Golboth: There is no resource leak in the analysis tool. Something /does/ go wrong under a low RAM condition, and the culprit is (in this case) Chrome and/or the Nvidia driver. As a general lesson learned, people can take away that this apparent behavior can be caused by something other than system configuration/straight up OOM. But, who knows whether anyone will experience anything quite like this in a different context. – russellpierce Feb 19 '15 at 04:25
  • Not sure about this, but privileged processes can ask for memory (with mmap(...,MAP_LOCKED)) that is kept in RAM only, non-swappable, etc. Both Chrome and the graphics drivers do have suid parts (look at the Chrome sandbox). So you can have an OOM condition even with swap available, just for this kind of memory. I do not know whether this can trigger the OOM killer, or whether the allocation just fails and the process kills itself; this is why this is just a comment and not an answer. – Rmano Feb 19 '15 at 08:29

2 Answers

Depending on what they are doing, computers with large amounts of memory, such as yours, can get into difficulties when the amount of free memory (resident, as opposed to swap) becomes very low. Sometimes (I am not sure about your situation) things can be improved by increasing the minimum amount of memory kept free, which is set in /proc/sys/vm/min_free_kbytes. Think of it as keeping more room free so that things are easier to move around, re-group, and de-fragment. Try a very large number first, say 20G, and if that helps, try to reduce it. You might also help yourself by watching "free" carefully in an attempt to correlate the issues with some minimum available memory.
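
As a rough illustration of the "watch free carefully" suggestion, the sketch below logs free RAM and free swap at a fixed interval so that a crash can later be matched against the last values recorded. It only reads /proc/meminfo. The function names meminfo_kb and log_memory are made up for this example and are not part of any tool.

```shell
#!/bin/bash
# Print the value (in kB) of one field from a meminfo-style file.
# The second argument defaults to /proc/meminfo; it is a parameter only so
# the function can be tried against a saved copy of that file.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Hypothetical helper: print a timestamped line with free RAM and free swap
# every N seconds (default 5) until interrupted with Ctrl-C.
log_memory() {
    local interval="${1:-5}"
    while true; do
        printf '%s MemFree=%s kB SwapFree=%s kB\n' \
            "$(date '+%F %T')" \
            "$(meminfo_kb MemFree)" \
            "$(meminfo_kb SwapFree)"
        sleep "$interval"
    done
}
```

After sourcing the file, something like `log_memory 5 >> ~/memlog.txt` in a spare terminal would leave a record that survives an Xorg crash; the log file name is only an example.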

Method 1 (script run as sudo):

#!/bin/bash
# Show the current value, raise it to roughly 20 GB, then confirm the change.
cat /proc/sys/vm/min_free_kbytes

echo "20000000" > /proc/sys/vm/min_free_kbytes

cat /proc/sys/vm/min_free_kbytes

Method 2 (direct command):

echo "20000000" | sudo tee /proc/sys/vm/min_free_kbytes
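
Both methods write to /proc, so the value is lost at reboot. If the larger value turns out to help, one way to keep it is a sysctl setting. The fragment below is a minimal sketch that assumes the setting goes into /etc/sysctl.conf and reuses the 20000000 test value from above, which is a starting point rather than a recommendation.

```
# /etc/sysctl.conf (fragment)
# Keep roughly 20 GB free; reduce this once a working value is found.
vm.min_free_kbytes = 20000000
```

Running `sudo sysctl -p` reloads /etc/sysctl.conf without a reboot.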
Doug Smythies
  • Do you have a specific thought/reasoning behind the 20 GB number? I'll give it a shot and respond back (it was set at 67584 kb). – russellpierce Feb 15 '15 at 05:03
  • I just wanted to suggest a very large number, something that should be overkill, as a starting test. No use wasting a lot of time, if this isn't the solution. A few months ago I helped on an Ubuntu forums thread, where this solution did fix the issue. I think that one ended up at 10G min_free out of about 380G. – Doug Smythies Feb 15 '15 at 06:57
  • Doug, your solution helped me get a more informative error message and left the system more responsive during early parts of an out of RAM (but swap left over) scenario. However, after trying various combinations the note @Rinzwind left about a Chrome bug when the GPU is enabled appears to be the prime culprit. Would you care to incorporate it into your answer? Alternatively Rinzwind, would you care to post your comment as an answer? – russellpierce Feb 16 '15 at 14:25
  • I'll get back to you on this, but am out of town for a day or two. – Doug Smythies Feb 16 '15 at 20:57
  • I agree that what I suggested helped by giving a bit more time, which gave better error messages leading to the more helpful suggestion. However, the amount of memory you have is much more than most typical users have, and thus the extra time simply might not be made available on most systems. I am reluctant to edit my answer to include the @Rinzwind note for two reasons: First, it was not my insight; Second, and as explained above, I don't know that extra insight for this particular type of issue would be gained for the general user. – Doug Smythies Feb 18 '15 at 21:53

To find out if your processes have been killed by the OOM killer you can inspect the result of this command:

sudo grep -ri 'killed process' /var/log/ | grep -v auth.log

If this is the case, you might want to look at the article about taming the OOM killer: http://lwn.net/Articles/317814/
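
The same check can be wrapped in a small function and pointed at specific log files. This is only a sketch: the function name oom_kills is made up for illustration, and it assumes the kernel reports each kill with a line containing "Killed process".

```shell
#!/bin/bash
# Print the OOM-killer "Killed process" lines found in the given log files.
# -h drops the file-name prefix and -i ignores case; unreadable files are
# skipped silently.
oom_kills() {
    grep -hi 'killed process' "$@" 2>/dev/null
}
```

For example, `oom_kills /var/log/syslog /var/log/kern.log` (the usual Ubuntu locations, assumed here) prints nothing when the OOM killer was not involved.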

stalet
  • Thanks, it doesn't seem like it was the OOM killer in my case because the command you mention above doesn't yield any resulting lines. However, it does seem like a very useful piece of knowledge to have in response to this question. – russellpierce Feb 20 '15 at 11:47