I run a batch (`snakemake -j 1`) of memory-heavy operations in Python: subtracting two arrays of up to 15 GB each, then calculating norms of the difference. Surprisingly, my system started to misbehave:
- Thunderbird crashes,
- my graphical environment (XFCE with lightdm) crashes (effectively killing my screen sessions with the batch running),
- after the graphical environment respawned, it swapped my monitors (pun intended) and did not allow me to re-swap them with Display settings; `service lightdm restart` was necessary,
- my snakemake pipeline (bash + Python + numpy + pandas) tends to fail with segmentation faults when processing the biggest arrays,
- yesterday I discovered I lost audio from Firefox,
- recently, after a pipeline and graphical session crash, one of the `bash` processes went wild (100% CPU usage).
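
To illustrate the kind of operation mentioned at the top (subtracting two large arrays and taking norms of the difference), here is a minimal sketch. It is not the actual pipeline code: the function name `diff_norms`, the `chunk_size` parameter, and the choice of L1, L2 and max norms are assumptions made for illustration. It processes the arrays in chunks along the first axis, whereas a plain `a - b` would materialize a third array as large as the inputs.

```python
import numpy as np


def diff_norms(a, b, chunk_size=1_000_000):
    """Return (L1, L2, max) norms of a - b, computed chunk by chunk.

    Hypothetical helper: `a` and `b` may be in-memory arrays or
    np.memmap objects of identical shape. Only one chunk of the
    difference is held in memory at a time.
    """
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")

    l1 = 0.0        # running sum of |a - b|
    sq_sum = 0.0    # running sum of (a - b)**2
    max_abs = 0.0   # running maximum of |a - b|

    for start in range(0, a.shape[0], chunk_size):
        stop = start + chunk_size
        # Convert to float64 so accumulation does not overflow narrow dtypes.
        d = np.asarray(a[start:stop], dtype=np.float64) - b[start:stop]
        abs_d = np.abs(d)
        l1 += abs_d.sum()
        sq_sum += np.square(d).sum()
        max_abs = max(max_abs, abs_d.max())

    return l1, np.sqrt(sq_sum), max_abs
```

With arrays opened via `np.memmap`, a function like this reads the inputs from disk piece by piece, so its peak memory use depends on `chunk_size` rather than on the size of the arrays.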
I have plenty (932 GB) of swap available, so it is not that my system suddenly ran out of memory. The RAM chips also seem to work (17 passes of Memtest86+ revealed no errors).
I am asking about the reason behind the crashes/misbehaviour of other programs (Thunderbird, screen session, graphical environment). Even if my programs were poorly written, I would expect their impact to be limited to extensive swapping. A total XFCE session restart is something that definitely should not happen. And by restart I mean restart, not freezing or slowdown due to swapping.
`badblocks` found 207 bad blocks at 0.1% of the scan, then I replaced the old swap HDD. Out of curiosity I have just rerun a batch of the same jobs; in 1-2 hours I should know if my system is unstable. – abukaj Sep 29 '23 at 19:14