1

I want to start a big simulation on an Ubuntu desktop computer to which I have physical (not remote) access. The simulation may take several weeks. The command to start the process is:

mpirun -np 100 icoFoam -parallel | tee log

Here icoFoam is the executable and -parallel is a required option.
This command prints data to the terminal. Sometimes, during a long simulation, the terminal is closed or the session is logged out unexpectedly, and the process then terminates. I tried to work around this with a few alternative commands:

nohup mpirun -np 100 icoFoam -parallel > log  & 
nohup mpirun -np 100 icoFoam -parallel > log  & disown & 
nohup mpirun -np 100 icoFoam -parallel | tee log & disown & 
nohup mpirun -np 100 icoFoam -parallel | tee log & disown & > /dev/null 2>& 1 & nohup mpirun -np 100 icoFoam -parallel > /dev/null 2>& 1  &
systemd-run --scope --user mpirun -np 100 icoFoam -parallel | tee log &
systemd-run --scope --user mpirun -np 100 icoFoam -parallel | tee log & disown &
systemd-run --scope --user nohup mpirun -np 100 icoFoam -parallel | tee log & disown &
tmux

Results
Except for tmux, with every one of these commands the process is terminated when I close the terminal.
tmux is also terminated when I log out of my user account.

My Findings

  1. As the simplest workaround, I combined nohup and disown (from here).
  2. I guessed that the commands that include tee are terminated by a SIGPIPE raised when the terminal closes (from here). I therefore redirected the output to a log file or to /dev/null instead (from here), but both variants were also terminated when I closed the terminal.
  3. I also tried systemd-run, but it too is terminated when the terminal is closed.
  4. To see whether the program has installed its own handler, I executed this:

    nohup mpirun -np 100 icoFoam -parallel > log  &
    grep Sig /proc/$!/status
    

    Which returns

    SigIgn: 0000000000000000
    

    The SIGHUP bit is not set in SigIgn, so I guess this is the case: mpirun has installed its own handler, overriding the protection of nohup (from here).

  5. I don't know whether it is possible to pass a custom handler to mpirun so that it does not override nohup.
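
The check in finding 4 can be made a little more explicit. The SigIgn field in /proc/PID/status is a hexadecimal bitmask in which bit n-1 stands for signal n, so SIGHUP (signal 1) is the lowest bit. Below is a minimal sketch that decodes that one bit; the helper name sighup_ignored is made up for illustration and is not part of any tool mentioned here.

```shell
# Hypothetical helper: report whether SIGHUP (signal 1, the lowest bit)
# is set in a hexadecimal signal mask such as the SigIgn field.
sighup_ignored() {
    mask=$1
    # Only the last hex digit can contain the bit for signal 1.
    last=${mask#"${mask%?}"}
    case $last in
        [13579bdfBDF]) echo yes ;;   # odd digit: lowest bit is set
        *) echo no ;;
    esac
}

# The value reported in the question:
sighup_ignored 0000000000000000    # prints "no"
```

For a running job, the mask could be read with something like awk '/^SigIgn/ {print $2}' /proc/$!/status and passed to the helper. A process that still honours nohup would be expected to answer "yes".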

My Question
I want to execute the following command so that it prints its output in the terminal for as long as the terminal stays open, and so that the process is not terminated when I close the terminal or log out of the user account.

mpirun -np 100 icoFoam -parallel

OS: Ubuntu 18.04
Executable: OpenFOAM
mpirun (Open MPI): 2.1.1

Update
By log out, I mean pressing the log out button, not locking the screen (Super+L).

Thank you in advance.

muru
Naghi
  • "The OS is logged out randomly" - it's not clear if you are referring to a normal lock-screen or if you are referring to a desktop crash. A crash will terminate all child processes and lose all your work...which might be what you are describing. 'nohup' won't help you survive a crash - crashes must be diagnosed and fixed. – user535733 Jan 25 '20 at 11:50
  • @user535733 OK. How about "logging out deliberately"? I don't mean the lock screen, I mean pressing the log out button, as described in the update to the question. Does logging out deliberately terminate all child processes? And the main question still is: why does a process started with nohup terminate when I close the terminal? Thank you for your informative comment. – Naghi Jan 25 '20 at 13:00
  • Dear @user535733, when I leave my PC, how can I lock the tty so that others cannot interrupt it, while the processes keep running without being killed by the lock? – Naghi Feb 01 '20 at 10:03
  • Use a terminal multiplexer like screen or tmux. Those run in the background even when you're not logged in to any tty or desktop, and they provide a stable parent process for your ongoing jobs to be children of. Both applications are easy to use (try a few tutorials first), and both are in the Ubuntu repositories. – user535733 Feb 01 '20 at 14:45

3 Answers

2

The problem is that you are starting the job from within a desktop environment, so the jobs are children of that desktop. When the desktop ends, for whatever reason, all children automatically end, too. 'nohup' won't save them - logout removes the display that output should print to, which should also cause a fatal error.

Consider running tmux in a tty instead of a terminal window. Then the process can run forever regardless of whatever the desktop is doing.
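
One way to follow this suggestion, as a minimal sketch: log in on a text console (on Ubuntu typically reached with Ctrl+Alt+F3) and start the solver inside a named, detached tmux session. The session name sim and the helper name start_sim_tmux are assumptions chosen for illustration. The mpirun command is the one from the question.

```shell
# Hypothetical helper: start the solver in a detached tmux session.
SESSION=sim    # illustrative session name

start_sim_tmux() {
    # -d: create the session without attaching to it
    # -s: give the session a name so it can be found again later
    # tee keeps a copy of the output in ./log while it is shown in the pane.
    tmux new-session -d -s "$SESSION" \
        'mpirun -np 100 icoFoam -parallel 2>&1 | tee log'
}

# Later, from any tty or terminal of the same user:
#   tmux attach -t sim     # watch the live output
#   Ctrl+B, then D         # detach again; the job keeps running
```

Because the output is also written to log, it can still be inspected after the tmux session has ended.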

user535733
  • That's it. The problem was exactly "starting the job from within a desktop environment". – Naghi Jan 25 '20 at 15:28
  • @Alish, now you can vote up, have a nice time. This is an interesting answer. I never thought about it this way, input/output stream connections. Thank you. – user.dz Apr 21 '20 at 12:52
1

I have the same configuration (Ubuntu 18.04, OpenFOAM v7, Open MPI 2.1.1) here and I am facing the same issues. The only solution that helped was the set of steps described in this post:

  1. Start the screen window manager from your console by typing screen and pressing Enter.
  2. In the screen console, you can then enter your commands as needed, e.g.

    nohup mpirun -np 100 icoFoam -parallel > log  &
    
  3. Press Ctrl+A and then Ctrl+D to detach the terminal from the newly created screen session.

  4. Now you should be able to close the terminal window without killing the MPI processes.
  5. To go back to the screen session, open a new terminal and type screen -DR. It should open the last session.
  6. Inside the screen session, type exit if you want to leave it.

Note: If you created more than one screen session, screen -DR shows a list of all of them. Type screen -r [session number] to reattach to a session, or, for example, screen -X -S 63896 quit to end session 63896. It is a somewhat clumsy workaround, but I hope it helps. I look forward to this bug (or feature?) being resolved in future versions.

For further information refer to man screen.
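
The interactive steps above can also be collapsed into a single command. This is only a sketch under assumptions: the session name sim and the helper name start_sim_screen are made up for illustration, and the mpirun command is the one from the question.

```shell
# Hypothetical helper: start the solver in a detached, named screen session.
start_sim_screen() {
    # -d -m: start the session detached; -S: name it "sim" (illustrative).
    # tee keeps a copy of the output in ./log while it is shown in the window.
    screen -dmS sim sh -c 'mpirun -np 100 icoFoam -parallel 2>&1 | tee log'
}

# Later:
#   screen -ls             # list the running sessions
#   screen -r sim          # reattach and watch the output
#   Ctrl+A, then D         # detach again; the job keeps running
```

Naming the session avoids having to look up its numeric ID before reattaching.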

DaveD
  • Thanks; tmux is also a multiplexer like screen, but based on this, I preferred tmux. And neither nohup nor mpirun is buggy; the situation is due to the handler being overridden. – Naghi May 11 '20 at 10:11
1

A second way is to use setsid to run mpirun in a new session. The advantage is that this session is not killed when the terminal is closed (hang-up signal, SIGHUP), as suggested in general here and more specifically here. The syntax is simple:

setsid mpirun -np 100 icoFoam -parallel > log &

To terminate mpirun manually for any reason, select one of the icoFoam processes in htop, press F9, and send SIGKILL by pressing 9. All other icoFoam processes and the mpirun process should then be killed too. Alternatively, type killall mpirun, as proposed here.
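
The setsid approach can be wrapped in a small helper so that the live output can still be followed while the terminal is open, as requested in the question. This is a minimal sketch; the helper name run_detached is an assumption for illustration and not part of setsid or Open MPI.

```shell
# Hypothetical helper: run a command in a new session, logging to a file.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    logfile=$1
    shift
    # setsid starts the command in a new session without a controlling
    # terminal. stdout and stderr go to the log file, and stdin comes from
    # /dev/null so the job never reads from a terminal that may go away.
    setsid "$@" > "$logfile" 2>&1 < /dev/null &
}

# Example with the command from the question:
#   run_detached log mpirun -np 100 icoFoam -parallel
#   tail -f log        # follow the output; Ctrl+C stops tail, not the job
#   killall mpirun     # stop the simulation, as described above
```

Following the log with tail -f gives the same view as tee, but the viewer and the simulation are separate processes, so closing the terminal only ends the viewer.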

DaveD