
I have 6 cores and want to run 12 processes at the same time in parallel. I'm using mpirun, but I'm stuck and confused about which option to use to run multiple processes per core. The command for 6 processes looks like this:

mpirun -np 6 ./the_program
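
I guess the 12-process version needs oversubscription; if this mpirun is Open MPI's, I assume something like this (not tested):

mpirun --oversubscribe -np 12 ./the_program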

And what is the maximum number of processes that can be run on a single core?

sam
    It might help if you wrote WHAT you are trying to run in parallel. Please add that to your question. Things like "make" jobs can easily be parallelized; for other tasks it depends what you want to do. – HuHa Mar 07 '21 at 19:44
  • That your application is gromacs is really important information and should have been included in your question. That being said, your question is off topic here. You should refer to the gromacs documentation for running in parallel. I'll delete my answer shortly. – Doug Smythies Mar 08 '21 at 14:13
  • I think the question is a general one, not specific to gromacs; I only mentioned gromacs because you asked for an example! My question is what the maximum number of processes that can be run on a single core is, as I mentioned in the post and in its title. – sam Mar 08 '21 at 19:57
  • O.K. I un-deleted my answer, and modified it a bit. – Doug Smythies Mar 08 '21 at 21:23

2 Answers


The Linux kernel can start and manage a lot of processes for you; more than you can reasonably use. But the more important question is: How many parallel processes are useful for your specific task? How many should you start to make the best possible use of your system resources?

That is not easy to answer in the general case. Some tasks are CPU bound, i.e. they need a lot of computing power. Some are I/O bound, i.e. they read a lot of data from disk or write a lot of data to disk. Some are network bound, i.e. they transfer a lot of data over the network. And then there is memory usage; each task needs a certain amount of RAM, be it physical or virtual RAM, including swap space; and when swapping starts, everything comes to a screeching halt, so you will want to avoid that.

So you have several different classes of system resources that each task needs, i.e., for which any parallel tasks compete (a quick way to inspect each one is sketched after this list):

  • CPU
  • RAM
  • Disk I/O
  • Network I/O
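
A quick first look at each of these, assuming standard Ubuntu tools (iostat comes from the sysstat package):

nproc            # number of CPU cores/threads available
free -h          # RAM and swap usage
iostat -x 1 3    # per-disk utilization, 3 samples 1 second apart (sysstat)
ip -s link       # per-interface network traffic counters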

If the overall mission is to run, say, 10,000 tasks, then how many of them it makes sense to start in parallel depends on their usage pattern across those system resources. If each task is not very CPU intensive but has to wait for results from the network, it can make sense to run considerably more of them in parallel than you have CPU cores. If they all read a lot of data from files, it may be more efficient not to run that many in parallel, because disk I/O will be the limiting factor. If they all read the same file, it might be the opposite, because the file will already be cached in the I/O buffers.
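
For network-bound work in particular, here is a sketch using xargs, where urls.txt and the job count 24 are just illustrative placeholders:

# 24 parallel downloads on a 6-core box can work well, because each
# job spends most of its time waiting on the network, not the CPU.
xargs -P 24 -n 1 curl -sO < urls.txt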

It really depends on the usage pattern, and you will typically have to experiment to find where the sweet spot is for your specific system configuration (depending on I/O bandwidth, number of CPU cores, CPU usage, available RAM, etc.).
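
One simple way to run that experiment is to time the same batch of work at several parallelism levels and compare; a sketch, with ./one_task standing in for whatever your real task is:

# Hypothetical benchmark: run 100 copies of a task at increasing
# parallelism levels and compare the wall-clock times.
for n in 1 2 4 8 16 32; do
    echo "== $n parallel jobs =="
    time ( seq 100 | xargs -P "$n" -I{} ./one_task {} > /dev/null )
done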

HuHa

This answer is only about the "how many threads on one core" part of the question.

It depends greatly on the load and memory consumed per thread. The examples herein use minimal memory and load per thread, and you can observe that I get to huge numbers of threads per core.
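
The waiter program used below is my own (see the end of this answer). To reproduce the general idea without it, a crude shell stand-in could look like this; it is only a sketch, and it spins up sleeping processes rather than threads, so it is heavier per unit:

#!/bin/bash
# spin.sh (hypothetical stand-in): start N sleeping subshells in the background.
# Usage: ./spin.sh 1000
for i in $(seq "$1"); do
    sleep 60 &
done
wait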

In this example, a program will be launched 30 times, with each occurrence spinning out 1000 threads. First, I'll look at how many tasks my user session already owns:

~$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
14

Then launch the program multiple times, in the background:

doug@s18:~/c$ for i in {1..30}; do ./waiter 1000 100 60 2 999999 0 > /dev/null & done
[1] 635246
[2] 635247
[3] 635248
...
[28] 635426
[29] 635434
[30] 635448

And look again at the number of threads:

doug@s18:~$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
30044

This is on a 6-core, 6-CPU (no hyperthreading) i5-9600K processor.
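
These standard Linux knobs cap how many tasks can exist in total, and are worth checking before pushing higher:

cat /proc/sys/kernel/threads-max   # system-wide maximum number of threads
cat /proc/sys/kernel/pid_max       # largest PID, which caps the total task count
ulimit -u                          # per-user limit on processes/threads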

Now, as to the limits on how many threads you can have, see this answer; I'll show the result here without the supporting information that you can find at that link. This time I'll launch the program 20 times with 10,000 threads spun out per launch, for a total of 200,000 threads:

doug@s18:~$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
200034

O.K., I did not confine that to a single core, so now let's do so:

doug@s18:~/c$ for i in {1..20}; do taskset -c 4 ./waiter 10000 100 60 2 999999 0 > /dev/null & done
[1] 85334
[2] 85335
...
[19] 85353
[20] 85354
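
To confirm the threads really did land on CPU 4, something like this works (PSR is the CPU a task last ran on):

ps -eLo pid,psr,comm | awk '$2 == 4' | head   # threads whose last CPU was 4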

No problem. In fact, I got to about 254,000 threads (by mistake, if you must know) before the load average jumped to over 200,000 and things really bogged down.

top - 13:53:58 up 9 min,  2 users,  load average: 100.06, 2488.11, 2014.59
Tasks: 200179 total,   8 running, 200171 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 18.1 us, 25.7 sy,  0.0 ni, 56.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 25.7 us, 38.7 sy,  0.0 ni, 35.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31962.2 total,   6236.9 free,  24814.8 used,    910.5 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6679.1 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1150 doug      20   0  241796 231484   3132 R  43.0   0.7   2:42.53 top
     21 root      20   0       0      0      0 S   0.6   0.0   0:01.94 ksoftirqd/1
      1 root      20   0  167628  11468   8356 S   0.0   0.0   0:54.61 systemd

The high load average is from the launching phase and is dropping rapidly. Observe that there is still plenty of capacity on CPU 4; memory is getting low, but it is not actually bad.

However, systemd seems to get into difficulty trying to kill so many processes on one core:

top - 14:06:48 up 22 min,  2 users,  load average: 143735.08, 78365.08, 32282.72
Tasks: 163872 total, 106906 running, 19503 sleeping,   0 stopped, 37463 zombie
%Cpu0  :  0.2 us, 59.4 sy,  0.0 ni, 40.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us, 30.5 sy,  0.0 ni, 69.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 12.3 us, 26.4 sy,  0.0 ni, 60.9 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
%Cpu3  :  0.6 us,  4.2 sy,  0.0 ni, 95.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.4 us,  1.9 sy,  0.0 ni, 97.5 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  31962.2 total,  13418.8 free,  17632.6 used,    910.8 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  13861.3 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0  167628  11476   8356 S  99.6   0.0   4:08.50 systemd
 285365 doug      20   0  241772 231688   3256 R  36.1   0.7   1:28.26 top

Things will sort themselves out if I leave it long enough.
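
If you would rather not wait, the test processes can also be killed explicitly, waiter being the name of the test program used above:

pkill -u "$USER" waiter    # SIGTERM every "waiter" process owned by this user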

The program used herein is mine, but it started from, and evolved out of, an example in the book "Linux Programming by Example" (Que, December 1999).

Doug Smythies