TL;DR: Because fork()/exec() is the standard way to create a new process while keeping the interactive shell in control.
fork() is necessary for processes and pipes
To answer the specific part of this question: if grep blabla foo were called via exec() directly in the parent, the parent would cease to exist, and its PID and all its resources would be taken over by grep blabla foo.
However, let's talk about exec() and fork() in general. The key reason for this behavior is that fork()/exec() is the standard method of creating a new process on Unix/Linux; this isn't a bash-specific thing. The method has been in place since the beginning and was influenced by the same mechanism in operating systems that already existed at the time. To somewhat paraphrase goldilocks's answer on a related question, creating a new process via fork() is easier because the kernel has less work to do as far as allocating resources goes, and many properties (such as file descriptors, environment, etc.) can simply be inherited from the parent process (in this case, from bash).
Secondly, as far as interactive shells go, you can't run an external command without forking. To launch an executable that lives on disk (for example, /bin/df -h), you have to call one of the exec() family of functions, such as execve(), which replaces the calling process with the new one, taking over its PID, existing file descriptors, and so on. In an interactive shell, you want control to return to the user and the parent shell to carry on. Thus, the best way is to create a subprocess via fork() and let that subprocess be taken over via execve(). So an interactive shell with PID 1156 spawns a child via fork() with PID 1157, then calls execve("/bin/df", ["df","-h"], &environment), which makes /bin/df -h run with PID 1157. Now the shell only has to wait for the process to exit and for control to return to it.
In the case where you have to create a pipe between two or more commands, say df | grep, you need a way to create two file descriptors (the read and write ends of the pipe, which come from the pipe() syscall) and then somehow let the two new processes inherit them. That's done by forking a new process and then copying the write end of the pipe via the dup2() call onto its stdout, aka fd 1 (so if the write end is fd 4, we do dup2(4,1)). When the exec() to spawn df happens, the child process thinks nothing of its stdout and writes to it without being aware (unless it actively checks) that its output actually goes to a pipe. The same happens for grep, except that we fork(), take the read end of the pipe, say fd 3, and dup2(3,0) before spawning grep with exec(). All this time the parent process is still there, waiting to regain control once the pipeline is complete.
In the case of built-in commands, the shell generally doesn't fork(); the source command in particular exists precisely so that a script runs in the current shell process. Subshells, on the other hand, do require fork().
In short, this is a necessary and useful mechanism.
Disadvantages of forking and optimizations
Now, this is different for non-interactive shells, such as bash -c '<simple command>'. Although fork()/exec() is the optimal method when you have many commands to process, it's a waste of resources when you have only a single command. To quote Stéphane Chazelas from this post:
Forking is expensive, in CPU time, memory, allocated file descriptors... Having a shell process lying about just waiting for another process before exiting is just a waste of resources. Also, it makes it difficult to correctly report the exit status of the separate process that would execute the command (for instance, when the process is killed).
Therefore, many shells (not just bash) use exec() to allow that bash -c '' process to be taken over by the single simple command. And exactly for the reasons stated above, minimizing pipelines in shell scripts is better, too. Often you can see beginners do something like this:
cat /etc/passwd | cut -d ':' -f 6 | grep '/home'
Of course, this will fork() 3 processes. This is a simple example, but consider a large file, in the range of gigabytes; it'd be far more efficient with one process:
awk -F':' '$6~"/home"{print $6}' /etc/passwd
Wasting resources can actually be a form of Denial of Service attack; in particular, fork bombs are created via shell functions that call themselves in a pipeline, forking multiple copies of themselves. Nowadays this is mitigated by limiting the maximum number of processes via cgroups in systemd, which Ubuntu also uses since version 15.04.
Of course, that doesn't mean forking is simply bad. It's still a useful mechanism, as discussed before, but where you can get away with fewer processes, and consequently fewer resources and thus better performance, you should avoid fork() if possible.
See also
From the comments:
- exec grep blabla foo can also be run directly. In this particular case it won't be very useful (your terminal window will just close as soon as the grep finishes), but it can be occasionally handy, e.g. if you're starting another shell, perhaps via ssh / sudo / screen, and don't intend to return to the original one, or if the shell process you're running this on is a sub-shell that's never meant to execute more than one command anyway. (Ilmari Karonen, Mar 02 '14)
- For bash -c 'grep foo bar', calling exec there is a form of optimization bash does for you automatically. (Sergiy Kolodyazhnyy, Sep 23 '18)