My understanding of pipes is that all the commands in a pipeline are started in parallel, and the stdout of each command is fed to the next command as its stdin. When processing a large file, early parts of the data may already be fully processed while later parts are still in the earlier stages of the pipeline. Is this a correct picture of what happens?
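A quick way to see this concurrency in action (a minimal sketch of my own using standard tools, not something from the question itself):

    # All three stages start at once; data streams through as soon as
    # it is produced. head exits after 5 lines, ending the pipeline
    # long before the infinite output of yes could ever "finish".
    yes | nl | head -n 5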
Then what happens with a command (e.g. sort) that needs all of its input at once rather than working on it line by line? Will it work in small chunks and pass them forward, or will it wait until the previous command has finished sending all the data? If it waits, how is the waiting data handled? Is it stored in RAM? Does a pipe have an upper limit on the amount of data it can hold?
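To illustrate the last point, here is a small sketch (my own, assuming Linux, where the kernel pipe buffer defaults to 64 KiB): a writer that produces more than the buffer holds simply blocks until the reader drains it.

    # dd tries to push 128 KiB into the pipe; it blocks once the
    # ~64 KiB kernel buffer is full, and only finishes when wc starts
    # reading 5 seconds later. wc still reports all 131072 bytes.
    dd if=/dev/zero bs=1k count=128 status=none | { sleep 5; wc -c; }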
sort needs the whole file before it can produce output, but it can write partial data to temporary files, since it already knows what kind of sort it's doing. – waltinator Oct 21 '17 at 14:04

sort, and I guess also some other tools, have 'their own swap system' writing to temporary files? And tools with this ability will not fill RAM and swap and fail? They will continue to work (but slowly) until the hard disk drive is full? – sudodus Oct 21 '17 at 14:14

sort has "buffer files" that it uses to keep from filling up its virtual address space, and knows nothing about "swapping". "Swapping" is done at the system level, by fiddling with sort's virtual address space, while the sort process is not running. – waltinator Oct 24 '17 at 17:51
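Those "buffer files" can be observed directly with GNU sort's -S (memory buffer size) and -T (temporary directory) options. A minimal sketch (/tmp/sorttmp is just a hypothetical scratch directory):

    mkdir -p /tmp/sorttmp
    # Cap sort's in-memory buffer at 1 MiB, forcing it to spill sorted
    # runs as temporary files into /tmp/sorttmp and merge them at the
    # end. If that disk fills up, sort fails with "No space left on
    # device" rather than exhausting RAM or swap.
    seq 1000000 | shuf | sort -n -S 1M -T /tmp/sorttmp > /dev/null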