My understanding of pipes is that all the commands in a pipeline are started in parallel, and the stdout of each command is fed to the next command as its stdin. When processing a large file, early parts of the data may already be fully processed while later parts are still in the earlier stages of the pipeline. Is this a correct picture of what happens?
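A quick way to see this concurrency in action (a minimal sketch of my own using standard tools, not something from the question itself):

    # All three stages start at once; data streams through as soon as
    # it is produced. head exits after 5 lines, ending the pipeline
    # long before the infinite output of yes could ever "finish".
    yes | nl | head -n 5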
Then what happens with a command (e.g. sort) that needs all of its input at once rather than working on it line by line? Will it work in small chunks and pass them forward, or will it wait until the previous command has finished sending all the data? If it waits, how is the waiting data handled? Is it stored in RAM? Does a pipe have an upper limit on the amount of data it can hold?
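To illustrate the last point, here is a small sketch (my own, assuming Linux, where the kernel pipe buffer defaults to 64 KiB): a writer that produces more than the buffer holds simply blocks until the reader drains it.

    # dd tries to push 128 KiB into the pipe; it blocks once the
    # ~64 KiB kernel buffer is full, and only finishes when wc starts
    # reading 5 seconds later. wc still reports all 131072 bytes.
    dd if=/dev/zero bs=1k count=128 status=none | { sleep 5; wc -c; }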
sort needs the whole file before it can produce output, but it can write partial data to temporary files, since it already knows what kind of sort it's doing. – waltinator Oct 21 '17 at 14:04

sort, and I guess also some other tools, have 'their own swap system' writing to temporary files? And tools with this ability will not fill RAM and swap and fail? They will continue to work (but slowly) until the hard disk drive is full? – sudodus Oct 21 '17 at 14:14

sort has "buffer files" that it uses to keep from filling up its virtual address space, and knows nothing about "swapping". "Swapping" is done at the system level, by fiddling with sort's virtual address space, while the sort process is not running. – waltinator Oct 24 '17 at 17:51
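Those "buffer files" can be observed directly with GNU sort's -S (memory buffer size) and -T (temporary directory) options. A minimal sketch (/tmp/sorttmp is just a hypothetical scratch directory):

    mkdir -p /tmp/sorttmp
    # Cap sort's in-memory buffer at 1 MiB, forcing it to spill sorted
    # runs as temporary files into /tmp/sorttmp and merge them at the
    # end. If that disk fills up, sort fails with "No space left on
    # device" rather than exhausting RAM or swap.
    seq 1000000 | shuf | sort -n -S 1M -T /tmp/sorttmp > /dev/null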