Doug McIlroy described the concept of pipes long before they were implemented.
Don Knuth is a computer scientist who wrote The Art of Computer
Programming and created the TeX typesetting system. Knuth enjoyed
writing and programming so much that he developed literate
programming, a style of
programming that allows you to write text about a program as you write
code. The idea is to write prose and programs simultaneously. However,
this idea did not take off because there was a large amount of overhead
associated with completing simple tasks like text parsing. McIlroy
responded to Knuth's work from a systems programming perspective.
He was able to complete the same text parsing task in 6
lines of shell code using pipes. While Knuth approached the problem from
an algorithms perspective, McIlroy approached the problem from a systems
perspective, chaining intermediate outputs together to arrive at the
answer.
The seq program takes a number as an argument and prints consecutive
numbers starting at that number. If a second argument is provided,
the numbers will stop printing once the second number has been
reached. Otherwise, the numbers will continue forever.
If we pipe the output of the seq program to less, only part of the
output is displayed, because less shows just enough output to fill
your screen. The seq program appears to be paused. However, is it
still running?
If we pipe seq to less and look at the list of running processes
using ps aux, we can see that the seq program is not running.
Using strace to further examine what is happening reveals that after a
series of write system calls, there is a SIGPIPE signal. A
SIGPIPE occurs when you write to a pipe that has no readers.
The default action for SIGPIPE is to kill the program. In this way, pipes
automatically kill programs when their output is no longer needed. This
explains why the seq program is killed once less stops reading from the
pipe, for example when less exits.
Given that we don't care about &status, how can we use pipes to
create a blocking call that unblocks when the process dies? When the
child exits, we want the call to read to return 0, because the
child will have exited and all write ends of the pipe will be closed.
For this to work, the parent must close its own copy of the write end.
Every process has a single parent. The root of the process hierarchy (or
process tree) is a process called init, which has pid 1. This is
the only process that cannot be killed. The waitpid system call retrieves a
child process's exit status. The exit status of a process is stored in the
process structure until the parent process collects it. waitpid
collects the status and recycles the process structure. This means that
the process structure can be reused for another process.
The manyfork program tries to call fork 10000
times. However, if we run ./manyfork, only ~3400 processes are
created. Running sudo ./manyfork, which gives the program more
privileges, results in ~6890 processes being created. The operating system is
protecting the user from a runaway program. If we look at the processes
created by the manyfork program, we see that most of them are
defunct.
The manyfork program does not wait for its children using
waitpid. This creates what are called zombie processes. A
zombie process is a process that has terminated but that has not
been waited upon by its parent. The ps command allows us to identify
these zombie processes. Below is a sample output of ps after running
the manyfork program.
user 78623 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
user 78624 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
user 78625 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
The Z in the STAT column (shown here as Z+) tells us that these
processes are zombie processes.
These zombie processes still hold resources, namely process IDs. When a child
outlives its parent, the child is reparented to the
init process with pid 1. The init process collects orphaned
children in this way. The job of init is to call waitpid on orphaned
children to release their resources.