5.10 wait and waitpid Functions
In Figure 5.7, we called the wait function to handle the terminated child.
#include <sys/wait.h> |
pid_t wait (int *statloc ); |
pid_t waitpid (pid_t pid , int *statloc , int options ); |
Both return: process ID if OK, 0 or1 on error |
Difference between wait and waitpid
We now illustrate the difference between the wait and waitpid functions when used to clean up terminated children. To do this, we modify our TCP client as shown in Figure 5.9. The client establishes five connections with the server and then uses only the first one (sockfd[0]) in the call to str_cli. The purpose of establishing multiple connections is to spawn multiple children from the concurrent server, as shown in Figure 5.8.
Figure 5.8. Client with five established connections to same concurrent server.

Figure 5.9 TCP client that establishes five connections with server.
tcpcliserv/tcpcli04.c
1 #include "unp.h"
2 int
3 main (int argc, char **argv)
4 {
5 int i, sockfd[5];
6 struct sockaddr_in servaddr;
7 if (argc != 2)
8 err_quit ("usage: tcpcli <IPaddress>";
9 for (i = 0; i < 5; i++) {
10 sockfd[i] = Socket (AF_INET, SOCK_STREAM, 0);
11 bzero (&servaddr, sizeof (servaddr));
12 servaddr.sin_family = AF_INET;
13 servaddr.sin_port = htons (SERV_PORT);
14 Inet_pton (AF_INET, argv[1], &servaddr.sin_addr);
15 Connect (sockfd[i], (SA *) &servaddr, sizeof (servaddr));
16 }
17 str_cli (stdin, sockfd[0]); /* do it all */
18 exit(0);
19 }
When the client terminates, all open descriptors are closed automatically by the kernel (we do not call close, only exit), and all five connections are terminated at about the same time. This causes five FINs to be sent, one on each connection, which in turn causes all five server children to terminate at about the same time. This causes five SIGCHLD signals to be delivered to the parent at about the same time, which we show in Figure 5.10.
Figure 5.10. Client terminates, closing all five connections, terminating all five children.

It is this delivery of multiple occurrences of the same signal that causes the problem we are about to see.We first run the server in the background and then our new client. Our server is Figure 5.2, modified to call signal to establish Figure 5.7 as a signal handler for SIGCHLD.
linux % tcpserv03 & | |
[1] 20419 | |
linux % tcpcli04 127.0.0.1 | |
hello | we type this |
hello | and it is echoed |
^D | we then type our EOF character |
child 20426 terminated | output by server |
PID TTY TIME CMD
20419 pts/6 00:00:00 tcpserv03
20421 pts/6 00:00:00 tcpserv03 <defunct>
20422 pts/6 00:00:00 tcpserv03 <defunct>
20423 pts/6 00:00:00 tcpserv03 <defunct>
Establishing a signal handler and calling wait from that handler are insufficient for preventing zombies. The problem is that all five signals are generated before the signal handler is executed, and the signal handler is executed only one time because Unix signals are normally not queued . Furthermore, this problem is nondeterministic. In the example we just ran, with the client and server on the same host, the signal handler is executed once, leaving four zombies. But if we run the client and server on different hosts, the signal handler is normally executed two times: once as a result of the first signal being generated, and since the other four signals occur while the signal handler is executing, the handler is called only one more time. This leaves three zombies. But sometimes, probably dependent on the timing of the FINs arriving at the server host, the signal handler is executed three or even four times.The correct solution is to call waitpid instead of wait. Figure 5.7, we cannot call wait in a loop, because there is no way to prevent wait from blocking if there are running children that have not yet terminated.Figure 5.12 shows the final version of our server. It correctly handles a return of EINTR from accept and it establishes a signal handler (Figure 5.11) that calls waitpid for all terminated children.
Figure 5.11 Final (correct) version of sig_chld function that calls waitpid.
tcpcliserv/sigchldwaitpid.c
1 #include "unp.h"
2 void
3 sig_chld(int signo)
4 {
5 pid_t pid;
6 int stat;
7 while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0)
8 printf("child %d terminated\n", pid);
9 return;
10 }
Figure 5.12 Final (correct) version of TCP server that handles an error of EINTR from accept.
tcpcliserv/tcpserv04.c
1 #include "unp.h"
2 int
3 main(int argc, char **argv)
4 {
5 int listenfd, connfd;
6 pid_t childpid;
7 socklen_t clilen;
8 struct sockaddr_in cliaddr, servaddr;
9 void sig_chld(int);
10 listenfd = Socket (AF_INET, SOCK_STREAM, 0);
11 bzero (&servaddr, sizeof(servaddr));
12 servaddr.sin_family = AF_INET;
13 servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
14 servaddr.sin_port = htons(SERV_PORT);
15 Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
16 Listen(listenfd, LISTENQ);
17 Signal (SIGCHLD, sig_chld); /* must call waitpid() */
18 for ( ; ; ) {
19 clilen = sizeof(cliaddr);
20 if ( (connfd = accept (listenfd, (SA *) &cliaddr, &clilen)) < 0) {
21 if (errno == EINTR)
22 continue; /* back to for() */
23 else
24 err_sys("accept error");
25 }
26 if ( (childpid = Fork()) == 0) { /* child process */
27 Close(listenfd); /* close listening socket */
28 str_echo(connfd); /* process the request */
29 exit(0);
30 }
31 Close (connfd); /* parent closes connected socket */
32 }
33 }
The purpose of this section has been to demonstrate three scenarios that we can encounter with network programming:
- We must catch the SIGCHLD signal when forking child processes.
- We must handle interrupted system calls when we catch signals.
- A SIGCHLD handler must be coded correctly using waitpid to prevent any zombies from being left around.
The final version of our TCP server (Figure 5.12), along with the SIGCHLD handler in Figure 5.11, handles all three scenarios.