14.5. I/O Multiplexing
When we read from one descriptor and write to another, we can use blocking I/O in a loop, such as
while ((n = read(STDIN_FILENO, buf, BUFSIZ)) > 0)
    if (write(STDOUT_FILENO, buf, n) != n)
        err_sys("write error");
We see this form of blocking I/O over and over again. What if we have to read from two descriptors? In this case, we can't do a blocking read on either descriptor, as data may appear on one descriptor while we're blocked in a read on the other. A different technique is required to handle this case.
Let's look at the structure of the telnet(1) command. In this program, we read from the terminal (standard input) and write to a network connection, and we read from the network connection and write to the terminal (standard output). At the other end of the network connection, the telnetd daemon reads what we typed and presents it to a shell as if we were logged in to the remote machine. The telnetd daemon sends any output generated by the commands we type back to us through the telnet command, to be displayed on our terminal. Figure 14.20 shows a picture of this.
Figure 14.20. Overview of telnet program
Figure 14.21. The telnet program using two processes
14.5.1. select and pselect Functions
The select function lets us do I/O multiplexing under all POSIX-compatible platforms. The arguments we pass to select tell the kernel
On the return from select, the kernel tells us
#include <sys/select.h>
int select(int maxfdp1, fd_set *restrict readfds,
           fd_set *restrict writefds, fd_set *restrict exceptfds,
           struct timeval *restrict tvptr);
Returns: count of ready descriptors, 0 on timeout, -1 on error
struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* and microseconds */
};
There are three conditions.
tvptr == NULL
Wait forever. This infinite wait can be interrupted if we catch a signal. Return is made when one of the specified descriptors is ready or when a signal is caught. If a signal is caught, select returns -1 with errno set to EINTR.
tvptr->tv_sec == 0 && tvptr->tv_usec == 0
Don't wait at all. All the specified descriptors are tested, and return is made immediately. This is a way to poll the system to find out the status of multiple descriptors, without blocking in the select function.
tvptr->tv_sec != 0 || tvptr->tv_usec != 0
Wait the specified number of seconds and microseconds. Return is made when one of the specified descriptors is ready or when the timeout value expires. If the timeout expires before any of the descriptors is ready, the return value is 0. (If the system doesn't provide microsecond resolution, the tvptr->tv_usec value is rounded up to the nearest supported value.) As with the first condition, this wait can also be interrupted by a caught signal.
POSIX.1 allows an implementation to modify the timeval structure, so after select returns, you can't rely on the structure containing the same values it did before calling select. FreeBSD 5.2.1, Mac OS X 10.3, and Solaris 9 all leave the structure unchanged, but Linux 2.4.22 will update it with the time remaining if select returns before the timeout value expires.
The middle three arguments, readfds, writefds, and exceptfds, are pointers to descriptor sets. These three sets specify which descriptors we're interested in and for which conditions (readable, writable, or an exception condition). A descriptor set is stored in an fd_set data type. This data type is chosen by the implementation so that it can hold one bit for each possible descriptor. We can consider it to be just a big array of bits, as shown in Figure 14.23.
Figure 14.23. Specifying the read, write, and exception descriptors for select

#include <sys/select.h>
int FD_ISSET(int fd, fd_set *fdset);
Returns: nonzero if fd is in set, 0 otherwise
void FD_CLR(int fd, fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_ZERO(fd_set *fdset);
fd_set rset;
int fd;
FD_ZERO(&rset);
FD_SET(fd, &rset);
FD_SET(STDIN_FILENO, &rset);
On return from select, we can test whether a given bit in the set is still on using FD_ISSET:
if (FD_ISSET(fd, &rset)) {
...
}
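As a minimal sketch of how these macros fit together with a timeout, the following helper waits for a single descriptor to become readable. The name wait_readable and its millisecond timeout convention are our own assumptions for illustration; they are not part of the text or of any standard interface. Because select may modify both the descriptor set and the timeval structure, the helper rebuilds them on every call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): wait until fd is readable.
 * timeout_ms < 0 waits forever, 0 polls, > 0 waits that many milliseconds.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error (errno is set
 * by select, and is EINTR if a caught signal interrupted the wait).
 */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rset;
    struct timeval tv, *tvptr = NULL;   /* NULL means wait forever */

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rset);                     /* always zero the set first */
    FD_SET(fd, &rset);                  /* turn on the bit for fd */

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvptr = &tv;
    }

    int n = select(fd + 1, &rset, NULL, NULL, tvptr);
    if (n <= 0)
        return n;                       /* 0: timed out, -1: error */

    return FD_ISSET(fd, &rset) ? 1 : 0; /* is the bit still on? */
}
```

A caller would typically follow a return value of 1 with a call to read on the same descriptor.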
(Recall from Section 10.19 that sleep waits for an integral number of seconds. With select, we can wait for intervals less than 1 second; the actual resolution depends on the system's clock.) Exercise 14.6 shows such a function.
The first argument to select, maxfdp1, stands for "maximum file descriptor plus 1." We calculate the highest descriptor that we're interested in, considering all three of the descriptor sets, add 1, and that's the first argument. We could just set the first argument to FD_SETSIZE, a constant in <sys/select.h> that specifies the maximum number of descriptors (often 1,024), but this value is too large for most applications. Indeed, most applications probably use between 3 and 10 descriptors. (Some applications need many more descriptors, but these UNIX programs are atypical.) By specifying the highest descriptor that we're interested in, we can prevent the kernel from going through hundreds of unused bits in the three descriptor sets, looking for bits that are turned on.
As an example, Figure 14.24 shows what two descriptor sets look like if we write
fd_set readset, writeset;
FD_ZERO(&readset);
FD_ZERO(&writeset);
FD_SET(0, &readset);
FD_SET(3, &readset);
FD_SET(1, &writeset);
FD_SET(2, &writeset);
select(4, &readset, &writeset, NULL, NULL);
Figure 14.24. Example descriptor sets for select

It is important to realize that whether a descriptor is blocking or not doesn't affect whether select blocks. That is, if we have a nonblocking descriptor that we want to read from and we call select with a timeout value of 5 seconds, select will block for up to 5 seconds. Similarly, if we specify an infinite timeout, select blocks until data is ready for the descriptor or until a signal is caught.
If we encounter the end of file on a descriptor, that descriptor is considered readable by select. We then call read and it returns 0, the way to signify end of file on UNIX systems. (Many people incorrectly assume that select indicates an exception condition on a descriptor when the end of file is reached.)
POSIX.1 also defines a variant of select called pselect.
#include <sys/select.h>
int pselect(int maxfdp1, fd_set *restrict readfds,
            fd_set *restrict writefds, fd_set *restrict exceptfds,
            const struct timespec *restrict tsptr,
            const sigset_t *restrict sigmask);
- The timeout value for select is specified by a timeval structure, but for pselect, a timespec structure is used. (Recall the definition of the timespec structure in Section 11.6.) Instead of seconds and microseconds, the timespec structure represents the timeout value in seconds and nanoseconds. This provides a higher-resolution timeout if the platform supports that fine a level of granularity.
- The timeout value for pselect is declared const, and we are guaranteed that its value will not change as a result of calling pselect.
- An optional signal mask argument is available with pselect. If sigmask is null, pselect behaves as select does with respect to signals. Otherwise, sigmask points to a signal mask that is atomically installed when pselect is called. On return, the previous signal mask is restored.
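The differences listed above can be seen in a small sketch. The helper name wait_readable_ps is our own assumption; it simply forwards a timespec timeout and an optional signal mask to pselect for a single descriptor.

```c
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/*
 * Hypothetical helper (not from the text): wait until fd is readable
 * using pselect.
 *   tsptr   - timeout in seconds and nanoseconds, or NULL to wait forever;
 *             it is const, so pselect does not change it
 *   sigmask - signal mask installed atomically for the duration of the
 *             call, or NULL to leave the current mask in place
 * Returns 1 if fd is readable, 0 on timeout, -1 on error.
 */
int wait_readable_ps(int fd, const struct timespec *tsptr,
                     const sigset_t *sigmask)
{
    fd_set rset;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    int n = pselect(fd + 1, &rset, NULL, NULL, tsptr, sigmask);
    if (n <= 0)
        return n;                       /* 0: timed out, -1: error */

    return FD_ISSET(fd, &rset) ? 1 : 0;
}
```

A common pattern, under these assumptions, is to keep a signal blocked with sigprocmask during normal processing and to pass a mask with that signal removed as sigmask, so the signal can be delivered only while pselect is waiting.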
14.5.2. poll Function
The poll function is similar to select, but the programmer interface is different. As we'll see, poll is tied to the STREAMS system, since it originated with System V, although we are able to use it with any type of file descriptor.
#include <poll.h>
int poll(struct pollfd fdarray[], nfds_t nfds, int timeout);
struct pollfd {
    int   fd;       /* file descriptor to check, or <0 to ignore */
    short events;   /* events of interest on fd */
    short revents;  /* events that occurred on fd */
};
The number of elements in the fdarray array is specified by nfds.
Historically, there have been differences in how the nfds parameter was declared. SVR3 specified the number of elements in the array as an unsigned long, which seems excessive. In the SVR4 manual [AT&T 1990d], the prototype for poll showed the data type of the second argument as size_t. (Recall the primitive system data types, Figure 2.20.) But the actual prototype in the <poll.h> header still showed the second argument as an unsigned long. The Single UNIX Specification defines the new type nfds_t to allow the implementation to select the appropriate type and hide the details from applications. Note that this type has to be large enough to hold an integer, since the return value represents the number of entries in the array with satisfied events.
The SVID corresponding to SVR4 [AT&T 1989] showed the first argument to poll as struct pollfd fdarray[], whereas the SVR4 manual page [AT&T 1990d] showed this argument as struct pollfd *fdarray. In the C language, both declarations are equivalent. We use the first declaration to reiterate that fdarray points to an array of structures and not to a single structure.
To tell the kernel what events we're interested in for each descriptor, we have to set the events member of each array element to one or more of the values in Figure 14.25. On return, the revents member is set by the kernel, specifying which events have occurred for each descriptor. (Note that poll doesn't change the events member. This differs from select, which modifies its arguments to indicate what is ready.)
Name | Input to events? | Result from revents? | Description |
---|---|---|---|
POLLIN | • | • | Data other than high priority can be read without blocking (equivalent to POLLRDNORM\|POLLRDBAND). |
POLLRDNORM | • | • | Normal data (priority band 0) can be read without blocking. |
POLLRDBAND | • | • | Data from a nonzero priority band can be read without blocking. |
POLLPRI | • | • | High-priority data can be read without blocking. |
POLLOUT | • | • | Normal data can be written without blocking. |
POLLWRNORM | • | • | Same as POLLOUT. |
POLLWRBAND | • | • | Data for a nonzero priority band can be written without blocking. |
POLLERR | | • | An error has occurred. |
POLLHUP | | • | A hangup has occurred. |
POLLNVAL | | • | The descriptor does not reference an open file. |
timeout == -1
Wait forever. (Some systems define the constant INFTIM in <stropts.h> as -1.) We return when one of the specified descriptors is ready or when a signal is caught. If a signal is caught, poll returns -1 with errno set to EINTR.
timeout == 0
Don't wait. All the specified descriptors are tested, and we return immediately. This is a way to poll the system to find out the status of multiple descriptors, without blocking in the call to poll.
timeout > 0
Wait timeout milliseconds. We return when one of the specified descriptors is ready or when the timeout expires. If the timeout expires before any of the descriptors is ready, the return value is 0. (If your system doesn't provide millisecond resolution, timeout is rounded up to the nearest supported value.)
It is important to realize the difference between an end of file and a hangup. If we're entering data from the terminal and type the end-of-file character, POLLIN is turned on so we can read the end-of-file indication (read returns 0). POLLHUP is not turned on in revents. If we're reading from a modem and the telephone line is hung up, we'll receive the POLLHUP notification.
As with select, whether a descriptor is blocking or not doesn't affect whether poll blocks.
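To make the interface concrete, here is a minimal sketch that waits for input on either of two descriptors, the situation that motivated this section. The function name poll_two and its output parameters are assumptions made for this example only.

```c
#include <poll.h>

/*
 * Hypothetical helper (not from the text): wait up to timeout_ms
 * milliseconds (-1 waits forever, 0 returns immediately) for input
 * on fd_a or fd_b.
 * Returns what poll returns: the number of descriptors with events,
 * 0 on timeout, or -1 on error. The events reported for each
 * descriptor are stored through rev_a and rev_b.
 */
int poll_two(int fd_a, int fd_b, int timeout_ms,
             short *rev_a, short *rev_b)
{
    /* We set only fd and events; revents starts at 0 and is filled
       in by the kernel. poll leaves the events member unchanged. */
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };

    int n = poll(fds, 2, timeout_ms);

    *rev_a = fds[0].revents;
    *rev_b = fds[1].revents;
    return n;
}
```

A caller would test the returned bits with expressions such as (rev & POLLIN), and should also check for POLLERR, POLLHUP, and POLLNVAL, since these can be reported even though they were not requested in events.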
Interruptibility of select and poll
When the automatic restarting of interrupted system calls was introduced with 4.2BSD (Section 10.5), the select function was never restarted. This characteristic continues with most systems even if the SA_RESTART option is specified. But under SVR4, if SA_RESTART was specified, even select and poll were automatically restarted. To prevent this from catching us when we port software to systems derived from SVR4, we'll always use the signal_intr function (Figure 10.19) if the signal could interrupt a call to select or poll.
None of the implementations described in this book restart poll or select when a signal is received, even if the SA_RESTART flag is used.
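Since these calls are not restarted for us, an application that does not care about the interruption has to restart the call itself. The following is a sketch of that idiom for poll; the name poll_retry is hypothetical, and whether retrying is appropriate depends on the application, because a program may instead want the signal to break it out of the wait.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not from the text): call poll again whenever
 * it fails with EINTR because a signal was caught.
 * Note that each retry starts over with the full timeout_ms, so the
 * total wait can exceed timeout_ms if signals keep arriving.
 * Returns what poll returns for any result other than EINTR.
 */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int n;

    do {
        n = poll(fds, nfds, timeout_ms);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

A more careful version would measure the elapsed time and reduce the timeout on each retry; that refinement is left out here to keep the sketch short.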