6.2 Processes on the Data-Link Layer

As was mentioned at the beginning of this chapter, the data-link layer forms the connecting layer between drivers or network devices and the higher world of protocol instances. This section gives an overview of the processes on the data-link layer. We explain which activity forms play an important role on this layer and how the transitions between them occur. Section 6.2.1 describes the process involved when a packet arrives, and Section 6.2.2 discusses how a packet is sent. First, however, we introduce the activity forms and their tasks in the Linux network architecture.

Hardware interrupts announce the arrival of a packet at a network adapter and are handled by the interrupt routine of the respective network driver (see Chapter 5). To ensure that the interrupt can terminate as quickly as possible (see Section 2.2.2), incoming data packets are put immediately into the incoming queue of the processing CPU, and the hardware interrupt is terminated. The software interrupt NET_RX_SOFTIRQ is marked for execution to handle these packets further.

The software interrupt NET_RX_SOFTIRQ (for short, NET_RX soft-IRQ) performs the actual protocol handling of incoming packets; most of the protocol instances discussed in Chapters 7 through 25 run in the context of the NET_RX soft-IRQ. If a handled packet is to be forwarded, the NET_RX soft-IRQ also places it in the output queue of the sending network device and tries to transmit it immediately. (See Chapter 5 and Section 6.2.2.)

The software interrupt NET_TX_SOFTIRQ (for short, NET_TX soft-IRQ) also sends data packets, but only if it was marked explicitly for this task. This case occurs, among others, when a packet cannot be sent immediately after it was put in the output queue (for example, because it has to be delayed for traffic shaping). In such a case, a timer is responsible for marking the NET_TX soft-IRQ for execution at the target transmission time (see Section 6.2.2), so that the packet is transmitted then. This means that the NET_TX soft-IRQ can transmit packets in parallel with other activities in the kernel; it primarily takes over the transmission of packets that had to be delayed.

Data packets to be sent by application processes are handled by system calls in the kernel. In the context of a system call, a packet is handled by the corresponding protocol instances until it is put into one of the output queues of the sending network device. As with the NET_RX soft-IRQ, this activity tries to pass the packet to the network adapter immediately after enqueuing it.

Other activities of the kernel (tasklets, timer handling routines, etc.) perform various tasks in the Linux network architecture. Unlike the tasks of the activities described so far, they cannot be classified clearly, because they are activated by other activities on demand. In general, these activity forms run tasks either at a specific time (timer handling routines) or at a less precisely specified, later time (tasklets).

Application processes are not activities of the operating-system kernel. Nevertheless, we mention them here within the interplay of kernel activities, because they trigger kernel activities by their system calls and because incoming packets are eventually delivered to them.
The processes within the network drivers, which were already introduced in Chapter 5, will be discussed only superficially.
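Both software interrupts are registered once when the network subsystem is initialized and can then be marked for execution from any activity. The following sketch condenses this wiring; it is modeled on net_dev_init() in net/core/dev.c of the 2.4 kernel series, and the helpers register_net_softirqs() and mark_rx_softirq() are illustrative names used only in this sketch, not kernel functions.

#include <linux/interrupt.h>   /* open_softirq(), __cpu_raise_softirq() */
#include <linux/smp.h>         /* smp_processor_id() */

/* The handling routines live in net/core/dev.c; they are declared here
 * only so that this sketch is self-contained. */
extern void net_rx_action(struct softirq_action *h);
extern void net_tx_action(struct softirq_action *h);

/* During initialization (cf. net_dev_init()), the two handling routines
 * are registered exactly once: */
static void register_net_softirqs(void)
{
        open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);  /* transmission */
        open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);  /* reception    */
}

/* Any activity (e.g., a hardware-interrupt handler) can then mark a
 * soft-IRQ on the current CPU; the kernel runs the registered handling
 * routine at the next opportunity: */
static void mark_rx_softirq(void)
{
        __cpu_raise_softirq(smp_processor_id(), NET_RX_SOFTIRQ);
}

Marking is therefore cheap: it sets a per-CPU flag, and the expensive protocol handling is deferred until do_softirq() runs.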
6.2.1 Receiving a Packet

The path of each packet not generated locally in the computer begins at a network adapter or a comparable interface (e.g., the parallel port in PLIP). The adapter receives the packet and informs the kernel about its arrival by triggering an interrupt. The subsequent process in the network driver was described in Chapter 5, but we will repeat it here briefly for the sake of completeness.

If the transmission was correct, then the path of the packet through the kernel begins at this point (as shown in Figure 6-3). Until the interrupt was triggered, the Linux kernel had nothing to do with the packet. This means that the interrupt-handling routine is the first kernel activity that handles an incoming packet.
In the sample driver introduced in Section 5.3 (drivers/net/isa_skeleton.c), this is the method net_interrupt(). As soon as the interrupt has been identified as signaling an incoming packet, net_rx() is responsible for further handling. If the interrupt was caused not by an incoming packet, but by the message that a data transmission has completed, then net_tx() continues.

net_rx() allocates a new socket buffer and copies the packet data from the network adapter into it. (See Chapter 4 and Section 5.3.) Subsequently, the pointer skb->dev is set to the receiving network device, and the type of the data contained in the layer-2 frame is determined. Ethernet drivers can use the method eth_type_trans() for this purpose; similar methods exist for other MAC technologies (FDDI, token ring). The protocol identifier determined here is used later to select the matching layer-3 protocol. (See Section 6.3.1.)

netif_rx() completes the interrupt handling. First, the current time is stored in skb->stamp, and the socket buffer is placed in the input queue. As compared with earlier versions of the Linux kernel, there is no longer a single input queue named backlog; instead, each CPU stores "its" incoming packets in the structure softnet_data[cpu].input_pkt_queue. This means that the processor that handles the interrupt always stores the packet in its own queue. This mechanism was introduced to avoid kernel-wide locking of a single input queue.

Once the packet has been placed in the queue, the interrupt handling is complete. As was explained in Section 2.2.2, the handling routine of a hardware interrupt should run only the operations absolutely required, to ensure that other activities of the computer (software interrupts, tasklets, processes) are not interrupted unnecessarily.

Incoming packets are further handled by the software interrupt NET_RX_SOFTIRQ, which replaces the net bottom half (NET_BH) used in earlier versions of the Linux kernel. NET_RX_SOFTIRQ is marked for execution by __cpu_raise_softirq(cpu, NET_RX_SOFTIRQ). This mechanism is similar to bottom halves, but the use of software interrupts allows much more parallelism and thus better performance on multiprocessor systems. (See Section 2.2.3.)

The path of a packet initially ends in the queue for incoming packets. The interrupt handling has terminated, and the kernel continues the interrupted activity (process, software interrupt, tasklet, etc.). When the scheduler (schedule() in kernel/sched.c) is invoked again after a certain interval, it first checks whether a software interrupt is marked for execution. This is the case here, and do_softirq() starts the marked soft-IRQs. The following discussion assumes that this concerns the NET_RX soft-IRQ.
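Before we follow the packet into the handling routine of the NET_RX soft-IRQ, the driver part of the path just described can be summarized in code. The following fragment is modeled on net_rx() from the sample driver drivers/net/isa_skeleton.c (Section 5.3); read_packet_from_adapter() is a placeholder for the hardware-specific copy operation, not a kernel function.

#include <linux/netdevice.h>
#include <linux/etherdevice.h>  /* eth_type_trans() */
#include <linux/skbuff.h>

/* Placeholder for the hardware-specific part of the driver: */
extern void read_packet_from_adapter(struct net_device *dev,
                                     unsigned char *buf, int len);

static void net_rx(struct net_device *dev, int pkt_len)
{
        struct sk_buff *skb;

        /* Allocate a socket buffer for the packet data; the two extra
         * bytes align the IP header on a 16-byte boundary. */
        skb = dev_alloc_skb(pkt_len + 2);
        if (skb == NULL)
                return;         /* no memory: the packet is dropped */
        skb_reserve(skb, 2);

        /* Copy the frame from the adapter into the socket buffer. */
        read_packet_from_adapter(dev, skb_put(skb, pkt_len), pkt_len);

        /* Note the receiving device and determine the layer-3 protocol
         * from the type field of the layer-2 frame. */
        skb->dev = dev;
        skb->protocol = eth_type_trans(skb, dev);

        /* Hand the packet over: netif_rx() enqueues it in
         * softnet_data[cpu].input_pkt_queue and marks NET_RX_SOFTIRQ. */
        netif_rx(skb);
}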
net_rx_action() | net/core/dev.c

net_rx_action() is the handling routine of NET_RX_SOFTIRQ. In a continuous loop (for(;;){...}), packets are fetched one after another from the input queue of the processing CPU and passed to the protocol-handling routines until the input queue is empty. The loop is also exited when the packet-handling duration exceeds one tick (10 ms) or when budget = netdev_max_backlog packets have been removed from the queue and processed. This prevents the protocol handling from blocking the remaining activities of the computer for too long and thereby inhibits denial-of-service attacks.[1]

[1] netdev_max_backlog specified the maximum length of the single input queue, backlog, in earlier versions of the Linux kernel and was initialized with the value 300 (packets). In the newer kernel versions, it is the maximum length of each processor's input queue.

The first action in the loop is to request a packet from the input queue of the CPU by the method __skb_dequeue(). If a socket buffer is found, it is first passed to skb_bond(): if the receiving network device belongs to a bonding group, skb->dev is set to the master device of that group. Subsequently, the socket buffer is delivered to the instances of the handling protocols.

First, the socket buffer is passed to all protocols registered in the list ptype_all. (See Section 6.3.) In general, no protocols are registered in this list; however, this interface is very well suited for hooking in analysis tools.

If the computer is configured as a bridge (CONFIG_BRIDGE) and the pointer br_handle_frame_hook is set, then the packet is passed to the method handle_bridge() and processed in the bridge instance. (See Chapter 12.)

The last action, which is generally the most common case, passes the socket buffer to all protocols registered for the protocol identifier of the packet (skb->protocol). These protocols are managed in the hash table ptype_base; Section 6.3 explains the details of how layer-3 protocols are managed there.

For an IP packet, for example, the method eth_type_trans() recognizes the protocol identifier 0x0800 and stores it in skb->protocol. In net_rx_action(), this identifier is mapped by the hash function to the entry of the Internet Protocol (IP) instance, and handling is started by a call of the corresponding protocol-handling routine (func()); in the case of the Internet Protocol, this is the familiar method ip_rcv(). If several protocol instances are registered for the identifier 0x0800, a pointer to the socket buffer is passed to each of them in turn.

The actual work of the protocol instances of the Linux kernel thus begins at this point. In general, the protocols that start here are layer-3 protocols; however, this interface is also used by several protocols that actually belong to the first two layers of the ISO/OSI basic reference model. The following section describes the inverse process (i.e., how a data packet is sent).
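How a protocol arranges to be found in ptype_base can be illustrated with the packet_type structure. The following sketch registers a hypothetical handling routine my_proto_rcv() for the identifier ETH_P_IP (0x0800); the real IP instance registers ip_rcv() in exactly the same way. The names my_proto_rcv, my_packet_type, and my_proto_init are inventions of this sketch.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>   /* ETH_P_IP, ETH_P_ALL */

/* Hypothetical handling routine; net_rx_action() invokes it as func(). */
static int my_proto_rcv(struct sk_buff *skb, struct net_device *dev,
                        struct packet_type *pt)
{
        /* ... protocol handling would start here ... */
        kfree_skb(skb);       /* this sketch merely releases the buffer */
        return 0;
}

static struct packet_type my_packet_type = {
        type:   __constant_htons(ETH_P_IP), /* 0x0800: hashed into ptype_base */
        dev:    NULL,                       /* NULL: accept from any device   */
        func:   my_proto_rcv,
};

static void my_proto_init(void)
{
        /* Insert the entry into ptype_base; from now on, net_rx_action()
         * passes matching packets to my_proto_rcv(). Using ETH_P_ALL
         * instead would place the entry in the list ptype_all. */
        dev_add_pack(&my_packet_type);
}

A matching dev_remove_pack(&my_packet_type) withdraws the registration again.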
6.2.2 Transmitting a Packet

As shown in Figure 6-3, the process of transmitting a packet can be handled by several activity forms of the kernel. We distinguish two main transmission processes:

In the normal transmission process, the activity that places a packet in the output queue of a network device immediately tries to send the packets that are ready over that device. This means that the transmission process is executed either by the NET_RX soft-IRQ or in the context of a system call. This form of transmitting packets is discussed in the following section.

The second type of transmission is handled by the NET_TX soft-IRQ. It is marked for execution by some activity of the kernel and is invoked by the scheduler at the next possible time. The NET_TX soft-IRQ is normally used when, for certain reasons, packets are to be sent outside the regular transmission process or at a specific time. This transmission process is introduced after the section on the normal transmission process.
The Normal Transmission Process
dev_queue_xmit() | net/core/dev.c
dev_queue_xmit(skb) is used by the instances of the higher protocols to send a packet, in the form of the socket buffer skb, over a network device. The network device is specified by the parameter skb->dev of the socket-buffer structure. (See Figure 6-4.)
If the network device has a queuing discipline defined (dev->qdisc), the socket buffer is first inserted into the corresponding queue by its enqueue function. Queuing disciplines and traffic control are discussed in detail later. (See Chapters 18 and 22.)

Once the packet has been placed in the queue by the chosen queuing discipline (qdisc), further handling of the packets ready to be sent is triggered. This task is handled by qdisc_run().

There is one special case: a network device may have no methods for queue management defined (dev->enqueue == NULL). In this case, the packet is simply sent right away by dev->hard_start_xmit(). In general, this concerns logical network devices, such as the loopback device or tunnel network devices.
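From the perspective of a layer-3 instance, this whole transmission process hides behind a single call. The following fragment is a hedged sketch of that usage; my_proto_send() and out_dev are hypothetical names, and skb is assumed to already contain a complete layer-2 frame.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical sending routine of a layer-3 instance: */
static int my_proto_send(struct sk_buff *skb, struct net_device *out_dev)
{
        /* The network device is selected via skb->dev ... */
        skb->dev = out_dev;

        /* ... and dev_queue_xmit() enqueues the buffer in the device's
         * queuing discipline (dev->qdisc) and triggers qdisc_run().
         * The return value reports only whether the packet could be
         * queued, not whether it has reached the medium. */
        return dev_queue_xmit(skb);
}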
qdisc_run() | include/net/pkt_sched.h

qdisc_run(dev) has rather little functionality of its own. All it actually does is call qdisc_restart() until that routine returns a value greater than or equal to zero (no further packet can currently be taken from the queue) or until the network device does not accept any more packets (netif_queue_stopped(dev)).
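The complete logic fits in a few lines; the following version is quoted in slightly simplified form from include/net/pkt_sched.h of the 2.4 kernel series.

/* From include/net/pkt_sched.h (2.4 series, slightly simplified): */
static inline void qdisc_run(struct net_device *dev)
{
        /* Loop until qdisc_restart() returns a value >= 0 or the
         * device stops accepting packets. */
        while (!netif_queue_stopped(dev) &&
               qdisc_restart(dev) < 0)
                /* NOTHING */;
}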
qdisc_restart() | net/sched/sch_generic.c

qdisc_restart(dev) is responsible for taking the next packet from the queue of the network device and sending it. In general, the network device has only a single queue that works by the FIFO principle; however, it is possible to define several queues and to serve them by a special strategy (qdisc). Accordingly, dev->qdisc->dequeue() is used to request the next packet. If this request is successful, the packet is sent by the driver method dev->hard_start_xmit(). (See Chapter 5.) Of course, the method also checks whether the network device is currently able to send packets at all (i.e., whether netif_queue_stopped(dev) == 0 holds).

Another situation that can occur in qdisc_restart() is that the spinlock dev->xmit_lock is already held. This lock is normally taken when the transmission of a packet is started in qdisc_restart(); at the same time, the number of the locking CPU is registered in dev->xmit_lock_owner. If this lock is set, there are two options:

The locking CPU is not identical with the one currently trying to take dev->xmit_lock. This means that another CPU is concurrently sending another packet over this network device. This is not a major problem; the other CPU was simply a little faster. The socket buffer is placed back into the queue (dev->qdisc->requeue()), and NET_TX_SOFTIRQ is marked in netif_schedule() to trigger the transmission process again later.

The locking CPU is identical with the current one. This means that a so-called dead loop is present: The forwarding of a packet to the network adapter was somehow interrupted on this processor, and in the course of this an attempt was made to transmit another packet. In response, the packet is dropped, and qdisc_restart() returns immediately, so that the interrupted first transmission process can complete.

The return value of qdisc_restart() can take the following values:

= 0: The queue is empty.
> 0: The queue is not empty, but the queue discipline (dev->qdisc) currently releases no packet for transmission (e.g., because the target transmission time in active traffic shaping has not yet been reached).
< 0: The queue is not empty, but the network device currently cannot accept more packets, because all transmit buffers are full.

If the packet could be forwarded successfully to the network adapter, then the kernel assumes that this transmission process is complete and turns to the next packet (qdisc_run()).
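The interplay of dequeue(), xmit_lock, and the requeue operation can be summarized in code. The following is a heavily simplified sketch of the control flow just described, not the literal routine from net/sched/sch_generic.c; statistics, the queue_lock handling, and several special cases are omitted.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/smp.h>
#include <net/pkt_sched.h>    /* struct Qdisc */

static int qdisc_restart(struct net_device *dev)
{
        struct Qdisc *q = dev->qdisc;
        struct sk_buff *skb;

        /* Ask the queuing discipline for the next packet to send. */
        skb = q->dequeue(q);
        if (skb == NULL)
                return q->q.qlen;       /* nothing released for sending */

        if (spin_trylock(&dev->xmit_lock)) {
                dev->xmit_lock_owner = smp_processor_id();
                if (!netif_queue_stopped(dev) &&
                    dev->hard_start_xmit(skb, dev) == 0) {
                        /* Packet handed to the adapter successfully; a
                         * negative value makes qdisc_run() try the next. */
                        dev->xmit_lock_owner = -1;
                        spin_unlock(&dev->xmit_lock);
                        return -1;
                }
                dev->xmit_lock_owner = -1;
                spin_unlock(&dev->xmit_lock);
        } else if (dev->xmit_lock_owner == smp_processor_id()) {
                /* Dead loop: this CPU interrupted its own transmission.
                 * The packet is dropped. */
                kfree_skb(skb);
                return -1;
        }

        /* Device busy, or another CPU was faster: put the packet back
         * and let the NET_TX soft-IRQ retry later. */
        q->ops->requeue(skb, q);
        netif_schedule(dev);
        return 1;
}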
Transmitting over NET_TX Soft-IRQ

The NET_TX soft-IRQ is an alternative way of sending packets. It is marked for execution (__cpu_raise_softirq()) by the method netif_schedule(). netif_schedule() is invoked whenever a socket buffer cannot be sent over the normal transmission process described in the previous section. This can have several causes:

Problems occurred when the packet was forwarded to the network adapter (e.g., no free buffer space).

The socket buffer has to be sent later, to honor special handling of packets. In the case of traffic shaping, for example, packets might have to be delayed artificially to maintain a specific data rate. For this purpose, a timer is used that starts the transmission of the packet when the transmission time is reached. (See Figure 6-4.)

If NET_TX_SOFTIRQ has been marked for execution by netif_schedule(), it is started at the next invocation of the CPU scheduler.
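The following sketch, slightly abridged from include/linux/netdevice.h of the 2.4 kernel series, shows what this marking looks like: the device is chained into the per-CPU output_queue, and NET_TX_SOFTIRQ is raised.

/* From include/linux/netdevice.h (2.4 series, slightly abridged): */
static inline void __netif_schedule(struct net_device *dev)
{
        if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
                unsigned long flags;
                int cpu = smp_processor_id();

                /* Chain the device into the per-CPU list of devices
                 * with pending transmissions ... */
                local_irq_save(flags);
                dev->next_sched = softnet_data[cpu].output_queue;
                softnet_data[cpu].output_queue = dev;
                /* ... and mark the NET_TX soft-IRQ for execution. */
                __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
                local_irq_restore(flags);
        }
}

/* netif_schedule() additionally checks that the transmit queue of the
 * device is not deactivated before calling __netif_schedule(). */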
net_tx_action() | net/core/dev.c

net_tx_action() is the handling routine of the NET_TX_SOFTIRQ software interrupt. The main task of this method is to call qdisc_restart() (by way of qdisc_run()) for each scheduled network device and thereby start the transmission of that device's packets.

The benefit of using the NET_TX_SOFTIRQ software interrupt is that transmission can proceed in parallel with other activities of the Linux network architecture. Together with NET_RX_SOFTIRQ, which is responsible for the main protocol handling, the NET_TX soft-IRQ can increase the throughput on multiprocessor computers considerably.
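A hedged sketch of the transmit part of net_tx_action() follows (modeled on net/core/dev.c of the 2.4 series); the branch that frees already-transmitted buffers in the per-CPU completion_queue is omitted here.

#include <linux/netdevice.h>
#include <linux/interrupt.h>
#include <linux/smp.h>

static void net_tx_action(struct softirq_action *h)
{
        int cpu = smp_processor_id();

        if (softnet_data[cpu].output_queue != NULL) {
                struct net_device *head;

                /* Atomically take over the list of scheduled devices. */
                local_irq_disable();
                head = softnet_data[cpu].output_queue;
                softnet_data[cpu].output_queue = NULL;
                local_irq_enable();

                while (head != NULL) {
                        struct net_device *dev = head;

                        head = dev->next_sched;
                        clear_bit(__LINK_STATE_SCHED, &dev->state);

                        /* Restart the transmission process of the device. */
                        if (spin_trylock(&dev->queue_lock)) {
                                qdisc_run(dev);
                                spin_unlock(&dev->queue_lock);
                        } else {
                                /* Queue busy: schedule the device again. */
                                netif_schedule(dev);
                        }
                }
        }
}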