TCP/IP First-Step [Electronic resources] Text version

Tools of the Trade

To do all of the work necessary for communications to work predictably (notice I didn't say "reliably") across the Internet or other networks, TCP/IP must have a bunch of tools in its bag. Those tools aren't quite what you might expect; there are no hammers or screwdrivers or other recognizable implements. Instead, these tools are much more subtle and soft. They are little more than bits of data that get pasted onto the front of an application's data. This information is prepended, or added in front of the data, in a mechanism called a header. Generically speaking, the data in the header is called header information.

What makes these pieces of data useful and usable is that they adhere to conventions set forth by the Internet Engineering Task Force (IETF), and every implementation of the TCP/IP protocol suite follows these conventions. Any two or more computers that speak TCP/IP follow these conventions and, therefore, can communicate with each other. The end result? It works!

Figuring out how it works requires looking at the headers that TCP, UDP, and IP use. The header structure that each different protocol uses consists of specific fields that enable different functions. To put things into perspective, please remember that most people don't even know when they are using TCP/IP. Consequently, taking a look at the various fields and mechanisms that enable TCP, UDP, and IP to work is really peeking deeply under the covers! This isn't stuff you'll need every day, but knowing how TCP/IP works will enable you to better appreciate it and may even help you become more adept at using networked communications.

TCP's Functional Requirements

As explained in Chapter 3, "The Quest for Freedom of Choice," TCP provides a reliable delivery mechanism. Reliable delivery means that your application's data will arrive safely at its intended destination. It might take a while to get there, but it will get there! TCP does a whole lot more than that, too.

Revisiting TCP's Functions

Just as a quick refresher, TCP (Transmission Control Protocol) occupies Layer 4, the transport layer, of the OSI reference model, as shown in Figure 5-1.

Figure 5-1. TCP Is a Transport, or Layer 4, Protocol Suite

TCP manages the flow of information between two applications, each of which resides on a different computer. Managing that flow of information (known as a communications session) requires TCP to continuously track certain critical pieces of information. This is where the header information comes into play.

Before digging into the actual structure and composition of TCP's header, it's worthwhile to revisit its specific functions. TCP's top six functions include the following:

Accepting data from an application and chopping it into bite-sized chunks (known as segments) that will fit within an IP packet

Managing the communications session

Guaranteeing that data gets delivered to the correct destination machine and application

Finding and fixing any damage that occurs to data as it traverses the network

Ensuring that any data lost in transit is replaced

Reassembling data received into a perfect copy of the application data that was sent

A subtle but important distinction lies hidden in this short list of TCP's core functions: Only the first item, segmenting the application data, does not require interaction between the source and destination computers. The remaining five functions absolutely require the TCP/IP protocols on both the source and destination machines to be communicating and interacting with each other. The only mechanism for this communication and interaction is the information stored in the header of each TCP segment. Now take a closer look at that header.
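The first and last functions in the list (segmenting and reassembling) can be illustrated with a minimal sketch. It is not how any real TCP stack is written: the function names `segment` and `reassemble`, the 8-octet chunk size, and the sample message are all assumptions chosen for illustration. Each chunk is tagged with the offset of its first octet, which plays the role of a sequence number.

```python
import random


def segment(data: bytes, max_payload: int) -> list[tuple[int, bytes]]:
    """Chop data into (sequence_number, chunk) pairs.

    The sequence number used here is simply the offset of the chunk's
    first octet within the original data (an illustrative simplification).
    """
    return [(offset, data[offset:offset + max_payload])
            for offset in range(0, len(data), max_payload)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data from segments that arrived in any order."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"The quick brown fox jumps over the lazy dog"
segments = segment(message, max_payload=8)

random.shuffle(segments)                 # simulate out-of-order arrival
assert reassemble(segments) == message   # the receiver still gets a perfect copy
```

Sorting by sequence number is all the receiver needs in this toy version; detecting missing or damaged segments requires the other header fields.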

TCP's Header

The TCP protocol header is at least 20 octets in length. When you look at it in terms of the number of characters, the amount of information stored in TCP's header is remarkably small: only about 20 characters. However, as I'll show you, TCP doesn't use these octets to carry readable words or characters. Trying to encapsulate words or phrases would be way too verbose and inefficient. Sometimes, all you really need is a pattern of 1s and 0s to get your point across.

A bit is the smallest unit of data. The word is actually a compression of two different words: binary digit. Typically, 8 bits equals 1 byte. Byte is often explained as another compression of terms (binary term), although the word originated as a deliberate respelling of bite. One byte can be thought of as one character or keystroke.

[...] destination machine. Taking that another step, the specific application on the source machine is known as the source application. Its counterpart on the destination machine is known as the destination application. In Chapter 6, "Pushing the Envelope," you experience a typical TCP/IP communications session step by step.

The TCP header contains the following fields:

TCP Source Port
The first 16 bits (or 2 bytes, if you prefer that term) of a TCP header contain the source port field. This field is where TCP stores the address of the application that is making the call. An application's address is more properly called a port or port number. All by itself, a source port number is almost useless. However, when you use it in conjunction with a source IP address, you have all that you need for a return address! The source IP address gets replies and acknowledgments back to the source computer and the source port number gets those replies in the hands of the right application on that computer.

TCP Destination Port
The 16-bit destination port field is the address of the called (destination) port. The IP address is used to forward the packet to the correct destination machine. The TCP destination port is used to forward any received data to the correct application on that machine.

TCP Sequence Number
The receiving computer uses the 32-bit sequence number to reconstruct the segmented data back into its original form. In a dynamic network like the Internet, it is quite possible for some of the packets to take different paths and, consequently, arrive out of order. Similarly, if a piece of data were lost or damaged in transit, you can guarantee that upon retransmission that piece of data will arrive much later than the piece sent before it. A sequencing field enables the destination machine to overcome this potential inconsistency and ensures that the data gets reconstructed into its original form.

TCP Acknowledgment Number
TCP uses a 32-bit acknowledgment number to confirm data successfully received. If you look back at the TCP sequence number, you will notice that it, too, is a 32-bit number. That's not a coincidence! TCP uses that sequence number as the basis for its acknowledgments. That lets the source machine know which packet(s) have been received and acknowledged.

Data Offset
This 4-bit field contains the TCP header size measured in 32-bit "words." By knowing exactly how large the header is, the recipient knows how to find where the header ends and where the data actually begins.

Reserved
This 6-bit field is always set to 0. More precisely, each of those 6 bits is set to 0. It's not quite enough to equal a full character, so please don't misconstrue this as the binary value of the 0 character. It is actually quite common for standards bodies and technology manufacturers to leave room for future growth or feature enhancements. Having a 6-bit reserved field creates the possibility that TCP/IP can support new or different features in the future. We just don't know what they might be yet!

Flags
The 6-bit flag field contains six 1-bit flags that enable specific control functions. For example, if the last of these 6 bits is set to 1 (instead of its normal value of 0), the receiving machine understands that the sender has finished sending data. Some of the other flags enable the two machines to do esoteric functions like reset the connection between them or resynchronize sequence numbers.

Window Size
The destination machine uses this 16-bit field to tell the source host how much data it is willing to accept. The best way to think about this feature is as a traffic cop that regulates traffic flow between the source and destination machines. TCP guarantees data delivery. The only way it can do so is if it knows that any given piece of data made it safely to its destination. To do that, TCP must receive an acknowledgment from the recipient for each piece of data sent. Sending one acknowledgment for each piece of data can get onerous and inefficient. It, therefore, makes sense to batch them and send one acknowledgment for a bunch of packets that have been received. It is much more efficient to wait for 10, 20, or even 100 packets to be received satisfactorily before sending an acknowledgment.

Window size is the amount of data, measured in octets, that a sending machine can send without waiting for an acknowledgment. There's a trade-off here: If the network is busy or having other problems, there is a strong probability that packets will get lost or damaged on their way to the destination. When that happens, a large window size will work against you! For that reason, the people who developed TCP/IP permitted a sliding window. TCP can sense when network conditions are deteriorating and can respond by telling the sender to reduce its window size. Similarly, when conditions are improving, TCP can signal the sender to increase its window size. In this manner, the window's size slides back and forth as TCP tries to find the optimal size at any given time.

Checksum
The TCP header contains a 16-bit error-checking field known as a checksum. The source host calculates a mathematical value based upon the segment's contents. This value gets stored in the header's checksum field, where the destination machine can examine it. The destination host performs the same calculation using the data it just received. If the packet's contents remained intact during its journey through the network, the result of the two calculations will be identical, thereby proving the validity of the data.

It is theoretically possible for data to be damaged such that the checksum operation would return the same value as expected if the data weren't damaged. However, the odds of this happening are so remote as to be utterly improbable. Thus, calculating a checksum is a marvelous way to determine whether a piece of data was damaged in transit, but it is not a perfect tool.

Padding
Padding implies fluff, or extraneous material, that's not needed. However, in the world of data communications, padding is useful for maintaining the timing and/or sizing requirements of a communications protocol. In the case of TCP, extra 0s are added in this field at the end of the TCP header to ensure that the TCP header is always some multiple of 32 bits. Remember, the data offset field tells the destination machine's TCP how many groups of 32 bits there were in the header. If, for some odd reason, the header ends up being something other than a multiple of 32 bits, 0s must be added at the end to make up the difference. These 0s are called padding.

Payload
Okay, so the next field isn't really part of the TCP header. It's the payload (data). This is where the application data is stored. Together, the header and payload comprise the TCP segment.
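To make the field layout concrete, here is a minimal sketch that packs and then unpacks a 20-octet TCP header using Python's standard `struct` module. The function name `parse_tcp_header` and every sample value (ports, sequence numbers, window) are hypothetical. One assumption goes beyond the list above: the fixed 20-octet header ends with a 16-bit urgent pointer, which the list does not describe, so the sketch simply reads it and ignores it.

```python
import struct

# Network byte order: two 16-bit ports, two 32-bit numbers, then four
# 16-bit values (offset/reserved/flags, window, checksum, urgent pointer).
TCP_HEADER = struct.Struct("!HHIIHHHH")   # 20 octets in total


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed part of a TCP header into named fields."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, _urgent) = TCP_HEADER.unpack(raw[:20])
    data_offset = offset_flags >> 12      # top 4 bits: header size in 32-bit words
    flags = offset_flags & 0x3F           # bottom 6 bits: the control flags
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_bytes": data_offset * 4,  # 32-bit words -> octets
        "fin": bool(flags & 0x01),        # last flag bit: sender has finished
        "window": window,
        "checksum": checksum,
    }


# Hypothetical header: data offset 5 (5 x 32 bits = 20 octets), last flag set.
sample = TCP_HEADER.pack(49152, 80, 1000, 2000, (5 << 12) | 0x01, 8192, 0, 0)
fields = parse_tcp_header(sample)
print(fields["src_port"], fields["dst_port"], fields["header_bytes"], fields["fin"])
# prints: 49152 80 20 True
```

The data offset is what lets the receiver skip past the header: the payload starts at octet `header_bytes` of the segment.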

You might be wondering why the TCP header includes only the port addresses and not the computers' IP addresses. The answer is simple: TCP is application oriented while IP is network oriented. Consequently, TCP concerns itself with identifying applications' source and destination addresses, whereas IP focuses on the other half of that equation: the computers' source and destination IP addresses. Together, the port and IP addresses enable each piece of data to find its way through a network and let the replies find their own way back to the source machine and application. This ensures that communications are truly bidirectional.
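Returning briefly to the checksum field described earlier: the calculation used by the TCP/IP protocols is a 16-bit ones' complement sum. The sketch below shows that arithmetic over an arbitrary run of octets. It is a simplification, since a real TCP checksum also covers some addressing information taken from the IP layer; the function name `internet_checksum` and the sample data are assumptions for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data."""
    if len(data) % 2:                 # pad odd-length data with one zero octet
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):  # add the data up as 16-bit words
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF            # ones' complement of the folded sum


payload = b"hello, world"
checksum = internet_checksum(payload)

# Receiver side: summing the data together with the transmitted checksum
# yields 0 when nothing was damaged.
assert internet_checksum(payload + checksum.to_bytes(2, "big")) == 0

# A damaged octet changes the result, so the damage is detected.
assert internet_checksum(b"jello, world") != checksum

# Not a perfect tool: swapping two 16-bit words leaves the sum unchanged.
assert internet_checksum(b"llheo, world") == checksum
```

The last assertion is a concrete case of damage that slips through: because addition does not care about order, reordered words produce the same checksum.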

UDP

UDP is the other Transport Layer Protocol suite in TCP/IP. That means that like TCP, UDP also occupies Layer 4 of the OSI reference model, as shown in Figure 5-2.

Figure 5-2. UDP Is a Transport, or Layer 4, Protocol Suite

However, UDP differs substantially from TCP in the way it works and the types of network performance it is designed to support. Its structure, too, is radically different and directly reflects its role in the TCP/IP protocol suite.

Revisiting UDP's Functions

If TCP and UDP were cars instead of communications protocols, TCP would be a Rolls Royce: heavy and loaded with features but durable and highly reliable. The User Datagram Protocol (UDP), on the other hand, would be a stripped-down racing car that was built for just one purpose: speed! UDP makes no effort to do anything but process data as quickly as possible.

Great debate and misinformation surround the terms packet, datagram, and segment. Generally speaking, a packet is an IP structure, a datagram is a UDP structure, and a segment is a TCP structure.

Before we dig into the actual structure and composition of UDP's header, it's worthwhile to revisit UDP's specific top two functions:

Accepting data from an application and encapsulating it within a UDP header. The combined structure of data and header is known as a datagram. The datagram is handed to IP for further processing.

Checking to see whether data is received undamaged before it gets handed to its intended destination application.

That's it! No attempts are made to manage the communications session or negotiate for the retransmission of packets lost or damaged in transit. One fundamental difference between TCP and UDP is that applications that use TCP tend to transmit large quantities of data. UDP, on the other hand, typically receives small quantities of data. The data is usually in pieces small enough to make chopping it up or segmenting it unnecessary. Consequently, there is no need to put a sequence number in the header or worry about reassembly at the destination machine. The fact that UDP does not chop up application data into smaller pieces means that it is technically incorrect to identify UDP datagrams as segments. TCP segments data; UDP does not.

UDP's Header

One look at UDP's header composition shows you what I meant when I said this was a stripped-down racer built for speed. For example, the UDP header contains just 4 fields and is a mere 64 bits in total length. That's just 8 octets or bytes! In comparison, TCP's header contains 10 different fields and is a minimum of 20 octets or bytes long. That's quite a difference and it directly reflects UDP's streamlined architecture.

The UDP protocol header has the following structure:

UDP Source Port Number
The first field in a UDP header contains the source application's 16-bit port number (or address, if you prefer that term). The source application is the one that started the conversation. It is imperative that UDP be able to uniquely identify the source application because UDP still must support a two-way conversation with the destination machine.

UDP Destination Port Number
The next field in the UDP header is another 16-bit application address. This address identifies the application on the destination computer to which the packet is addressed.

UDP Message Length
The 16-bit message length field informs the destination computer of the size of the datagram, measured in octets and counting both the 8-octet header and the message (payload) attached to it. This field provides a useful way for the destination computer to see if the sent data was damaged during its journey through the network. If, for example, the size of the datagram received is different than indicated in this field, the recipient can assume that it was damaged in transit. In the event that happens, UDP simply discards the data and moves on to the next packet without trying to negotiate a retransmission.

UDP Checksum
Just like TCP, UDP provides a mathematical mechanism for validating the data it is delivering. That mechanism is the checksum and it works in exactly the same way as TCP's checksum. Remember: UDP is designed for best-effort delivery of data. Its goal is the timely delivery of accurate data. Accurate data delivered late is discarded. The flip side of that is that inaccurate data delivered on time is equally worthless! For that reason, the destination computer performs the same mathematical function as the originating host. If there is a discrepancy in the two calculated values (that is, the value calculated by the destination machine and the value stored in this 16-bit field), it is safe to assume that an error has occurred during the transmission of the packet.

UDP Payload
For the sake of consistency, I'm including the UDP payload here even though it isn't really part of UDP's header. The payload is just the application data and the header is built to ensure that the data gets delivered appropriately through a network. The combination of UDP's header and its payload makes up a single datagram. UDP datagrams are also sometimes loosely called packets.
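The four-field header is small enough to build and take apart in a few lines. The following sketch uses Python's standard `struct` module; the function names `build_datagram` and `parse_datagram` and the sample ports are hypothetical. Two details are worth flagging: per the UDP specification (RFC 768) the length value counts the 8-octet header plus the payload, and the sketch leaves the checksum at 0, which over IPv4 means "no checksum computed," purely to keep the example short.

```python
import struct

# Source port, destination port, message length, checksum: 16 bits each.
UDP_HEADER = struct.Struct("!HHHH")   # 8 octets, network byte order


def build_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a UDP header to the payload (checksum left at 0 for brevity)."""
    length = UDP_HEADER.size + len(payload)   # header plus payload, in octets
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_datagram(raw: bytes) -> tuple[int, int, bytes]:
    """Return (source port, destination port, payload), or raise if damaged."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack(raw[:8])
    if length != len(raw):
        # UDP would silently discard such a datagram; the sketch raises instead.
        raise ValueError("length field does not match the data received")
    return src_port, dst_port, raw[8:]


datagram = build_datagram(5353, 53, b"ping")
print(len(datagram))             # prints: 12  (8-octet header + 4-octet payload)
print(parse_datagram(datagram))  # prints: (5353, 53, b'ping')
```

Notice that nothing here tracks sequence numbers or acknowledgments; a datagram that fails the length check is simply dropped.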

Comparing TCP and UDP

The similarities and differences between TCP and UDP's header fields should be readily apparent. TCP is much more feature rich and the fields in its header directly support those features. UDP, on the other hand, is built for speed. UDP's header is the paragon of minimalism. It contains nothing that isn't absolutely essential to a timely delivery of data.

It is interesting to note that TCP's header contains almost all of the same fields that UDP's header includes. The fields that both have (source and destination ports and checksum) should be regarded as absolutely essential. Additionally, both have a mechanism for telling the recipient how large the packet should be (although TCP and UDP implement this capability in slightly different ways).

The differences between TCP and UDP are also readily apparent: TCP includes a lot more information in its header to support its various features, including flow control, sequencing, acknowledgments, and retransmission capabilities. These features enable TCP to provide a guaranteed delivery instead of just taking one hack at getting the data delivered. The penalty, of course, is lowered speed. That just reinforces what I've been telling you about UDP being a radically stripped-down protocol that's expressly built for speed!
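The flow control mentioned above can be sketched in a deliberately simplified form. In this toy model the sender transmits as many segments as fit in the receiver's advertised window, then waits for one acknowledgment covering the whole batch before sending more. Real TCP slides the window continuously as acknowledgments arrive, so treat this only as an illustration of the batching idea; the function name `send_in_windows` and all the numbers are assumptions. The window is counted in octets, as the Window Size field is.

```python
def send_in_windows(total_bytes: int, window: int, segment_size: int) -> list[list[tuple[int, int]]]:
    """Group (sequence_number, length) segments into bursts that fit the window.

    Each inner list is what the sender may transmit before it must stop
    and wait for an acknowledgment.
    """
    bursts = []
    next_seq = 0
    while next_seq < total_bytes:
        burst, in_flight = [], 0
        while next_seq < total_bytes:
            length = min(segment_size, total_bytes - next_seq)
            if in_flight + length > window:   # window is full: wait for an ack
                break
            burst.append((next_seq, length))
            next_seq += length
            in_flight += length
        if not burst:
            raise ValueError("window is smaller than a single segment")
        bursts.append(burst)
    return bursts


bursts = send_in_windows(total_bytes=10_000, window=4_096, segment_size=1_000)
print([len(burst) for burst in bursts])   # prints: [4, 4, 2]
```

With a 4,096-octet window, three acknowledgments cover ten segments instead of ten acknowledgments; shrinking the window would mean more waiting but less data at risk if the network misbehaves.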

IP

IP is one of those things that you use without really knowing that you are using it. Given that, it shouldn't be too much of a surprise to find that it does a whole lot of things that you probably have never even thought about. Peeling back the covers, though, helps you develop a much keener appreciation for exactly what IP does, why it does it, and how it gets the job done.

Simply stated, IP is responsible for carrying data through a network to its intended destination. Before peeling those covers back, you need to understand the most critical functions of the Internet Protocol. IP stands in stark contrast to TCP and UDP in that it occupies Layer 3, the network layer, of the OSI reference model, as shown in Figure 5-3.

Figure 5-3. IP Is a Network, or Layer 3, Protocol Suite

Revisiting IP's Functions

At the risk of oversimplifying an already underappreciated protocol suite, IP performs four critical functions:

Creating an envelope for carrying data through a network or internetwork.

Providing a numeric addressing system that lets you uniquely identify virtually every machine on the Internet around the world.

Enabling each envelope, or packet of data, to be specifically addressed to its intended destination. This is the packet's destination IP address.

Enabling each envelope, or packet of data, to also tell the recipient machine who sent it. This return address is the source IP address.

IP provides other functions, too. These will be quite evident when I show you the IP header's various fields. However, these are the functions I consider most critical. As you might have noticed from this short list, IP has a much different purpose than either TCP or UDP. In fact, it must support the network requirements of both. As such, it is not at all interested in information about applications. Instead, it fixates on information about the network.

IP's Header

To better explain what I mean about IP being focused on the network, take a quick look at its header. The IP header has the following size and structure:

Version
The first 4 bits of the IP header identify the specific version of IP that is being used. Only two versions, 4 and 6, are worth talking about. IP version 4 (IPv4) is the current standard throughout the world today. IP version 6 (IPv6) is the next generation of IP. Although it is being used today, it has seen limited acceptance. Chances are extremely high that you are using an IPv4 network wherever and whenever you connect to an IP network.

Internet Header Length
The next 4 bits of the header contain the header's length. That length is expressed in multiples of 32 bits, so a header that is 160 bits in length (the minimum for IPv4) would be identified here with a value of 5 (5 x 32 = 160).

Type of Service
The next 8 bits contain 1-bit flags that can specify various attributes that can support a preferential treatment of specific packets. For example, a packet with a high time value could be given a priority through the use of the flags in this field. Each device in the network would look at this field and treat the packet accordingly.

Total Length
This 16-bit field contains the total length of the IP packet measured in octets, or groups of 8 bits.

Identifier
Each IP packet is given a unique, 16-bit identifier. This is much akin to a serial number and does not affect how an IP packet travels through the network; its main use is to let the destination recognize which fragments belong to the same original packet.

Flags
The next field contains another set of 1-bit flags. This set contains only three flags. These flags indicate whether it is possible to take the payload and chop it up into smaller pieces and whether that has already been done. This process is known as fragmentation. TCP usually handles the process of chopping your application's data into bite-sized pieces. However, IP also has this capability, as it may sometimes become necessary to further chop up a packet en route. This would be almost transparent to you as a user, although you might notice a little extra delay in getting things done through the network.

Fragment Offset
This 13-bit field measures the offset of the fragmented contents relative to the beginning of the entire datagram. This value is measured in 64-bit increments.

Time-To-Live (TTL)
The Internet is a busy place that supports millions of people and their communications needs. These aren't necessarily patient people, either! How long would you wait for a web page to download before hitting Stop? Five minutes? One hour? The point is that IP packets cannot be allowed to roam the network forever. You don't want to wait forever and it's not good for the network, either. Sooner or later you have to acknowledge that you can't reach a destination or complete a transaction. Rather than leave this decision solely at each user's discretion, IP contains a mathematical means of deciding when it's time to call it quits. That's this field: Time-to-Live. This 8-bit field keeps track of the number of network devices through which the IP packet passes. Each device that forwards the packet reduces the value by one; when the value reaches 0, the packet is discarded.

Protocol
This 8-bit field identifies the protocol that follows the IP header. Usually, this field identifies either TCP or UDP; however, IP can transport other protocols.

Checksum
The IP checksum field is 16 bits in length. By now you should be fairly familiar with the purpose of a checksum field. So familiar, in fact, that you're probably wondering why you would bother having two checksums on the same data! Because both TCP and UDP already perform this function on the application's data, what's the purpose of doing it all over again in IP? After all, the TCP and UDP packets are embedded inside the IP packet; you should be all set. The point is that the network is constructed of devices that speak IP but not necessarily TCP or UDP. Having a checksum embedded in the IP header enables these network devices to see if the packet is worth passing on or if it has become damaged.

Source IP Address
You have seen how both TCP and UDP headers contain application addresses. The source IP address field is 32 bits in length and is responsible for keeping track of the sending machine's network address. This address is better known as the IP address. When a computer brands each IP packet with its own IP address, it is providing a way for the recipient, or destination machine, to send it replies.

Destination IP Address
The destination IP address field is also 32 bits in length and, as you can guess by now, contains the IP address of the computer to which this packet is being sent.

Padding
As was the case with the TCP header, IP can pad its header with extra 0s. The IP header must always be a multiple of 32 bits and IP can pad as necessary to meet that requirement.

Payload
Just like TCP and UDP, IP packets have a payload. The interesting twist is that an IP packet's payload is the complete TCP segment or UDP datagram. That is, the structures I showed you in the preceding sections (header and payload) would be treated as a payload by IP.
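As with the TCP and UDP headers, the IP header fields can be picked apart with Python's standard `struct` and `socket` modules. The sketch below handles only a 20-octet IPv4 header with no padding or options. The function name `parse_ipv4_header` and the sample values are hypothetical; the two addresses come from ranges reserved for documentation.

```python
import socket
import struct

# Version/IHL, type of service, total length, identifier, flags/fragment
# offset, TTL, protocol, checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")   # 20 octets, network byte order


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode a 20-octet IPv4 header into named fields."""
    (version_ihl, _tos, total_length, identifier, flags_fragment,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:20])
    return {
        "version": version_ihl >> 4,                  # first 4 bits
        "header_bytes": (version_ihl & 0x0F) * 4,     # 32-bit words -> octets
        "total_length": total_length,
        "identifier": identifier,
        "dont_fragment": bool(flags_fragment & 0x4000),
        "more_fragments": bool(flags_fragment & 0x2000),
        "fragment_offset_bytes": (flags_fragment & 0x1FFF) * 8,  # 64-bit units
        "ttl": ttl,
        "protocol": {6: "TCP", 17: "UDP"}.get(protocol, str(protocol)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical packet: IPv4, header length 5 words, carrying TCP, TTL of 64.
sample = IPV4_HEADER.pack((4 << 4) | 5, 0, 40, 1, 0x4000, 64, 6, 0,
                          socket.inet_aton("192.0.2.1"),
                          socket.inet_aton("198.51.100.7"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["header_bytes"], fields["protocol"], fields["src"])
# prints: 4 20 TCP 192.0.2.1
```

The protocol value is how the receiving machine knows whether to hand the payload to TCP (6) or UDP (17).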

At first glance you might note some similarities between TCP and IP. Both are, in fact, fairly heavy and feature-rich protocols. Some stark differences exist, however, the most significant of which is function. TCP is focused on applications, whereas IP is aimed squarely at the network. Looking a bit closer you'll notice that even though IP is geared toward the network, it contains nothing that would guide the packet through a network toward its destination except for the destination address. That implies the network is responsible for figuring out how to deliver each packet. A more subtle point is that this process is performed for each packet. That creates the possibility that each packet can take a slightly different route.
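One piece of IP's header arithmetic deserves a worked example: the fragment offset, which is counted in 64-bit (8-octet) units. The sketch below splits a payload into fragments and records the offset and "more fragments" flag each one would carry. The function names `fragment` and `reassemble` and the sizes are illustrative assumptions, and the sketch ignores the rest of the IP header.

```python
def fragment(payload: bytes, max_fragment: int) -> list[tuple[int, bool, bytes]]:
    """Split payload into (offset_in_8_octet_units, more_fragments, chunk) tuples."""
    # Every fragment except the last must be a multiple of 8 octets long,
    # because the offset field cannot express anything finer.
    step = (max_fragment // 8) * 8
    if step == 0:
        raise ValueError("max_fragment must be at least 8 octets")
    fragments = []
    for start in range(0, len(payload), step):
        more = start + step < len(payload)   # False only on the final fragment
        fragments.append((start // 8, more, payload[start:start + step]))
    return fragments


def reassemble(fragments: list[tuple[int, bool, bytes]]) -> bytes:
    """Put fragments back in offset order, whatever order they arrived in."""
    return b"".join(chunk for _, _, chunk in sorted(fragments))


payload = bytes(range(50))
pieces = fragment(payload, max_fragment=20)    # 20 rounds down to 16 octets
print([(offset, more, len(chunk)) for offset, more, chunk in pieces])
# prints: [(0, True, 16), (2, True, 16), (4, True, 16), (6, False, 2)]

assert reassemble(list(reversed(pieces))) == payload
```

Because each fragment carries its own offset, the fragments can take different routes and arrive in any order without confusing the destination.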

Now that you're quite familiar with TCP, UDP, and IP headers and packet structure, step back and take a closer look at application port addresses.