Core Network Availability
This section discusses how you can improve network availability by means of various network recovery mechanisms, which allow for the rerouting of the affected traffic along an alternate path. Network recovery mechanisms are available at various layers (optical, SONET/SDH, MPLS/IP, and so on). A detailed reference on this topic is [NET-RECOV]. However, this section focuses on the recovery techniques provided by IP routing and MPLS Traffic Engineering.

When selecting a particular network recovery mechanism (or sometimes a set of mechanisms), you should first determine the overall objectives in terms of network availability. Such objectives are usually driven by the application requirements. For instance, the availability requirement for ATM traffic carried over MPLS is undoubtedly significantly more stringent than for Internet traffic. Moreover, even for a specific application such as VoIP, the requirements may vary from one service provider to another. Indeed, consider a network carrying VoIP traffic. If the objective is to ensure that a voice call is not dropped in case of a network element failure, a convergence time of several seconds (usually about 2 seconds, although such a value greatly depends on the tolerance of the VoIP gateway's signaling protocol) is perfectly acceptable. On the other hand, if the service provider wants to offer a VoIP service that can tolerate any network element failure without any noticeable degradation for the user, the recovery mechanism must be able to reroute the traffic within a few tens of milliseconds along a backup path offering stringent QoS guarantees. This highlights the fact that the objectives must first be clearly established so as to come up with the most appropriate network recovery design.

After you have determined the network recovery objectives, you should evaluate the various available network recovery mechanisms while keeping in mind various criteria.
The convergence time is obviously the first one that comes to mind: how much time is required to reroute the traffic upon a network element failure? This is clearly a critical aspect to consider, but certainly not the only one. The following are a few other important aspects:

- Scope of recovery: Determines whether the recovery mechanism can handle link failure, Shared Risk Link Group (SRLG) failure, or node failure. As a reminder, the concept of SRLG relates to the notion of multiple links sharing a common resource whose failure would provoke all the links to fail simultaneously. For example, consider the case of multiple links routed across the same fiber. A fiber cut would imply the failure of all the links; we say that they share the same SRLG.
- QoS during failure: Does the alternate path (usually called a backup path) provide an equivalent QoS?
- Network overhead: Relates to the amount of extra state required in the network by the recovery mechanism.
- Cost: There is a very wide range of network recovery mechanisms whose costs vary by several orders of magnitude. For instance, an optical 1+1 protection mechanism, although very efficient in terms of rerouting time and QoS along the backup path, requires additional equipment for traffic replication and some extra bandwidth dedicated to protection (actually as much bandwidth as the protected bandwidth). On the other hand, other recovery mechanisms such as IP rerouting are usually cheaper both in terms of extra equipment and network resources (of course, they may not provide the same level of performance).
- Network stability: Does the recovery mechanism, under some circumstances (such as when very fast recovery is required), potentially lead to network instability when faced with frequent path changes?
This list is not meant to be exhaustive. It simply provides some of the most important aspects to evaluate when considering a network recovery mechanism. These different aspects are illustrated throughout this book in the various case studies.

In some cases, it might be desirable to use a combination of recovery mechanisms, although such an option requires some extra care to avoid race conditions between recovery mechanisms and double protection, which may lead to a lack of optimality in terms of backup resources. For instance, a common approach consists of combining SONET protection with fine-tuned Interior Gateway Protocol (IGP) rerouting.
Protection Versus Restoration
There are actually two families of network recovery mechanisms: protection and restoration. A protection mechanism relies on the precomputation (and signaling) of a backup (alternate) path before any failure. In contrast, a restoration mechanism requires the computation of the backup path only after the failure has occurred; in other words, the backup path is computed on the fly. In terms of convergence time, protection mechanisms are usually faster but also require more network overhead because extra state is required.
Local Versus Global Recovery
This other characteristic of a network recovery mechanism relates to the location of the node in charge of redirecting the traffic along a backup path. A recovery mechanism is said to be local when the node immediately upstream of the failure is responsible for rerouting the traffic should a failure occur. Conversely, with a global recovery mechanism, such as the default rerouting mode of MPLS TE described in this section, the headend of the affected TE LSP is in charge of the rerouting.

You will see various network recovery mechanisms: IP routing (restoration), MPLS TE headend reroute (global restoration), MPLS TE path protection (global protection), and MPLS TE Fast Reroute (local protection). Before describing the recovery mechanisms involved in IP routing and MPLS Traffic Engineering, it is worth introducing the notion of a recovery cycle. It can be used to describe the different steps involved in any network recovery procedure (see Figure 2-22).
Figure 2-22. The Recovery Cycle

Network Recovery with IP Routing
Over the years, several routing protocols have been designed. Two major families of routing protocols exist: distance vector protocols (such as Routing Information Protocol [RIP] and EIGRP) and link-state routing protocols (such as Open Shortest Path First [OSPF] and Intermediate System-to-Intermediate System [IS-IS]). This section focuses on link-state protocols such as OSPF and IS-IS because they have been deployed in most service provider networks, owing to their superiority in terms of scalability, optimality, and convergence properties.

Link-state routing protocols rely on the concept of a Link-State Database (LSDB): a collection of Protocol Data Units (PDUs), called Link-State Advertisements in OSPF (see [OSPFv2]) and Link-State Packets in IS-IS (see [ISIS]), that describes some part of the overall network topology and IP address reachability. (This section uses the generic term LSA for both OSPF and IS-IS.) Each router is responsible for originating one or more LSAs (depending on whether we refer to IS-IS or OSPF), and the collection of all the LSAs originated by each router constitutes the LSDB. In contrast to distance vector protocols, each router running a link-state routing protocol has a complete view of the network topology through its LSDB. An algorithm known as the Dijkstra algorithm then allows for the computation of the Shortest Path Tree (SPT), according to a specific metric, from the computing router (the tree root) to every reachable node in the network. Finally, based on this SPT, each node builds its routing table, which contains the shortest path to each reachable IP prefix in the network.

A crucial aspect of link-state routing is of course to guarantee the synchronization of all the routers' LSDBs within a routing domain. This is of the utmost importance to avoid routing loops, because routers with different views of the network topology may make inconsistent routing decisions, leading to a routing loop.
Such a lack of synchronization between LSDBs can occur for a temporary period during transient states, as illustrated later in this section. Although OSPF and IS-IS differ in many respects, they are quite similar as far as the recovery aspects are concerned. Thus, this section applies equally to OSPF and IS-IS.

Let's start with the situation during steady state. As shown in Figure 2-23, in steady state, all the routers have an identical LSDB. The Hello protocol is used between neighboring routers to check that each neighbor is "alive" by means of short messages exchanged at regular intervals (we usually say that a router maintains routing adjacencies).
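The SPT computation described above can be illustrated with a minimal Dijkstra sketch. This is an illustrative implementation only, not router code; the four-router topology and the IGP metrics are hypothetical.

```python
import heapq

def shortest_path_tree(lsdb, root):
    """Compute shortest-path distances and predecessors from the root
    over a link-state database given as {node: {neighbor: metric}}."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale heap entry
        visited.add(node)
        for nbr, metric in lsdb.get(node, {}).items():
            nd = d + metric
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node        # nbr is reached via node in the SPT
                heapq.heappush(heap, (nd, nbr))
    return dist, parent

# Hypothetical four-router topology with symmetric IGP metrics
lsdb = {
    "R1": {"R2": 10, "R3": 30},
    "R2": {"R1": 10, "R3": 10, "R4": 30},
    "R3": {"R1": 30, "R2": 10, "R4": 10},
    "R4": {"R2": 30, "R3": 10},
}
dist, parent = shortest_path_tree(lsdb, "R1")
```

From this SPT, the routing table of R1 would be derived by walking each destination back to the first hop on the tree.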
Figure 2-23. Link-State Routing Protocols

Figure 2-24. Mode of Operation of IP Routing Upon Failure

Use of Dynamic Timers for LSA Origination and SPF Triggering
As previously mentioned, dynamic timers are used to control both the origination of an LSA and the triggering of SPF computation. Any recovery mechanism aims to quickly restore traffic affected by a network element failure, which requires fast reaction to such failure events. In the example of IP routing, you saw that a router detecting a loss of a routing adjacency provoked by a link or neighbor node failure originates a new LSA reflecting the network topology change. Such an LSA is flooded throughout the network, and every router receiving this new LSA consequently triggers a new routing table calculation. To decrease the overall convergence time, it is desirable for every router connected to the failed resource to quickly originate a new LSA and for every router receiving such a new LSA to quickly trigger an SPF computation. But this requires some caution so as to protect the network from unstable network elements. Consider the case of a "flapping" link. If a new LSA were quickly originated at each link state change, this would unavoidably result in frequent IGP LSA updates and SPF triggering on every router in the network, potentially leading to network instability. Thus, the solution to this problem is to use a dynamic timer that allows a quick reaction to simple failures but dampens the LSA origination and SPF triggering if frequent network state changes occur. The algorithm used on a Cisco router to dynamically compute such a timer is based on exponential back-off with three parameters. (Example 2-1 is given for IS-IS LSP origination but applies equally to SPF triggering.)
Example 2-1. Exponential Back-Off
Parameter B in Example 2-1 specifies in milliseconds how long the router detecting the loss of adjacency for the first time waits before originating a new LSP. If a second state change occurs, the router waits for C milliseconds. If a third state change happens, the router waits for 2 * C, and then 4 * C, and so on up to a maximum of A seconds. At this stage, the delay between the origination of two LSPs is A seconds if the link keeps flapping. Then if the link stays in a stable state for 2 * A seconds, the router returns to the original behavior. This is illustrated in Figure 2-25.
!
router isis
lsp-gen A B C
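The behavior of this exponential back-off can be sketched as a short simulation. This is an illustrative model of the A/B/C semantics described in the text, not Cisco IOS source; the parameter values chosen below are hypothetical.

```python
def lsa_gen_delays(num_changes, max_wait_ms, initial_wait_ms, incr_wait_ms):
    """Wait (in ms) before originating a new LSP for each successive state
    change, per the lsp-gen back-off: B for the first change, then C, 2*C,
    4*C, ..., capped at A."""
    delays = []
    for i in range(num_changes):
        if i == 0:
            d = initial_wait_ms                  # parameter B: first change
        else:
            d = incr_wait_ms * (2 ** (i - 1))    # parameter C, doubling
        delays.append(min(d, max_wait_ms))       # capped at parameter A
    return delays

# Hypothetical setting equivalent to A = 5 s, B = 50 ms, C = 200 ms,
# for a link that changes state six times in a row
delays = lsa_gen_delays(6, 5000, 50, 200)
```

If the link then stays stable for 2 * A, the router returns to the initial behavior (the next change again waits only B milliseconds).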
Figure 2-25. Exponential Back-Off Algorithm for LSA Origination

Computing the Convergence Time with IP Routing
You have seen the different steps that occur during the IP recovery process. As soon as the failure is detected, each neighboring router of the failed resource originates a new LSA after a timer has elapsed. Each LSA is reliably flooded throughout the network. Finally, each node receiving a new LSA (reflecting a topology change) triggers an SPF computation after another timer has also elapsed. Consequently, the total convergence time depends on quite a long list of factors: the routing timer settings, the network topology and number of prefixes, and so on. Several examples are provided in various case studies, but we'll give you an order of magnitude, with some careful design rules. Rerouting times on the order of 1 second can be achieved in very large networks but require some nonnegligible engineering work. Also, you should keep in mind two important aspects inherent in IP routing:

- Lack of predictability: All the routers of course eventually converge, but the exact event timing is hard to predict.
- Transient routing loops: Because the flooding of a new LSA takes some time, the various routers in the network may at some point have unsynchronized LSDBs. Hence, having different views of the network topology, they may make routing decisions leading to loops. That being said, these loops are temporary and are cleared as soon as all the routers converge.
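The sequence of steps just described can be turned into a rough worst-case convergence budget by summing the per-step delays. The component values below are hypothetical placeholders for a tuned network, not measured figures; real values depend heavily on platform, timers, and topology.

```python
def convergence_budget_ms(detection, lsa_wait, flood_per_hop, hops,
                          spf_wait, spf_compute, fib_update):
    """Rough worst-case IGP convergence estimate (ms): failure detection,
    LSA-origination timer, LSA flooding across the network diameter,
    SPF timer, SPF run, and FIB update, summed in order."""
    return (detection + lsa_wait + flood_per_hop * hops
            + spf_wait + spf_compute + fib_update)

# Hypothetical tuned values: 10 ms PoS alarm detection, 50 ms LSA timer,
# 5 ms flooding per hop over a 5-hop diameter, 50 ms SPF timer,
# 100 ms SPF computation, 200 ms FIB download.
total = convergence_budget_ms(10, 50, 5, 5, 50, 100, 200)
```

With these illustrative numbers the budget lands well under a second, consistent with the order of magnitude mentioned above; the FIB update and SPF terms typically dominate as the number of prefixes grows.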
As explained earlier, it may be desirable to wait for a certain period of time before flooding an LSA or computing an SPF. For instance, when flooding a new LSA, it may be desirable to wait for some time in case another lower-layer recovery mechanism can restore the failed resource. When computing an SPF, note that in case of a node failure, several LSAs are originated. Hence, waiting before computing the SPF increases your chances of having an accurate LSDB before computing a new routing table.

To protect the network from instability caused by a flapping network resource, a dynamic timer is desirable. The case of an unstable link is a good example. Without a dynamic LSA origination timer, both R4 and R5 would constantly originate new LSAs, which would in turn generate some potentially nonnegligible routing control updates and would also trigger new routing table computations on each node, which, of course, is highly undesirable. Hence, a back-off mechanism has been designed to quickly react (originate the new LSA) when a link first fails and then slow down the LSA origination if the link flaps. The algorithm (available on Cisco routers) used by the back-off mechanism has three parameters: T1, T2, and T3. T1 specifies how long a router that has detected a link failure (more precisely, a loss of routing adjacency) waits before originating a new LSA. If a second state change occurs, the router waits for T2 before originating a new LSA. If the link keeps flapping, the period between successive LSA originations doubles at each change, up to a maximum value of T3. If the link remains in a stable state for 2 * T3, the router reverts to the original behavior. Such an algorithm allows for a fast reaction upon a single failure while protecting the network in case of unstable resources.
A similar algorithm can be used for the SPF calculation, which also provides an efficient mechanism for fast convergence while protecting the router from some misbehaving router(s) or major network instability conditions.

A common misperception is that IGPs converge in tens of seconds. This section has shown that in reality this can be reduced to 1 to 2 seconds with appropriate tuning. Furthermore, IP routing inherently provides backup bandwidth sharing. Indeed, no resource is reserved beforehand should a resource fail. Hence, the available bandwidth can be used to reroute any traffic upon failure. On the other hand, subsecond rerouting time is much more difficult to achieve. Other network recovery mechanisms are probably more suitable for such requirements. Moreover, guaranteeing equivalent QoS in case of network failure is also quite challenging.
Network Recovery with MPLS Traffic Engineering
MPLS Traffic Engineering provides a full spectrum of network recovery mechanisms. We will first review the default recovery mode (called MPLS TE reroute) based on global restoration, and then path protection (global protection), and finally MPLS Fast Reroute (local protection). Each mechanism differs in its ability to meet various recovery requirements such as rerouting times, scope of recovery, ability to provide equivalent QoS during failure, required amount of extra state, and so on. Depending on its requirements, a service provider then can elect the appropriate MPLS TE recovery mechanism.
MPLS TE Reroute
MPLS TE reroute, the default mode of network recovery of MPLS Traffic Engineering, is a global restoration mechanism:

- Global: The node in charge of rerouting a TE LSP affected by a network element failure is the headend router.
- Restoration: When the headend router is notified of the failure, a new path is dynamically computed, and the TE LSP is signaled along the new alternate path (assuming one can be found). For the sake of exhaustiveness, it is also possible to precompute or preconfigure an alternate path. Be aware that such a path, determined before any failure occurs, must be fully diverse from the active one, because you won't know the failure location beforehand.
Consider the example shown in Figure 2-26. A TE LSP T1 is initially set up along the path R1-R2-R3-R4-R5. The link R3-R4 fails. After a period of time (the fault detection time), the router R3 (and the router R4) detects the failure. Again, this period of time essentially depends on the failure type and the Layer 1 or 2 protocol. If you assume a Packet over SONET (PoS) interface, the failure detection time is usually on the order of a few milliseconds. In the absence of a hold-off timer, the router upstream of the failure immediately sends the failure notification (RSVP-TE Path Error message) to the headend router (R1 in this example).
Figure 2-26. MPLS Traffic Engineering Reroute (Global Restoration)

MPLS TE Path Protection
Another network recovery mechanism available with MPLS TE is path protection. The principle is to precompute and presignal a TE LSP used as a backup in case the primary TE LSP is affected by a network element failure. The backup LSP path can be dynamically computed by the headend (by CSPF) or by means of an offline tool.

Consider the network shown in Figure 2-27. The backup LSP has to be diverse (which means it should not use any of the same facilities, such as links, as the protected TE LSP) because the fault location by definition is unknown beforehand. Multiple schemes offer various degrees of diversity and thus protect against different scopes of failure. Figure 2-27 shows an example of a backup path offering link diversity and a backup path offering node diversity.
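The two degrees of diversity just mentioned can be checked mechanically. The sketch below is illustrative only; the paths and router names are hypothetical (the primary follows the R1-R2-R3-R4-R5 path used elsewhere in this section).

```python
def links_of(path):
    """Unordered links of a path given as an ordered list of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def is_link_diverse(primary, backup):
    # Link diversity: the two paths share no link
    return not (links_of(primary) & links_of(backup))

def is_node_diverse(primary, backup):
    # Node diversity: no intermediate node in common
    # (head and tail are shared by definition)
    return not (set(primary[1:-1]) & set(backup[1:-1]))

primary = ["R1", "R2", "R3", "R4", "R5"]
backup_a = ["R1", "R6", "R3", "R7", "R5"]   # link-diverse only: reuses node R3
backup_b = ["R1", "R6", "R7", "R5"]         # link- and node-diverse
```

Note that node diversity implies link diversity, but not the reverse: backup_a would survive any link failure on the primary, yet not a failure of node R3.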
Figure 2-27. MPLS Traffic Engineering Path Protection (Global Protection)

MPLS TE Fast Reroute
MPLS TE Fast Reroute, a local protection mechanism, is by far the most widely deployed MPLS TE recovery mechanism. It relies on the presignaling of backup tunnels at each node, which are used to locally reroute all the TE LSPs affected by a network failure. To protect a facility such as a link, SRLG, or node, backup tunnels must be configured on the relevant set of routers. The set of required backup tunnels may be configured manually (in which case their paths are statically configured) or by means of automatic mechanisms (details of such mechanisms appear in several case studies).

Consider the network shown in Figure 2-28. At each hop, a backup tunnel is configured that follows a path diverse from the protected facility. (In this case, the protected facility is a link, but you will see that Fast Reroute can also be used to protect against SRLG and node failure.) Figure 2-28 illustrates the use of Fast Reroute to protect the link R2-R3.
Figure 2-28. MPLS Traffic Engineering Fast Reroute (Link Protection)

Mode of Operation Before Failure
We will now review the steps required to use MPLS TE Fast Reroute before any failure occurs. As shown in Figure 2-28, a backup tunnel B1 from R2 to R3 is signaled before any failure. It protects all the TE LSPs that cross the link R2-R3 (such as T1 and T2). At this point, it is worth mentioning that the eligibility of a TE LSP to benefit from Fast Reroute along its path can be configured on a per-TE LSP basis and can be explicitly signaled in RSVP-TE. (Indeed, it may be desirable to protect only a selected subset of TE LSPs by means of Fast Reroute, based on various availability requirements.)

In this example, two TE LSPs traverse the link R2-R3. We will focus on the TE LSP T1. T1 is signaled along the path R1-R2-R3-R4-R5. (The corresponding labels distributed by RSVP-TE [Resv messages] are shown in Figure 2-28.) When the T1 LSP is first signaled, each LSR along the path determines whether the TE LSP is asking to be protected by Fast Reroute. If T1 is signaled with such a property, every router tries to select a backup tunnel to protect the TE LSP should a link fail. In the case of T1, R2 (also called a Point of Local Repair [PLR]) selects B1 as the backup tunnel to be used in case of failure of the link R2-R3. Each PLR selects a backup tunnel that meets the requirement of intersecting the protected TE LSP on some downstream node (called the Merge Point [MP]), R3 in this example. RSVP-TE signaling lets you specify more parameters related to Fast Reroute, such as any requirement for bandwidth protection (in other words, whether a backup tunnel offering an equivalent QoS is required). We will revisit this aspect later.
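The PLR's backup tunnel selection rule described above (pick a presignaled backup tunnel whose tail end is a downstream node of the protected LSP, the Merge Point) can be sketched as follows. This is an illustrative simplification, not router code; the tunnel names and the second NNHOP tunnel B2 are hypothetical.

```python
def select_backup(lsp_path, plr, backup_tunnels):
    """On the PLR, pick the first backup tunnel that starts at the PLR and
    whose tail end (the Merge Point) lies downstream of the PLR on the
    protected LSP's path. backup_tunnels maps name -> (head, tail)."""
    downstream = lsp_path[lsp_path.index(plr) + 1:]
    for name, (head, tail) in backup_tunnels.items():
        if head == plr and tail in downstream:
            return name
    return None     # no suitable backup: the LSP is unprotected at this hop

# T1's path and two hypothetical backup tunnels on R2:
# B1 is an NHOP backup to R3, B2 an NNHOP backup to R4.
t1_path = ["R1", "R2", "R3", "R4", "R5"]
backups = {"B1": ("R2", "R3"), "B2": ("R2", "R4")}
chosen = select_backup(t1_path, "R2", backups)
```

A real PLR would further filter the candidates against the TE LSP's signaled requirements (for example, bandwidth protection) before making this choice.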
Mode of Operation During and After Failure
Upon a link failure (the link R2-R3 in this example), the PLR locally reroutes the protected TE LSPs onto the backup tunnel and notifies the headend routers, which can in turn reoptimize their TE LSPs along a new end-to-end path, as discussed in the section "Reoptimization of a Traffic Engineering LSP."

As soon as local rerouting occurs, the PLR must refresh the set of TE LSPs rerouted onto the backup tunnel. Indeed, RSVP-TE is a soft-state protocol; hence, the state of each TE LSP must be regularly refreshed. You already saw that refreshes are performed by sending RSVP-TE Path and Resv messages (downstream and upstream, respectively). This also applies to a rerouted TE LSP. Consequently, for each TE LSP rerouted onto a backup tunnel, the PLR must keep refreshing the TE LSP state by sending Path messages to the downstream neighbor through the backup tunnel. (Note that those RSVP-TE Path messages are label-switched from the PLR to the MP, so they are not seen by any intermediate node along the backup path. This explains why the rerouted TE LSPs do not create any additional state along the backup path.) It is also important to mention that in some cases the headend router may not be able to reroute the affected TE LSP. In this case, the rerouted TE LSP stays in this mode until a reoptimization can occur. This highlights the need to refresh the state of such locally rerouted TE LSPs. Several additional details are related to the signaling procedures when Fast Reroute is triggered; they are described in [FRR].

But why should the state be refreshed downstream if the respective headend router will eventually quickly reoptimize the TE LSPs? There are two reasons. First, depending on the RSVP-TE timer settings and the event sequence timing, the state for the rerouted TE LSP may time out before the headend router has had time to effectively reoptimize the affected TE LSP end to end. The second reason might be the impossibility for the headend router to find an alternate path obeying the set of required constraints. In such a case, an implementation should maintain the rerouted TE LSP in its current state (along the backup tunnel) to avoid a TE LSP failure, until a more optimal path can be found.
The situation as soon as Fast Reroute has been triggered (also called during failure) is shown in Figure 2-29.
Figure 2-29. Mode of Operation for MPLS Traffic Engineering (Link Protection)

Figure 2-30. MPLS Traffic Engineering Fast Reroute (Node Protection)

Figure 2-31. Mode of Operation for MPLS Traffic Engineering Fast Reroute Node Protection

Number of NNHOP Backup Tunnels Required by Fast Reroute Backup
The minimum number of backup tunnels required on a given PLR to protect a node equals the number of next-next hops. Indeed, going back to our example, three backup tunnels are required on R2 to protect all the TE LSPs against a failure of node R3:

- An NNHOP backup tunnel starting on R2 and terminating on R4, to protect the TE LSPs that follow the path R2-R3-R4
- An NNHOP backup tunnel from R2 to R7, to protect the TE LSPs following the path R2-R3-R7
- An NNHOP backup tunnel from R2 to R10, to protect the TE LSPs following the path R2-R3-R10
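The rule above amounts to enumerating, from the PLR, the neighbors of the protected node other than the PLR itself: each is a potential Merge Point needing its own NNHOP backup tunnel. A minimal sketch, assuming the adjacency of R3 from the example (R2, R4, R7, R10):

```python
def nnhop_merge_points(adjacency, plr, protected_node):
    """Next-next hops seen from the PLR through the protected node.
    One NNHOP backup tunnel is needed per such merge point."""
    return sorted(n for n in adjacency[protected_node] if n != plr)

# Hypothetical adjacency around R3, matching the example in the text
adjacency = {"R3": ["R2", "R4", "R7", "R10"]}
merge_points = nnhop_merge_points(adjacency, "R2", "R3")
```

This yields the three merge points R4, R7, and R10, hence the three backup tunnels listed above.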
We mentioned earlier that MPLS TE Fast Reroute can also protect against SRLG failure. This is achieved by simply taking into account the SRLG membership (flooded by the IGP) of each link when computing the backup tunnel's path.
Backup Tunnel Path Computation
Backup tunnel paths can be computed by a multitude of path computation algorithms, which can be either distributed or centralized. As already pointed out, the set of objectives largely dictates the algorithm complexity. For example, the requirement may be to just find a path disjoint from the facility (such as a link, SRLG, or node) to protect. In this case, the algorithm is quite straightforward. On the other hand, if additional constraints, such as bandwidth protection, bounded propagation delay increase, and so on, must also be satisfied, this leads to a usually nonlinear increase in algorithm complexity. In this category, algorithm efficiency is usually measured in terms of required backup capacity, among other criteria.

Indeed, one of the objectives of any recovery mechanism is to minimize the number of resources dedicated to backup. If some guarantees are required along the backup path, in terms of bandwidth, delay, and so on, this implies the reservation by the backup tunnel of network resources such as bandwidth. With respect to such guarantees, a relatively common objective is to protect against a single network element failure. (Note that in the case of an SRLG, such a network element can itself be composed of multiple links. This might be the case when several optical lambdas are multiplexed over a single fiber by means of a technology such as Dense Wavelength Division Multiplexing [DWDM] and thus are all part of the same SRLG.) When such an assumption is considered acceptable (usually referred to as the single-failure assumption), you can make an interesting observation. If two backup tunnels protect independent resources, because it is assumed that those two resources will not fail simultaneously, the two backup tunnels will never be active at the same time. Hence, on every link they share, the required amount of backup bandwidth is not the sum of their bandwidths but simply the larger of their respective bandwidths.
Support for this concept of bandwidth sharing by the backup path computation algorithm allows for a significant reduction in the amount of required backup bandwidth under the single-failure assumption.
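The observation above can be made concrete with a small calculation. The tunnel names and bandwidth figures below are hypothetical; the point is only the max-versus-sum distinction on a shared link.

```python
def backup_bandwidth(link_usage, independent):
    """Backup bandwidth (Mbps) needed on a link crossed by several backup
    tunnels. Under the single-failure assumption, tunnels protecting
    independent resources never fire together, so the shared requirement
    is the max, not the sum, of their reservations."""
    if independent:
        return max(link_usage.values())
    return sum(link_usage.values())

# Two backup tunnels protecting different (independent) links both cross
# a hypothetical link, reserving 600 and 400 Mbps respectively.
usage = {"B1": 600, "B2": 400}
shared = backup_bandwidth(usage, independent=True)     # with sharing
unshared = backup_bandwidth(usage, independent=False)  # without sharing
```

With sharing, the link needs only 600 Mbps of backup capacity instead of 1000 Mbps, a 40 percent saving in this illustrative case.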
Backup Tunnel Load Balancing
This refers to the ability to presignal more than one backup tunnel between a PLR and an MP when protecting a single network element (for example, a set of four next-hop backup tunnels to protect a single link). Why? There could be several reasons for such a design, but the most common one is the inability to find a path for a single backup tunnel that satisfies the necessary constraints. For instance, consider the case of an OC-192 link for which the operator requires full bandwidth protection. In other words, a backup tunnel must be computed that provides an equivalent QoS (which can usually be reduced to the constraint of finding a path offering equivalent bandwidth). It just might not be possible to find such a path.

One solution is to signal multiple backup tunnels (for instance, four 2.5-Gbps backup TE tunnels) such that the sum of their bandwidths equals the bandwidth of the protected facility (for example, an OC-192 link). Hence, a PLR has several backup tunnels to protect a given facility. As soon as a TE LSP requesting protection is signaled, the PLR selects one backup tunnel from among the set of candidates (the set of backup tunnels that satisfy the TE LSP requirements). It is important to highlight that for each TE LSP, a single backup tunnel is selected (this is why the term load balancing requires some clarification). Multiple backup tunnels are not used to protect a single TE LSP because, upon failure, the packets of a single flow would then follow different paths. This could lead to packet reordering, especially if the backup tunnels' paths have different characteristics, such as bandwidth and/or propagation delay. Sophisticated algorithms are needed on the PLR to perform an efficient backup tunnel selection and tackle the usual well-known challenges. One challenge is the packing problem (for a set of TE LSPs, how to choose a set of backup tunnels so as to satisfy the maximum number of requests).
Another challenge is smart mapping (for example, mapping different sets of primary TE LSPs with different attributes onto different backup tunnels with the corresponding properties). This might be seen as a simple and elegant solution, but it comes at the cost of some additional overall complexity and requires the configuration of more backup tunnels.
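A very simple instance of the packing problem mentioned above is a first-fit heuristic: each protected TE LSP is assigned to exactly one backup tunnel with enough remaining bandwidth. This is an illustrative heuristic, not the algorithm of any particular router; the tunnel capacities and LSP demands are hypothetical (four 2.5-Gbps backups for an OC-192, as in the example).

```python
def assign_backups(lsp_demands, backup_capacity):
    """First-fit assignment of protected TE LSPs to a PLR's backup tunnels.
    Each LSP gets exactly one backup tunnel with enough remaining bandwidth
    (no per-LSP load splitting, to avoid packet reordering). Bandwidth in Mbps."""
    remaining = dict(backup_capacity)
    mapping = {}
    # Placing the largest demands first is a common packing heuristic.
    for lsp, bw in sorted(lsp_demands.items(), key=lambda kv: -kv[1]):
        for tunnel, cap in remaining.items():
            if cap >= bw:
                mapping[lsp] = tunnel
                remaining[tunnel] = cap - bw
                break
    return mapping

# Hypothetical: four 2.5-Gbps backup tunnels protecting an OC-192 link,
# and four protected TE LSPs with different bandwidth requirements.
capacity = {"B1": 2500, "B2": 2500, "B3": 2500, "B4": 2500}
demands = {"T1": 2000, "T2": 1500, "T3": 900, "T4": 400}
mapping = assign_backups(demands, capacity)
```

Note that each TE LSP maps to a single backup tunnel; a production algorithm would also match LSP attributes (such as bandwidth-protection requests) against backup tunnel properties.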
Revertive Versus Nonrevertive
When is a newly restored link reused? There are actually two situations to consider.

The first is the case of TE LSPs locally rerouted by Fast Reroute. If the link is restored before the headend router reoptimizes the TE LSP, you could envision letting the PLR revert the TE LSP to the original path. However, such an approach (also called local reversion) has several drawbacks. In particular, in the case of a flapping link, it would result in constantly switching the traffic from one path to another, which would lead to recurring traffic disruptions if no dampening algorithm were used. Hence, the Fast Reroute specification (see [FRR]) recommends the global revertive mode, whereby the decision to revert to the newly restored link is entirely driven by the headend router.

The second situation is when the TE LSP has been reoptimized by the headend along a more optimal path and the link is then restored. It is again the decision of the headend router to reoptimize any of its TE LSPs along this path. (Several considerations can be taken into account, such as the reoptimization frequency and the gain in terms of path optimality.)
Fast Reroute Summary
Fast Reroute has enjoyed great success thanks to its ability to provide SONET-like recovery times (provided that the link failure can be quickly detected, such as by means of PoS alarms, which is usually the case on backbone links). Thanks to its local protection nature, the Fast Reroute convergence time is highly optimized and deterministic: the rerouting node is immediately upstream of the failure (there is no fault notification time), and the backup path is signaled before the failure (backup paths are not computed on the fly). On the other hand, Fast Reroute requires the establishment of a potentially nonnegligible number of backup tunnels. You will see later in this book that several techniques and tools are available to facilitate the deployment of Fast Reroute and automate the creation of backup tunnels. In terms of complexity, as already pointed out, the backup tunnel path computation algorithm and its management are a function of the set of requirements.

In summary, MPLS TE Fast Reroute is an efficient local protection mechanism that provides convergence times of tens of milliseconds. In addition, MPLS TE Fast Reroute can meet stringent recovery requirements such as bandwidth and propagation delay protection by means of more sophisticated backup tunnel path computation algorithms.