Core Network Availability
This section discusses how you can improve network availability by means of various network recovery mechanisms, which allow for the rerouting of the affected traffic along an alternate path. Network recovery mechanisms are available at various layers (optical, SONET/SDH, MPLS/IP, and so on). A detailed reference on this topic is [NET-RECOV]. However, this section focuses on the recovery techniques provided by IP routing and MPLS Traffic Engineering.

When selecting a particular network recovery mechanism (or sometimes a set of mechanisms), you should first determine the overall objectives in terms of network availability. Such objectives are usually driven by the application requirements. For instance, the availability requirement for ATM traffic carried over MPLS is undoubtedly significantly more stringent than for Internet traffic. Moreover, even for a specific application such as VoIP, the requirements may vary from one service provider to another. Indeed, consider a network carrying VoIP traffic. If the objective is to ensure that a voice call is not dropped in case of a network element failure, a convergence time of several seconds (usually about 2 seconds, although such a value greatly depends on the tolerance of the VoIP gateway's signaling protocol) is perfectly acceptable. On the other hand, if the service provider wants to offer a VoIP service that can tolerate any network element failure without any noticeable degradation for the user, the recovery mechanism must be able to reroute the traffic within a few tens of milliseconds along a backup path offering stringent QoS guarantees. This highlights the fact that the objectives must first be clearly established so as to come up with the most appropriate network recovery design.

After you have determined the network recovery objectives, you should evaluate the various available network recovery mechanisms while keeping in mind various criteria.
The convergence time is obviously the first one that comes to mind: how much time is required to reroute the traffic upon a network element failure? This is clearly a critical aspect to consider, but certainly not the only one. The following are a few other important aspects:

- Scope of recovery: Determines whether the recovery mechanism can handle link failure, Shared Risk Link Group (SRLG) failure, or node failure. As a reminder, the concept of SRLG relates to the notion of multiple links sharing a common resource whose failure would provoke all the links to fail simultaneously. For example, consider the case of multiple links routed across the same fiber. A fiber cut would imply the failure of all the links; we say that they share the same SRLG.
- QoS during failure: Does the alternate path (usually called a backup path) provide an equivalent QoS?
- Network overhead: Relates to the amount of extra state required in the network by the recovery mechanism.
- Cost: There is a very wide range of network recovery mechanisms whose costs vary by several orders of magnitude. For instance, an optical 1+1 protection mechanism, although very efficient in terms of rerouting time and QoS along the backup path, requires additional equipment for traffic replication and some extra bandwidth dedicated to protection (actually as much bandwidth as the protected bandwidth). On the other hand, other recovery mechanisms such as IP rerouting are usually cheaper both in terms of extra equipment and network resources (of course, they may not provide the same level of performance).
- Network stability: Does the recovery mechanism, under some circumstances (such as when very fast recovery is required), potentially lead to network instability when faced with frequent path changes?
This list is not meant to be exhaustive. It simply provides some of the most important aspects to evaluate when considering a network recovery mechanism. These different aspects are illustrated throughout this book in the various case studies.

In some cases, it might be desirable to use a combination of recovery mechanisms, although such an option requires some extra care to avoid race conditions between recovery mechanisms and double protection, which may lead to a lack of optimality in terms of backup resources. For instance, a common approach consists of combining SONET protection with fine-tuned Interior Gateway Protocol (IGP) rerouting.
Protection Versus Restoration
There are actually two families of network recovery mechanisms: protection and restoration. A protection mechanism relies on the precomputation (and signaling) of a backup (alternate) path before any failure. In contrast, a restoration mechanism requires the computation of the backup path only after the failure has occurred; in other words, the backup path is computed on the fly. In terms of convergence time, protection mechanisms are usually faster but also require more network overhead because extra state is required.
Local Versus Global Recovery
This other characteristic of a network recovery mechanism relates to the location of the node in charge of redirecting the traffic along a backup path. A recovery mechanism is said to be local when the node immediately upstream of the failure is responsible for rerouting the traffic should a failure occur. Conversely, with a global recovery mechanism, such as the default rerouting mode of MPLS TE described in this section, the headend of the affected TE LSP is in charge of the rerouting.

You will see various network recovery mechanisms: IP routing (restoration), MPLS TE headend reroute (global restoration), MPLS TE path protection (global protection), and MPLS TE Fast Reroute (local protection). Before describing the recovery mechanisms involved in IP routing and MPLS Traffic Engineering, it is worth introducing the notion of a recovery cycle. It can be used to describe the different steps involved in any network recovery procedure (see Figure 2-22).
Figure 2-22. The Recovery Cycle

Network Recovery with IP Routing
Over the years, several routing protocols have been designed. Two major families of routing protocols exist: distance vector protocols (such as Routing Information Protocol [RIP] and EIGRP) and link-state routing protocols (such as Open Shortest Path First [OSPF] and Intermediate System-to-Intermediate System [IS-IS]). This section focuses on link-state protocols such as OSPF and IS-IS because they have been deployed in most service provider networks, owing to their superiority in terms of scalability, optimality, and convergence properties.

Link-state routing protocols rely on the concept of a Link-State Database (LSDB): a collection of Protocol Data Units (PDUs), called Link-State Advertisements in OSPF (see [OSPFv2]) and Link-State Packets in IS-IS (see [ISIS]), that describes some part of the overall network topology and IP address reachability. (This section uses the generic term LSA for both OSPF and IS-IS.) Each router is responsible for originating one or more LSAs (depending on whether we refer to IS-IS or OSPF), and the collection of all the LSAs originated by each router constitutes the LSDB. In contrast to distance vector protocols, each router running a link-state routing protocol has a complete view of the network topology through its LSDB. An algorithm known as the Dijkstra algorithm then allows for the computation of the Shortest Path Tree (SPT), according to a specific metric, from the computing router (the tree root) to every reachable node in the network. Finally, based on this SPT, each node builds its routing table, which contains the shortest path to each reachable IP prefix in the network.

A crucial aspect of link-state routing is of course to guarantee the synchronization of all the routers' LSDBs within a routing domain. This is of the utmost importance to avoid routing loops, because routers with different views of the network topology may make inconsistent routing decisions, leading to a routing loop.
Such a lack of synchronization between LSDBs can occur for a temporary period during transient states, as illustrated later in this section. Although OSPF and IS-IS differ in many respects, they are quite similar as far as the recovery aspects are concerned. Thus, this section applies equally to OSPF and IS-IS.

Let's start with the situation during steady state. As shown in Figure 2-23, in steady state, all the routers have an identical LSDB. The Hello protocol is used between neighboring routers to check that each neighbor is "alive" by means of short messages exchanged at regular intervals (we usually say that a router maintains routing adjacencies).
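The SPT computation described above can be illustrated with a minimal Dijkstra sketch. This is an illustrative implementation only, not router code; the four-router topology and the IGP metrics are hypothetical.

```python
import heapq

def shortest_path_tree(lsdb, root):
    """Compute shortest-path distances and predecessors from the root
    over a link-state database given as {node: {neighbor: metric}}."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale heap entry
        visited.add(node)
        for nbr, metric in lsdb.get(node, {}).items():
            nd = d + metric
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node        # nbr is reached via node in the SPT
                heapq.heappush(heap, (nd, nbr))
    return dist, parent

# Hypothetical four-router topology with symmetric IGP metrics
lsdb = {
    "R1": {"R2": 10, "R3": 30},
    "R2": {"R1": 10, "R3": 10, "R4": 30},
    "R3": {"R1": 30, "R2": 10, "R4": 10},
    "R4": {"R2": 30, "R3": 10},
}
dist, parent = shortest_path_tree(lsdb, "R1")
```

From this SPT, the routing table of R1 would be derived by walking each destination back to the first hop on the tree.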
Figure 2-23. Link-State Routing Protocols

Figure 2-24. Mode of Operation of IP Routing Upon Failure

Use of Dynamic Timers for LSA Origination and SPF Triggering
As previously mentioned, dynamic timers are used to control both the origination of an LSA and the triggering of SPF computation. Any recovery mechanism aims to quickly restore traffic affected by a network element failure, which requires fast reaction to such failure events. In the example of IP routing, you saw that a router detecting a loss of a routing adjacency provoked by a link or neighbor node failure originates a new LSA reflecting the network topology change. Such an LSA is flooded throughout the network, and every router receiving this new LSA consequently triggers a new routing table calculation. To decrease the overall convergence time, it is desirable for every router connected to the failed resource to quickly originate a new LSA and for every router receiving such a new LSA to quickly trigger an SPF computation. But this requires some caution so as to protect the network from unstable network elements. Consider the case of a "flapping" link. If a new LSA were quickly originated at each link state change, this would unavoidably result in frequent IGP LSA updates and SPF triggering on every router in the network, potentially leading to network instability. Thus, the solution to this problem is to use a dynamic timer that allows a quick reaction to simple failures but dampens the LSA origination and SPF triggering if frequent network state changes occur. The algorithm used on a Cisco router to dynamically compute such a timer is based on exponential back-off with three parameters. (Example 2-1 is given for IS-IS LSP origination but applies equally to SPF triggering.)
Example 2-1. Exponential Back-Off
Parameter B in Example 2-1 specifies in milliseconds how long the router detecting the loss of adjacency for the first time waits before originating a new LSP. If a second state change occurs, the router waits for C milliseconds. If a third state change happens, the router waits for 2 * C, and then 4 * C, and so on up to a maximum of A seconds. At this stage, the delay between the origination of two LSPs is A seconds if the link keeps flapping. Then if the link stays in a stable state for 2 * A seconds, the router returns to the original behavior. This is illustrated in Figure 2-25.
!
router isis
lsp-gen A B C
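The behavior of this exponential back-off can be sketched as a short simulation. This is an illustrative model of the A/B/C semantics described in the text, not Cisco IOS source; the parameter values chosen below are hypothetical.

```python
def lsa_gen_delays(num_changes, max_wait_ms, initial_wait_ms, incr_wait_ms):
    """Wait (in ms) before originating a new LSP for each successive state
    change, per the lsp-gen back-off: B for the first change, then C, 2*C,
    4*C, ..., capped at A."""
    delays = []
    for i in range(num_changes):
        if i == 0:
            d = initial_wait_ms                  # parameter B: first change
        else:
            d = incr_wait_ms * (2 ** (i - 1))    # parameter C, doubling
        delays.append(min(d, max_wait_ms))       # capped at parameter A
    return delays

# Hypothetical setting equivalent to A = 5 s, B = 50 ms, C = 200 ms,
# for a link that changes state six times in a row
delays = lsa_gen_delays(6, 5000, 50, 200)
```

If the link then stays stable for 2 * A, the router returns to the initial behavior (the next change again waits only B milliseconds).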
Figure 2-25. Exponential Back-Off Algorithm for LSA Origination

Computing the Convergence Time with IP Routing
You have seen the different steps that occur during the IP recovery process. As soon as the failure is detected, each neighboring router of the failed resource originates a new LSA after a timer has elapsed. Each LSA is reliably flooded throughout the network. Finally, each node receiving a new LSA (reflecting a topology change) triggers an SPF computation after another timer has also elapsed. Consequently, the total convergence time depends on quite a long list of factors: the routing timer settings, the network topology and number of prefixes, and so on. Several examples are provided in various case studies, but we'll give you an order of magnitude, with some careful design rules. Rerouting times on the order of 1 second can be achieved in very large networks but require some nonnegligible engineering work. Also, you should keep in mind two important aspects inherent in IP routing:

- Lack of predictability: All the routers of course eventually converge, but the exact event timing is hard to predict.
- Transient routing loops: Because the flooding of a new LSA takes some time, the various routers in the network may at some point have unsynchronized LSDBs. Hence, having different views of the network topology, they may make routing decisions leading to loops. That being said, these loops are temporary and are cleared as soon as all the routers converge.
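The sequence of steps just described can be turned into a rough worst-case convergence budget by summing the per-step delays. The component values below are hypothetical placeholders for a tuned network, not measured figures; real values depend heavily on platform, timers, and topology.

```python
def convergence_budget_ms(detection, lsa_wait, flood_per_hop, hops,
                          spf_wait, spf_compute, fib_update):
    """Rough worst-case IGP convergence estimate (ms): failure detection,
    LSA-origination timer, LSA flooding across the network diameter,
    SPF timer, SPF run, and FIB update, summed in order."""
    return (detection + lsa_wait + flood_per_hop * hops
            + spf_wait + spf_compute + fib_update)

# Hypothetical tuned values: 10 ms PoS alarm detection, 50 ms LSA timer,
# 5 ms flooding per hop over a 5-hop diameter, 50 ms SPF timer,
# 100 ms SPF computation, 200 ms FIB download.
total = convergence_budget_ms(10, 50, 5, 5, 50, 100, 200)
```

With these illustrative numbers the budget lands well under a second, consistent with the order of magnitude mentioned above; the FIB update and SPF terms typically dominate as the number of prefixes grows.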
As explained earlier, it may be desirable to wait for a certain period of time before flooding an LSA or computing an SPF. For instance, when flooding a new LSA, it may be desirable to wait for some time in case another lower-layer recovery mechanism can restore the failed resource. When computing an SPF, note that in case of a node failure, several LSAs are originated. Hence, waiting before computing the SPF increases your chances of having an accurate LSDB before computing a new routing table.

To protect the network from instability caused by a flapping network resource, a dynamic timer is desirable. The case of an unstable link is a good example. Without a dynamic LSA origination timer, both R4 and R5 would constantly originate new LSAs, which would in turn generate some potentially nonnegligible routing control updates and would also trigger new routing table computations on each node, which, of course, is highly undesirable. Hence, a back-off mechanism has been designed to quickly react (originate the new LSA) when a link first fails and then slow down the LSA origination if the link flaps. The algorithm (available on Cisco routers) used by the back-off mechanism has three parameters: T1, T2, and T3. T1 specifies how long a router that has detected a link failure (more precisely, a loss of routing adjacency) waits before originating a new LSA. If a second state change occurs, the router waits for T2 before originating a new LSA. If the link keeps flapping, the period between successive LSA originations doubles at each change, up to a maximum value of T3. If the link remains in a stable state for 2 * T3, the router reverts to the original behavior. Such an algorithm allows for a fast reaction upon a single failure while protecting the network in case of unstable resources.
A similar algorithm can be used for the SPF calculation, which also provides an efficient mechanism for fast convergence while protecting the router from some misbehaving router(s) or major network instability conditions.

A common misperception is that IGPs converge in tens of seconds. This section has shown that in reality this can be reduced to 1 to 2 seconds with appropriate tuning. Furthermore, IP routing inherently provides backup bandwidth sharing. Indeed, no resource is reserved beforehand should a resource fail. Hence, the available bandwidth can be used to reroute any traffic upon failure. On the other hand, subsecond rerouting time is much more difficult to achieve. Other network recovery mechanisms are probably more suitable for such requirements. Moreover, guaranteeing equivalent QoS in case of network failure is also quite challenging.
Network Recovery with MPLS Traffic Engineering
MPLS Traffic Engineering provides a full spectrum of network recovery mechanisms. We will first review the default recovery mode (called MPLS TE reroute) based on global restoration, and then path protection (global protection), and finally MPLS Fast Reroute (local protection). Each mechanism differs in its ability to meet various recovery requirements such as rerouting times, scope of recovery, ability to provide equivalent QoS during failure, required amount of extra state, and so on. Depending on its requirements, a service provider then can elect the appropriate MPLS TE recovery mechanism.
MPLS TE Reroute
MPLS TE reroute, the default mode of network recovery of MPLS Traffic Engineering, is a global restoration mechanism:

- Global: The node in charge of rerouting a TE LSP affected by a network element failure is the headend router.
- Restoration: When the headend router is notified of the failure, a new path is dynamically computed, and the TE LSP is signaled along the new alternate path (assuming one can be found). For the sake of exhaustiveness, it is also possible to precompute or preconfigure an alternate path. Be aware that such a path, determined before any failure occurs, must be fully diverse from the active one, because you won't know the failure location beforehand.
Consider the example shown in Figure 2-26. A TE LSP T1 is initially set up along the path R1-R2-R3-R4-R5. The link R3-R4 fails. After a period of time (the fault detection time), the router R3 (and the router R4) detects the failure. Again, this period of time essentially depends on the failure type and the Layer 1 or 2 protocol. If you assume a Packet over SONET (PoS) interface, the failure detection time is usually on the order of a few milliseconds. In the absence of a hold-off timer, the router upstream of the failure immediately sends the failure notification (RSVP-TE Path Error message) to the headend router (R1 in this example).
Figure 2-26. MPLS Traffic Engineering Reroute (Global Restoration)

MPLS TE Path Protection
Another network recovery mechanism available with MPLS TE is path protection. The principle is to precompute and presignal a TE LSP used as a backup in case the primary TE LSP is affected by a network element failure. The backup LSP path can be dynamically computed by the headend (by CSPF) or by means of an offline tool.

Consider the network shown in Figure 2-27. The backup LSP has to be diverse (which means it should not use any of the same facilities, such as links, as the protected TE LSP) because the fault location by definition is unknown beforehand. Multiple schemes offer various degrees of diversity and thus protect against different scopes of failure. Figure 2-27 shows an example of a backup path offering link diversity and a backup path offering node diversity.
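The two degrees of diversity just mentioned can be checked mechanically. The sketch below is illustrative only; the paths and router names are hypothetical (the primary follows the R1-R2-R3-R4-R5 path used elsewhere in this section).

```python
def links_of(path):
    """Unordered links of a path given as an ordered list of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def is_link_diverse(primary, backup):
    # Link diversity: the two paths share no link
    return not (links_of(primary) & links_of(backup))

def is_node_diverse(primary, backup):
    # Node diversity: no intermediate node in common
    # (head and tail are shared by definition)
    return not (set(primary[1:-1]) & set(backup[1:-1]))

primary = ["R1", "R2", "R3", "R4", "R5"]
backup_a = ["R1", "R6", "R3", "R7", "R5"]   # link-diverse only: reuses node R3
backup_b = ["R1", "R6", "R7", "R5"]         # link- and node-diverse
```

Note that node diversity implies link diversity, but not the reverse: backup_a would survive any link failure on the primary, yet not a failure of node R3.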
Figure 2-27. MPLS Traffic Engineering Path Protection (Global Protection)

MPLS TE Fast Reroute
MPLS TE Fast Reroute, a local protection mechanism, is by far the most widely deployed MPLS TE recovery mechanism. It relies on the presignaling of backup tunnels at each node, which are used to locally reroute all the TE LSPs affected by a network failure. To protect a facility such as a link, SRLG, or node, backup tunnels must be configured on the relevant set of routers. The set of required backup tunnels may be configured manually (in which case their paths are statically configured) or by means of automatic mechanisms (details of such mechanisms appear in several case studies).

Consider the network shown in Figure 2-28. At each hop, a backup tunnel is configured that follows a path diverse from the protected facility. (In this case, the protected facility is a link, but you will see that Fast Reroute can also be used to protect against SRLG and node failure.) Figure 2-28 illustrates the use of Fast Reroute to protect the link R2-R3.
Figure 2-28. MPLS Traffic Engineering Fast Reroute (Link Protection)

Mode of Operation Before Failure
We will now review the steps required to use MPLS TE Fast Reroute before any failure occurs. As shown in Figure 2-28, a backup tunnel B1 from R2 to R3 is signaled before any failure. It protects all the TE LSPs that cross the link R2-R3 (such as T1 and T2). At this point, it is worth mentioning that the eligibility of a TE LSP to benefit from Fast Reroute along its path can be configured on a per-TE LSP basis and can be explicitly signaled in RSVP-TE. (Indeed, it may be desirable to protect only a selected subset of TE LSPs by means of Fast Reroute, based on various availability requirements.)

In this example, two TE LSPs traverse the link R2-R3. We will focus on the TE LSP T1. T1 is signaled along the path R1-R2-R3-R4-R5. (The corresponding labels distributed by RSVP-TE [Resv messages] are shown in Figure 2-28.) When the T1 LSP is first signaled, each LSR along the path determines whether the TE LSP is asking to be protected by Fast Reroute. If T1 is signaled with such a property, every router tries to select a backup tunnel to protect the TE LSP should a link fail. In the case of T1, R2 (also called a Point of Local Repair [PLR]) selects B1 as the backup tunnel to be used in case of failure of the link R2-R3. Each PLR selects a backup tunnel that meets the requirement of intersecting the protected TE LSP on some downstream node (called the Merge Point [MP]), R3 in this example. RSVP-TE signaling lets you specify more parameters related to Fast Reroute, such as any requirement for bandwidth protection (in other words, whether a backup tunnel offering an equivalent QoS is required). We will revisit this aspect later.
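The PLR's backup tunnel selection rule described above (pick a presignaled backup tunnel whose tail end is a downstream node of the protected LSP, the Merge Point) can be sketched as follows. This is an illustrative simplification, not router code; the tunnel names and the second NNHOP tunnel B2 are hypothetical.

```python
def select_backup(lsp_path, plr, backup_tunnels):
    """On the PLR, pick the first backup tunnel that starts at the PLR and
    whose tail end (the Merge Point) lies downstream of the PLR on the
    protected LSP's path. backup_tunnels maps name -> (head, tail)."""
    downstream = lsp_path[lsp_path.index(plr) + 1:]
    for name, (head, tail) in backup_tunnels.items():
        if head == plr and tail in downstream:
            return name
    return None     # no suitable backup: the LSP is unprotected at this hop

# T1's path and two hypothetical backup tunnels on R2:
# B1 is an NHOP backup to R3, B2 an NNHOP backup to R4.
t1_path = ["R1", "R2", "R3", "R4", "R5"]
backups = {"B1": ("R2", "R3"), "B2": ("R2", "R4")}
chosen = select_backup(t1_path, "R2", backups)
```

A real PLR would further filter the candidates against the TE LSP's signaled requirements (for example, bandwidth protection) before making this choice.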
Mode of Operation During and After Failure
Upon a link failure (the link R2-R3 in this example), the PLR locally reroutes the protected TE LSPs onto the backup tunnel and notifies the headend routers, which can in turn reoptimize their TE LSPs along a new end-to-end path, as discussed in the section "Reoptimization of a Traffic Engineering LSP."

As soon as local rerouting occurs, the PLR must refresh the set of TE LSPs rerouted onto the backup tunnel. Indeed, RSVP-TE is a soft-state protocol; hence, the state of each TE LSP must be regularly refreshed. You already saw that refreshes are performed by sending RSVP-TE Path and Resv messages (downstream and upstream, respectively). This also applies to a rerouted TE LSP. Consequently, for each TE LSP rerouted onto a backup tunnel, the PLR must keep refreshing the TE LSP state by sending Path messages to the downstream neighbor through the backup tunnel. (Note that those RSVP-TE Path messages are label-switched from the PLR to the MP, so they are not seen by any intermediate node along the backup path. This explains why the rerouted TE LSPs do not create any additional state along the backup path.) It is also important to mention that in some cases the headend router may not be able to reroute the affected TE LSP. In this case, the rerouted TE LSP stays in this mode until a reoptimization can occur. This highlights the need to refresh the state of such locally rerouted TE LSPs. Several additional details are related to the signaling procedures when Fast Reroute is triggered; they are described in [FRR].

But why should the state be refreshed downstream if the respective headend router will eventually quickly reoptimize the TE LSPs? There are two reasons. First, depending on the RSVP-TE timer settings and the event sequence timing, the state for the rerouted TE LSP may time out before the headend router has had time to effectively reoptimize the affected TE LSP end to end. The second reason might be the impossibility for the headend router to find an alternate path obeying the set of required constraints. In such a case, an implementation should maintain the rerouted TE LSP in its current state (along the backup tunnel) to avoid a TE LSP failure, until a more optimal path can be found.
The situation as soon as Fast Reroute has been triggered (also called during failure) is shown in Figure 2-29.
Figure 2-29. Mode of Operation for MPLS Traffic Engineering (Link Protection)

Figure 2-30. MPLS Traffic Engineering Fast Reroute (Node Protection)

Figure 2-31. Mode of Operation for MPLS Traffic Engineering Fast Reroute Node Protection

Number of NNHOP Backup Tunnels Required by Fast Reroute Backup
The minimum number of backup tunnels required on a given PLR to protect a node equals the number of next-next hops. Indeed, going back to our example, three backup tunnels are required on R2 to protect all the TE LSPs against a failure of node R3:

- An NNHOP backup tunnel starting on R2 and terminating on R4, to protect the TE LSPs that follow the path R2-R3-R4
- An NNHOP backup tunnel from R2 to R7, to protect the TE LSPs following the path R2-R3-R7
- An NNHOP backup tunnel from R2 to R10, to protect the TE LSPs following the path R2-R3-R10
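The rule above amounts to enumerating, from the PLR, the neighbors of the protected node other than the PLR itself: each is a potential Merge Point needing its own NNHOP backup tunnel. A minimal sketch, assuming the adjacency of R3 from the example (R2, R4, R7, R10):

```python
def nnhop_merge_points(adjacency, plr, protected_node):
    """Next-next hops seen from the PLR through the protected node.
    One NNHOP backup tunnel is needed per such merge point."""
    return sorted(n for n in adjacency[protected_node] if n != plr)

# Hypothetical adjacency around R3, matching the example in the text
adjacency = {"R3": ["R2", "R4", "R7", "R10"]}
merge_points = nnhop_merge_points(adjacency, "R2", "R3")
```

This yields the three merge points R4, R7, and R10, hence the three backup tunnels listed above.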
We mentioned earlier that MPLS TE Fast Reroute can also protect against SRLG failure. This is achieved by simply taking into account the SRLG membership (flooded by the IGP) of each link when computing the backup tunnel's path.
Backup Tunnel Path Computation
Backup tunnel paths can be computed by a multitude of path computation algorithms, which can be either distributed or centralized. As already pointed out, the set of objectives largely dictates the algorithm complexity. For example, the requirement may be to just find a path disjoint from the facility (such as a link, SRLG, or node) to protect. In this case, the algorithm is quite straightforward. On the other hand, if additional constraints, such as bandwidth protection, bounded propagation delay increase, and so on, must also be satisfied, this leads to a usually nonlinear increase in algorithm complexity. In this category, algorithm efficiency is usually measured in terms of required backup capacity, among other criteria.

Indeed, one of the objectives of any recovery mechanism is to minimize the number of resources dedicated to backup. If some guarantees are required along the backup path, in terms of bandwidth, delay, and so on, this implies the reservation by the backup tunnel of network resources such as bandwidth. With respect to such guarantees, a relatively common objective is to protect against a single network element failure. (Note that in the case of an SRLG, such a network element can itself be composed of multiple links. This might be the case when several optical lambdas are multiplexed over a single fiber by means of a technology such as Dense Wavelength Division Multiplexing [DWDM] and thus are all part of the same SRLG.) When such an assumption is considered acceptable (usually referred to as the single-failure assumption), you can make an interesting observation. If two backup tunnels protect independent resources, because it is assumed that those two resources will not fail simultaneously, the two backup tunnels will never be active at the same time. Hence, on every link they share, the required amount of backup bandwidth is not the sum of their bandwidths but simply the larger of their respective bandwidths.
Support for this concept of bandwidth sharing by the backup path computation algorithm allows for a significant reduction in the amount of required backup bandwidth under the single-failure assumption.
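The observation above can be made concrete with a small calculation. The tunnel names and bandwidth figures below are hypothetical; the point is only the max-versus-sum distinction on a shared link.

```python
def backup_bandwidth(link_usage, independent):
    """Backup bandwidth (Mbps) needed on a link crossed by several backup
    tunnels. Under the single-failure assumption, tunnels protecting
    independent resources never fire together, so the shared requirement
    is the max, not the sum, of their reservations."""
    if independent:
        return max(link_usage.values())
    return sum(link_usage.values())

# Two backup tunnels protecting different (independent) links both cross
# a hypothetical link, reserving 600 and 400 Mbps respectively.
usage = {"B1": 600, "B2": 400}
shared = backup_bandwidth(usage, independent=True)     # with sharing
unshared = backup_bandwidth(usage, independent=False)  # without sharing
```

With sharing, the link needs only 600 Mbps of backup capacity instead of 1000 Mbps, a 40 percent saving in this illustrative case.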
Backup Tunnel Load Balancing
This refers to the ability to presignal more than one backup tunnel between a PLR and an MP when protecting a single network element (for example, a set of four next-hop backup tunnels to protect a single link). Why? There could be several reasons for such a design, but the most common one is the inability to find a path for a single backup tunnel that satisfies the necessary constraints. For instance, consider the case of an OC-192 link for which the operator requires full bandwidth protection. In other words, a backup tunnel must be computed that provides an equivalent QoS (which can usually be reduced to the constraint of finding a path offering equivalent bandwidth). It just might not be possible to find such a path.

One solution is to signal multiple backup tunnels (for instance, four 2.5-Gbps backup TE tunnels) such that the sum of their bandwidths equals the bandwidth of the protected facility (for example, an OC-192 link). Hence, a PLR has several backup tunnels to protect a given facility. As soon as a TE LSP requesting protection is signaled, the PLR selects one backup tunnel from among the set of candidates (the set of backup tunnels that satisfy the TE LSP requirements). It is important to highlight that for each TE LSP, a single backup tunnel is selected (this is why the term load balancing requires some clarification). Multiple backup tunnels are not used to protect a single TE LSP because, upon failure, the packets of a single flow would then follow different paths. This could lead to packet reordering, especially if the backup tunnels' paths have different characteristics, such as bandwidth and/or propagation delay. Sophisticated algorithms are needed on the PLR to perform an efficient backup tunnel selection and tackle the usual well-known challenges. One challenge is the packing problem (for a set of TE LSPs, how to choose a set of backup tunnels so as to satisfy the maximum number of requests).
Another challenge is smart mapping (for example, mapping different sets of primary TE LSPs with different attributes onto different backup tunnels with the corresponding properties). This might be seen as a simple and elegant solution, but it comes at the cost of some additional overall complexity and requires the configuration of more backup tunnels.
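A very simple instance of the packing problem mentioned above is a first-fit heuristic: each protected TE LSP is assigned to exactly one backup tunnel with enough remaining bandwidth. This is an illustrative heuristic, not the algorithm of any particular router; the tunnel capacities and LSP demands are hypothetical (four 2.5-Gbps backups for an OC-192, as in the example).

```python
def assign_backups(lsp_demands, backup_capacity):
    """First-fit assignment of protected TE LSPs to a PLR's backup tunnels.
    Each LSP gets exactly one backup tunnel with enough remaining bandwidth
    (no per-LSP load splitting, to avoid packet reordering). Bandwidth in Mbps."""
    remaining = dict(backup_capacity)
    mapping = {}
    # Placing the largest demands first is a common packing heuristic.
    for lsp, bw in sorted(lsp_demands.items(), key=lambda kv: -kv[1]):
        for tunnel, cap in remaining.items():
            if cap >= bw:
                mapping[lsp] = tunnel
                remaining[tunnel] = cap - bw
                break
    return mapping

# Hypothetical: four 2.5-Gbps backup tunnels protecting an OC-192 link,
# and four protected TE LSPs with different bandwidth requirements.
capacity = {"B1": 2500, "B2": 2500, "B3": 2500, "B4": 2500}
demands = {"T1": 2000, "T2": 1500, "T3": 900, "T4": 400}
mapping = assign_backups(demands, capacity)
```

Note that each TE LSP maps to a single backup tunnel; a production algorithm would also match LSP attributes (such as bandwidth-protection requests) against backup tunnel properties.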
Revertive Versus Nonrevertive
When is a newly restored link reused? There are actually two situations to consider.

The first is the case of TE LSPs locally rerouted by Fast Reroute. If the link is restored before the headend router reoptimizes the TE LSP, you could envision letting the PLR revert the TE LSP to the original path. However, such an approach (also called local reversion) has several drawbacks. In particular, in the case of a flapping link, it would result in constantly switching the traffic from one path to another, which would lead to recurring traffic disruptions if no dampening algorithm were used. Hence, the Fast Reroute specification (see [FRR]) recommends the global revertive mode, whereby the decision to revert to the newly restored link is entirely driven by the headend router.

The second situation is when the TE LSP has been reoptimized by the headend along a more optimal path and the link is then restored. It is again the decision of the headend router to reoptimize any of its TE LSPs along this path. (Several considerations can be taken into account, such as the reoptimization frequency and the gain in terms of path optimality.)
Fast Reroute Summary
Fast Reroute has enjoyed great success thanks to its ability to provide SONET-like recovery times (provided that the link failure can be quickly detected, such as by means of PoS alarms, which is usually the case on backbone links). Thanks to its local protection nature, the Fast Reroute convergence time is highly optimized and deterministic: the rerouting node is immediately upstream of the failure (there is no fault notification time), and the backup path is signaled before the failure (backup paths are not computed on the fly). On the other hand, Fast Reroute requires the establishment of a potentially nonnegligible number of backup tunnels. You will see later in this book that several techniques and tools are available to facilitate the deployment of Fast Reroute and automate the creation of backup tunnels. In terms of complexity, as already pointed out, the backup tunnel path computation algorithm and its management are a function of the set of requirements.

In summary, MPLS TE Fast Reroute is an efficient local protection mechanism that provides convergence times of tens of milliseconds. In addition, MPLS TE Fast Reroute can meet stringent recovery requirements such as bandwidth and propagation delay protection by means of more sophisticated backup tunnel path computation algorithms.