Network Recovery Design
The network availability requirements differ significantly between the PSTN traffic and the rest of the traffic. A convergence time of a few seconds is perfectly tolerable and in line with the SLAs for the Internet and Layer 3 MPLS VPN traffic. For the PSTN traffic, however, the objective is a convergence time of a few tens of milliseconds in case of a single link, SRLG, or node failure (similar to the availability provided by TK's former SDH infrastructure). Note that the PSTN traffic must also be rerouted within a few tens of milliseconds in case of a node failure in the MPC network. This was not possible with TK's previous PSTN network: the links were protected by SDH, but when a Class 4 voice switch failed, all the voice calls were dropped and the communications had to be reestablished. There was no possibility for a voice call to survive a node failure. That said, a Class 4 voice switch failure was extremely rare.
Network Recovery Design for the Internet and Layer 3 MPLS VPN Traffic
With an objective of a few seconds for the Internet and Layer 3 MPLS VPN traffic in case of failure, aggressive OSPF timer tuning clearly was not required. TK therefore opted for conservative OSPF protocol tuning.
Failure Detection Time
By default, OSPF is configured with a 10-second hello interval and a 40-second RouterDeadTimer on most commercial router platforms. Because both the NAS and BAS devices are connected by means of Layer 2 switches, the default configuration does not meet the requirement of a few seconds of convergence in case of failure: the OSPF hello protocol must be used for failure detection, because there is no lower-layer fast failure detection mechanism as in the case of SDH and DWDM links. On the other hand, on SDH and DWDM links (which represent the vast majority of the links in the MPC network), network failures are detected within a few milliseconds.

Note
The case of point-to-point Gigabit Ethernet interfaces without intervening Layer 2 switches is quite different. Upon a fiber cut, a loss of signal (LoS) is quickly detected, making tuning of the OSPF hello interval unnecessary. But in the case of TK, Layer 2 switches are used to reduce the number of required ports. Consequently, the failure of a link or port would not be detected by equipment connected behind the Layer 2 switch.

The hello interval has been set to 1 second with a RouterDeadTimer of 3 seconds. This effectively means that in worst-case failure scenarios the failure is detected within 3 seconds. The configuration template for these changes is shown in Example 4-8.
Example 4-8. OSPF Timer Configuration Template
interface pos3/0
ip ospf hello-interval 1
ip ospf dead-interval 3
!
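Although the template shows a POS interface, this tuning matters most on the interfaces facing the Layer 2 switches that connect the NAS and BAS devices, where no lower-layer failure detection exists. The following is a sketch of the same template applied there; the interface name is hypothetical:

interface gigabitethernet1/0
ip ospf hello-interval 1
ip ospf dead-interval 3
!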
LSA Generation
As soon as the failure has been detected and reported to the OSPF process, the first step is to originate a new LSA to inform the other routers of the topology change. As mentioned in Chapter 2, the challenge is to originate a new LSA quickly so as to improve the IGP convergence time while preserving network stability in the presence of unstable network resources (such as a flapping link). To that end, modern routers provide dynamic mechanisms such as the exponential back-off algorithm described in Chapter 2. TK elected to use the configuration shown in Example 4-9.
Example 4-9. OSPF LSA Origination Configuration
The meaning of the configured values is explained in Chapter 2.
router ospf 1
timers throttle lsa all 0 40 5000
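With this template, the first LSA reflecting a topology change is originated immediately (0-ms start interval). If the same LSA must be refreshed again, the router waits at least 40 ms, and this hold time doubles on each subsequent change up to a maximum of 5000 ms. (This describes the generic exponential back-off behavior from Chapter 2 as applied by the timers throttle lsa command; the exact semantics may vary slightly by software release.)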
Failure Notification Time
For the traffic to be rerouted along an alternate path if a failure occurs, the LSA originated by the node that detects the failure must first be received by the rerouting router, which might be several hops away from the failure. This period (usually called the failure notification time) is therefore the sum of the propagation, queuing, and processing delays along the path between those two nodes. Note that the processing delay may be optimized by means of various mechanisms on some router platforms, but this component of the failure notification time is considered sufficiently small not to require any further tuning.

TK conducted studies showing that the failure notification time under worst-case conditions in its network (considering the high degree of meshing and the low propagation delays) rarely exceeded 100 ms. This is negligible considering the overall goal of a few seconds of total convergence time.
SPF Triggering
Similar to the LSA origination case, on a Cisco router an exponential back-off mechanism can be used for the SPF triggering. TK chose the configuration shown in Example 4-10.
Example 4-10. Exponential Back-Off Configuration
The meaning of these variables is explained in Chapter 2.
router ospf 1
timers throttle spf 50 50 10000
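Here the first SPF computation is scheduled 50 ms after the first topology change notification. A subsequent SPF is held off for at least 50 ms, with the hold time doubling on each new trigger up to a maximum of 10 seconds. (Again, this reflects the generic exponential back-off behavior described in Chapter 2; the exact semantics of the timers throttle spf parameters may vary by release.)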
Note
On modern routers the SPF complexity is usually close to n * log(n), where n is the number of routers in the network; this complexity determines the SPF computation duration. TK measured the SPF duration in its network and found that it was always less than 40 ms. Thus, SPF computation optimizations such as incremental SPF were not required.
RIB and FIB Updates
The RIB and FIB update times are, of course, highly hardware-dependent, but TK measured that those times were systematically less than 0.5 seconds in its network on any router platform.
OSPF Design Conclusions
TK's OSPF design clearly allows for rerouting times on the order of a few seconds, in line with TK's objective for the Internet and Layer 3 MPLS VPN traffic in case of failure. It is also worth mentioning that in case of failure of the inter-POP links (SDH and DWDM), significantly faster rerouting times (about 1 second) can be achieved thanks to the ability to quickly detect the failure. The worst case is a failure within a POP, where OSPF itself must detect the failure (3 seconds with the elected design).

In case of link, SRLG, or node failure, congestion may occur. This congestion is handled by the DiffServ mechanisms that are in place to protect traffic according to its respective importance, thanks to appropriate queuing. That said, based on capacity planning, the OSPF metrics have been computed to limit the likelihood of degraded service that would impact the traffic SLA should a single failure occur in the MPC network.
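As a rough cross-check of the worst case (a failure within a POP), summing the figures quoted in the preceding sections gives the following back-of-the-envelope estimate (not a measured value):

3 s (failure detection via OSPF hellos) + 0 ms (initial LSA origination delay) + 100 ms (failure notification) + 50 ms (initial SPF wait) + 40 ms (SPF computation) + 0.5 s (RIB and FIB updates) ≈ 3.7 s

This confirms that the conservative tuning meets the few-seconds objective.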
Network Recovery Design for the PSTN Traffic
Because the PSTN traffic must be rerouted within a few tens of milliseconds in case of link, SRLG, or node failure, and because such traffic is routed onto TE LSPs, the most appropriate network recovery mechanism is undoubtedly MPLS TE Fast Reroute.
Failure Detection
A key aspect to consider when choosing a network recovery strategy is the network element failure detection time, because it might represent a nonnegligible part of the overall rerouting time. In the case of the MPC network, TK decided to exclusively rely on the SDH and DWDM alarms reported in case of link failure by its SDH and DWDM equipment. Consequently, a link failure is usually detected within a few milliseconds.

It is worth elaborating on the case of a router failure, because the P routers of the MPC network are all based on a distributed architecture. This has the advantage that in case of a control-plane failure, the traffic does not suffer any disruption, so the failure detection time matters less. In the case of a control-plane failure, it is sufficient to rely on the expiration of the RouterDeadTimer (3 seconds) and the subsequent failure of the routing adjacency to trigger a reroute of the IP traffic, because traffic is not affected in the meantime. For the PSTN traffic routed onto TE LSPs, as soon as their respective headend routers are informed of the control-plane failure, the TE LSPs are rerouted along another path avoiding the failed router; here too, the traffic remains unaffected by the control-plane failure. Note that this does not require any specific mechanism and should be considered the default behavior of a distributed-architecture platform. That said, this assumes that the route processor does not reboot: upon reloading its software, the route processor may update its line cards' control-plane processors, which may lead to traffic disruption. The failure case considered here is a simple route processor failure, such as a hardware failure.

Note
The case of a PE-PSTN node failure is studied later in this section.

A power supply failure results in the failure of all the router-attached links. Similarly, a line card failure provokes the failure of all its links. Consequently, such failures are equivalent to link failures in terms of triggering network rerouting.
Set of Backup Tunnels
Two types of backup tunnels must be provisioned in the MPC network. The first type is next-hop (NHOP) backup tunnels, which protect the PSTN traffic from the failure of a link or SRLG. The second type is next-next-hop (NNHOP) backup tunnels, which protect the PSTN traffic from a node failure (such as a hardware node failure that affects both the control and forwarding planes).
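For illustration, the following sketch shows what a manually configured NHOP backup tunnel could look like on a PLR protecting the link attached to pos3/0. All interface names, addresses, and the explicit path are hypothetical, and the command spelling follows the style of the other examples in this chapter (actual keywords vary by platform and release). As described later, TK's design instead computes these tunnels automatically.

ip explicit-path name AVOID-PROTECTED-LINK enable
next-address 10.1.1.2
next-address 10.1.2.2
!
interface Tunnel100
description NHOP backup tunnel protecting link on pos3/0
ip unnumbered Loopback0
tunnel mode mpls traffic-engineering
tunnel destination 10.0.0.3
tunnel mpls traffic-engineering path-option 1 explicit name AVOID-PROTECTED-LINK
!
interface pos3/0
mpls traffic-engineering backup-path Tunnel100
!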
Backup Tunnel Constraints
The first constraint a backup tunnel path must meet is to be diversely routed from the protected facility.

In the case of an NHOP backup tunnel, the backup tunnel path must be diverse from the link under protection. If the link belongs to an SRLG, the backup tunnel must also be diversely routed from the SRLGs the protected link belongs to; in other words, the backup tunnel must not traverse any link that shares one or more SRLGs with the protected link. Otherwise, an SRLG failure would provoke the failure of both the protected link and the backup tunnel used to protect it.

A more optimal solution would be to have one backup tunnel protect the link and another backup tunnel protect the SRLG; in case of failure, the Point of Local Repair (PLR) would then select the appropriate backup tunnel. The same concept could be applied to overlapping SRLGs: instead of one backup tunnel that is diverse from all the SRLGs the protected link belongs to, you could have one backup tunnel per SRLG. Unfortunately, this is not a viable option, because a router acting as a PLR cannot differentiate a link failure from an SRLG failure. Hence, when a link belongs to an SRLG, the NHOP backup tunnel must systematically be SRLG-diverse. This important concept requires some additional explanation. Consider Figure 4-32, which shows the set of SRLGs in the MPC network.
Figure 4-32. Telecom Kingland SRLG Membership

Backup Tunnel Design Between Level 1 POPs
One of the objectives of the TK design is to protect any TE LSP from link, SRLG, or node failure. To achieve this aim, one SRLG-diverse NHOP backup tunnel is required per protected link, and one SRLG-diverse NNHOP backup tunnel per next-next hop. To illustrate this Fast Reroute design, consider the example of the cw2-c1 link attached to the router cw2 in the Center-West POP.

Protecting the cw2-c1 link requires computing an NHOP backup tunnel path that is SRLG-diverse from that link. You can do this manually, by considering each link in the network, the SRLG memberships, and so on, and explicitly configuring the backup tunnel path. Alternatively, this can be done automatically by each router.

TK opted for automatic computation and configuration of the backup tunnels. To that end, each router in charge of computing a backup tunnel path for each of its neighbors must be aware of the SRLG memberships of all the links (such as the fact that the links cw2-c1 and c1-s1 belong to the same SRLG). The Internet Engineering Task Force (IETF) has specified IGP extensions to flood the SRLG membership. In the case of OSPF, [OSPF-GMPLS] defines several new sub-TLVs carried in the Link TLV (Type 2) that provide additional link characteristics; one of them is the SRLG membership (sub-TLV 16). On a Cisco router, the SRLG membership of a given link is configured only once, on that link, as indicated in Example 4-11. It is then automatically flooded by means of OSPF to the other routers in the same OSPF area (because the opaque LSA used for the MPLS Traffic Engineering extensions is of Type 10, which has area scope).
Example 4-11. SRLG Membership Configuration
interface POS3/0
mpls traffic-engineering srlg 1

Following the configuration in Example 4-11, the set of SRLGs each link belongs to is configured on that link. The SRLG membership is then passed to OSPF and flooded throughout the area in the TE LSA (opaque LSA Type 10). Figure 4-33 shows the OSPF SRLG sub-TLV format.
Figure 4-33. OSPF SRLG Sub-TLV Format

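A link that belongs to several SRLGs would simply carry one such statement per SRLG. The following sketch assumes a hypothetical link belonging to SRLGs 2 and 5:

interface POS3/1
mpls traffic-engineering srlg 2
mpls traffic-engineering srlg 5
!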
On a Cisco router, a single global command (shown next) is then configured on each router. This does the following:

- Automatically configures the NHOP and NNHOP backup tunnel(s)
- Ensures that the backup tunnels are SRLG-diverse when possible
mpls traffic-engineering auto-tunnel backup srlg exclude preferred
This command triggers the following set of actions:

- By examining its OSPF topology database, each router first determines its set of links where a routing adjacency is established. For each link, an SRLG-diverse NHOP backup tunnel path is computed and presignaled, provided that at least one primary TE LSP traverses the protected link (if no such TE LSP exists, there is no need to instantiate a backup tunnel).
- The router then determines its set of next-next hops and configures for each of them an SRLG-diverse NNHOP backup tunnel, provided that at least one primary TE LSP follows this protected section.
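Note that the srlg exclude preferred keywords in the command above express SRLG diversity as a preference rather than a strict constraint: if no SRLG-diverse path can be found, a backup tunnel is still established along the best available path. This is consistent with the relaxation discussed later in the section "Relaxing the SRLG Diversity Constraint." (This reading follows from the "when possible" behavior described above; the exact keyword semantics may vary by software release.)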
Figure 4-34 provides examples of NHOP and NNHOP backup tunnels.
Figure 4-34. Example of NHOP and NNHOP Backup Tunnels

Note
Although both NHOP and NNHOP backup tunnels are configured, the only TE LSPs that are rerouted onto an NHOP backup tunnel in case of a link/SRLG failure are the TE LSPs that terminate on the next hop. All other TE LSPs are systematically rerouted onto their NNHOP backup tunnel, because a PLR cannot differentiate a link failure from a node failure. Consider the failure of the link cw2-c1 versus a power failure of the node c1: both result in a failure of the link cw2-c1. Thus, upon the failure of the link cw2-c1, the PLR router cw2 has to assume a node failure to be on the safe side. If the problem turns out to be a link failure, the TE LSPs might be rerouted onto a longer backup path (the path of an NNHOP backup tunnel is usually longer than the path the traffic would have followed via the NHOP backup tunnel), but this is preferable to rerouting onto the NHOP backup tunnel when the failure is in fact a node failure. Some schemes (based on probing) have been proposed to distinguish a link failure from a node failure. The idea is to send a probe message over the NHOP backup tunnel right after the occurrence of a link failure, which allows the PLR to determine whether the next-hop neighbor is alive: if a response is received, the failure is just a link failure; otherwise, it is a node failure. Given this, the designer has two choices:

- Assume a link failure, and switch over to the NNHOP backup tunnel if the PLR determines that the failure is in fact a node failure.
- Assume a node failure, and switch back to the NHOP backup tunnel (potentially offering a more optimal backup path) if the failure is characterized as a link failure.
In the first mode, the rerouted TE LSPs follow a more optimal path, but in case of a node failure the traffic disruption is significantly longer, because the PLR requires some time to determine that the failure was in fact a node failure. In the second mode, the path in case of a link failure is potentially slightly longer, but the rerouting time is always minimized. The drawback of a potentially less-optimal backup path for a limited period of time (until the rerouted TE LSP is reoptimized along a more optimal path) is limited compared to the advantage of always minimizing the traffic disruption. Therefore, most current implementations have elected the second mode, without any mechanism to switch back to the NHOP backup tunnel if the failure is a link failure. Indeed, it would take some time for the PLR to characterize the failure, so the time during which the rerouted TE LSPs would benefit from the NHOP backup tunnel would be very limited.

The ability to keep track of the SRLG membership is of the utmost importance. Therefore, TK maintains a database of the MPC links' SRLG memberships that is populated by the team in charge of the network infrastructure. An SRLG membership change could occur, for example, because the team in charge of the transport network decided to reroute some optical light paths along another route.

Each time an SRLG membership is modified in the database, an alarm is triggered, telling the team in charge of the MPC IP/MPLS network to reconfigure the SRLG membership accordingly on the relevant links. On a Cisco router, a change in the SRLG membership configuration is automatically detected by all the routers in the network, and they all trigger the recomputation of their sets of backup tunnels to ensure backup tunnel path SRLG diversity. Note that such an SRLG membership change has no impact on the primary TE LSPs; it potentially impacts only the backup tunnels.
Relaxing the SRLG Diversity Constraint
In some situations the constraint of computing an SRLG-diverse backup tunnel path might keep the PLR from finding a solution. Indeed, in networks where overlapping SRLGs are very common, there might be regions of the network where no SRLG-diverse backup tunnel can be found, and yet a backup tunnel would still be useful to protect against interface and router failures. This is not the case in the MPC network: in steady state, an SRLG-diverse backup tunnel can always be found. That said, the inability to find an SRLG-diverse backup tunnel could still occur in case of multiple failures. (In case of multiple failures, the QoS objectives may no longer be met, but it is still useful to have a backup tunnel in place after the first failure has occurred, in case a second failure occurs.) Consider the case of a first failure of SRLG2, as shown in Figure 4-35.
Figure 4-35. Relaxation of the SRLG Diversity Constraint

Design of the Backup Tunnels Between Level 2 and Level 1 POPs
Because exactly two links connect a Level 2 POP to a Level 1 POP, TK ensured that they do not share any SRLG with any other link. (Otherwise, a single SRLG failure could isolate a POP, which is unacceptable.) The same design as in the Level 1 POP case applies here, with the additional simplification of not having to deal with any SRLG. Each router is configured to automatically compute the required set of NHOP and NNHOP backup tunnels; the only constraint is diversity from the protected section (link or node). This is shown in Figure 4-36: NHOP backup tunnel B1 protects against the failure of the link x2-c1, and NNHOP backup tunnel B2 protects against the failure of c1 for the tunnels transiting c1 and s1.
Figure 4-36. Backup Tunnel Design Between Level 2 and Level 1 POPs

Period of Time During Which Backup Tunnels Are in Use
MPLS TE Fast Reroute (FRR) is a temporary mechanism: the protected TE LSPs are locally rerouted by the node immediately upstream of the failure (the PLR) until they are rerouted along a potentially more optimal path by their headend router. In the case of the MPC network, TK conducted some analysis to approximate the period during which a TE LSP would remain on its backup tunnel after a network element failure. This was particularly important because the decision had been made to use zero-bandwidth backup tunnels.

Consider Figure 4-37. Upon a failure of the link cw2-c1, a primary TE LSP T1 (between PE-PSTN2-1 and PE-PSTN2-4) would be locally rerouted onto the NNHOP backup tunnel B3 within a few tens of milliseconds. Extensive lab testing established that such local fast rerouting would take 60 ms in the worst case, including the time to detect the failure and effectively reroute all the TE LSPs onto their respective backup tunnels. An RSVP Path Error message is then sent to the headend router PE-PSTN2-1 to notify it of the local reroute. Such an RSVP message must be processed by each intermediate hop before it is forwarded toward the headend. The receipt of this notification by the headend router immediately triggers the computation of a new path for T1; the path computation in the case of the MPC network takes less than 2 ms. Because a headend router can potentially have multiple affected TE LSPs, the worst-case reoptimization time is obtained by multiplying the CSPF duration by the maximum number of affected TE LSPs (at most 62, the maximum number of TE LSPs per headend router).
Figure 4-37. Estimation of the Time During Which a Backup Tunnel Is Active
As explained in the section "Quality of Service Design," the queuing delay along each hop is negligible. Finally, the processing delay at each hop was estimated to be at most 10 ms. Consequently, because the maximum number of visited hops is ten and the maximum one-way propagation delay is 15 ms, the round-trip signaling time is always less than [10 * 10 ms (processing delay) + 15 ms (propagation delay)] * 2 (one factor for each direction), which equals 230 ms. Consequently, the maximum amount of time necessary to reroute a TE LSP at the headend router after a failure is 469 ms:
[(10 * 10 ms) + 15 ms] (time for the headend to receive the RSVP Path Error message notifying it of the failure) + (62 * 2 ms) (time to recompute the new paths) + 230 ms (round-trip signaling time) = 469 ms
This means that in the very worst-case scenario, upon a link, SRLG, or node failure, a TE LSP would be rerouted to its backup tunnel within 60 ms. It would use the backup tunnel for a period of 469 ms before being reoptimized by its headend router. Note that in reality this time typically is significantly shorter for most TE LSPs. (The preceding computation uses the very worst case of a TE LSP following a ten-hop path, a very remote failure, and the improbable case of a headend router having to reroute all its TE LSPs affected by the failure.)
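Expressed generically (a sketch using the quantities above, where H is the maximum number of hops, d_proc the per-hop RSVP processing delay, d_prop the one-way propagation delay, N the number of affected TE LSPs, and d_CSPF the per-LSP path computation time):

Time on backup ≈ (H * d_proc + d_prop) + (N * d_CSPF) + 2 * (H * d_proc + d_prop)

With H = 10, d_proc = 10 ms, d_prop = 15 ms, N = 62, and d_CSPF = 2 ms, this gives 115 + 124 + 230 = 469 ms, matching the figure above.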
Configuration of a Hold-Off Timer
The need for a hold-off timer was explained in Chapter 2. As a reminder, when network recovery schemes are available at multiple layers (optical, SONET/SDH, IP/MPLS), it is desirable to introduce some delay at each layer before triggering the recovery, to give a lower-layer network recovery mechanism a chance to recover from the fault.

The MPC network has two link types:

- Unprotected STM-16 and STM-64 links. In this case, no hold-off timer is required. As soon as the fault is detected, the network recovery mechanisms (IP routing and MPLS TE Fast Reroute) are triggered.
- STM-1 links protected by SDH, between some Level 2 and Level 1 POPs. Conversely, in this case it is desirable to wait for some period of time before triggering MPLS TE Fast Reroute. This way, in case of link failure, the SDH layer first tries to recover from the fault. If, after some timer X has elapsed, the fault is not recovered, the SDH layer could not recover the affected resource. The inability of the SDH layer to recover the affected link can be caused by an SDH equipment failure or by a fault outside the SDH layer's protection scope (for example, a router or router interface failure).

TK determined that the SDH recovery time in its network was bounded by 80 ms (based on its SDH ring sizes, number of Add/Drop Multiplexers (ADMs), and so on). Hence, TK decided to set the hold-off timer to 100 ms.
Note
The activation of such a timer has the following consequence: in case of a router interface failure or a router failure, the MPLS TE Fast Reroute time is increased by 100 ms.

On a Cisco router, the hold-off timer can be configured by means of the carrier-delay command (configured on each interface), as shown in Example 4-12.
Example 4-12. Configuration of Carrier Delay
interface pos3/0
carrier-delay ms x (where x is the timer value)
!
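For instance, with the 100-ms hold-off value that TK selected, the template above would be instantiated as follows (the keyword spelling varies by platform and release; many IOS versions use the msec keyword):

interface pos3/0
carrier-delay msec 100
!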
Failure of a PE-PSTN Router
The case of a PE-PSTN failure is shown in Figure 4-38.
Figure 4-38. PE-PSTN Node Failure
