Definitive MPLS Network Designs

Jim Guichard; François Le Faucheur; Jean-Philippe Vasseur

Traffic Engineering


The notion of traffic engineering has existed since the first networks were invented. It relates to the art of making efficient use of network resources given a network topology and a set of traffic flows. In other words, the fundamental objective of traffic engineering is to route the traffic so as to avoid network congestion and increase the network's ability to absorb the maximum amount of traffic. To meet such an objective, you can use several traffic engineering techniques, one of which is MPLS-based traffic engineering.

The best way to introduce MPLS Traffic Engineering (called simply MPLS TE in this section) is to start with the well-known illustration called "the fish problem."

IP routing relies on the fundamental concept of destination-based routing. As shown in Figure 2-11, as soon as the two traffic flows originated by routers R1 and R6 toward R5 reach router R2, they follow the same IGP shortest path because IP packets are routed by default based on their destination IP address. In this example, the two flows follow the path R2-R3-R4-R5. If the sum of traffic from these two flows exceeds the network capacity along this path, this inevitably leads to network congestion. This congestion can be avoided by routing some of the traffic along the other available path (R2-R7-R8-R4, where some spare capacity is available).


Figure 2-11. The Fish Problem


In such a simple network, it is easy to adjust the IGP metrics so as to load-balance the traffic across the two available paths (which normally must have equal cost). However, real networks are significantly more complex, and IGP metric optimization is not that simple and does not always provide the required level of granularity. That being said, it remains one possibility for traffic-engineering an IP network. Another option is to employ MPLS Traffic Engineering, which provides a rich set of features with very high granularity to efficiently traffic-engineer an MPLS/IP network.

This section first reviews the set of constraints (also called attributes) of a TE LSP. Then the method for computing the TE LSP path obeying the set of constraints is discussed, followed by an overview of the signaling aspects to set up a TE LSP. Finally, you will see the method used to route traffic onto TE LSPs.


MPLS Traffic Engineering Components


The fundamental idea of MPLS TE is to forward traffic across the network over a Traffic-Engineered Label-Switched Path (TE LSP or tunnel) whose path takes into account a set of constraints, the network topology, and the available resources, with the objective of making efficient use of the network.

In short, the TE LSP attributes (constraints) determine the desired characteristics of the LSP (between its source and destination).

The traffic engineering LSP attributes are

Destination

Bandwidth

Affinities

Preemption

Protection by Fast Reroute

Optimized metric


Destination


The source of the TE LSP is the headend router where the TE LSP is configured, whereas its destination must be explicitly configured. Note that the source of a TE LSP must also be explicitly specified in case the TE LSP path is computed by some offline tool.

Bandwidth


One of the attributes of a TE LSP is obviously the bandwidth required for the TE LSP. Several methods can be used to estimate the bandwidth requirement. The most obvious way is to obtain a traffic matrix between the routers involved in a mesh of TE LSPs; this can be achieved with network management tools (see [MPLS-MGT] for more details on network management solutions). It is also worth mentioning that some more recent inference techniques rely on the measurement of link utilization to compute the traffic matrix. However, such methods usually just provide estimates, whose accuracy varies greatly with the network topology.

The traffic flow pattern between two points is rarely constant and is usually a function of the time of day, not to mention the traffic growth triggered by the introduction of new services in the network or simply increased use of existing services. Hence, it is the responsibility of the network administrator to determine the bandwidth requirement between two points and how often it should be reevaluated. You can adopt a very conservative approach by considering the traffic peak, or use X percent of the peak or averaged bandwidth values. After you determine the bandwidth requirement, you can apply an over/underbooking ratio, depending on the overall objectives.

Another approach consists of relying on the routers to compute the required bandwidth based on the observed traffic sent over a particular TE LSP. The TE LSP is set up with a predetermined value (which can be 0), and the router keeps monitoring the amount of traffic sent to the destination over the TE LSP. On a Cisco router, such a capability is called autobandwidth. It provides high flexibility by means of several configurable parameters: the sampling frequency (the frequency at which the bandwidth is sampled), the LSP resize frequency, and the minimum and maximum values that a TE LSP size can take. For example, the router can sample the amount of traffic sent every 30 minutes, select the peak value from among a set of samples, and resize the TE LSP no more frequently than once a day, with the constraint of always staying between 30 Mbps (the minimum value) and 200 Mbps (the maximum value). This gives the network administrator the flexibility to determine the most appropriate trade-off between reactiveness (adapting the reservation to the actual bandwidth requirement) and network stability (how often a TE LSP should be resized).
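As a rough sketch of the autobandwidth behavior just described (the function and parameter names are illustrative, not actual router configuration keywords), the resize decision amounts to taking the peak of the observed samples and clamping it to the configured bounds:

```python
# Hypothetical sketch of autobandwidth resizing: sample the traffic sent over
# a TE LSP, keep the peak over the resize interval, and resize the LSP within
# configured minimum/maximum bounds.

def autobandwidth_resize(samples_mbps, min_bw=30, max_bw=200):
    """Return the new LSP bandwidth: the peak observed sample,
    clamped to the configured [min_bw, max_bw] range."""
    peak = max(samples_mbps)
    return min(max(peak, min_bw), max_bw)
```

With the 30/200 Mbps bounds from the example above, a peak of 250 Mbps would be clamped down to 200 Mbps, and a quiet period peaking at 10 Mbps would still reserve the 30 Mbps minimum.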

Affinities


Affinities are represented by a 32-bit flag field that must match the set of links a TE LSP traverses. In a nutshell, this can be seen as a coloring scheme (with up to 32 colors), where each link of the network can carry up to 32 colors. It might be desirable in certain circumstances to ensure that a TE LSP exclusively traverses links of specified colors. For the sake of illustration, consider a network with a mix of terrestrial and satellite links. They mainly differ in their propagation delays (which are significantly higher for the satellite links). Hence, the network administrator may decide to color the satellite links (by setting a specific bit of the 32-bit affinity field). For a TE LSP that carries sensitive traffic for which a short propagation delay is desired, the constraint of avoiding links marked with this specific color can be enforced. Conversely, it is also possible to impose the constraint of selecting only links that have a specific color. Any combination is possible, offering a high degree of flexibility.
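The affinity check can be sketched as a simple 32-bit bitwise test. The bit assignment (a SATELLITE bit) and the function name are hypothetical; a link is eligible only if its attribute flags agree with the LSP's affinity on every bit selected by the mask:

```python
# Sketch of affinity matching: a link is eligible for a TE LSP if its
# attribute flags and the LSP's affinity value agree on all bits selected
# by the LSP's mask. The SATELLITE bit assignment is illustrative.

SATELLITE = 0x1  # example: bit 0 marks satellite (high-delay) links

def link_eligible(link_attr_flags, lsp_affinity, lsp_mask):
    """True if the link's 32-bit attribute flags match the LSP's affinity
    on every bit covered by the mask."""
    return (link_attr_flags & lsp_mask) == (lsp_affinity & lsp_mask)
```

A delay-sensitive LSP would set the SATELLITE bit in its mask with an affinity value of 0, so terrestrial links pass the test and satellite links are pruned.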

Preemption


Each TE LSP can be assigned a preemption priority, an attribute that determines whether it can take precedence over other TE LSPs when they compete for resources (see [PREEMPT]). Note that a low preemption number indicates a high priority (preemption 0 corresponds to the highest priority).

A network administrator can have different motivations for using multiple preemptions in the network. For the sake of illustration, we can mention two typical uses of multipreemption schemes. First, you can ensure that the most important/critical TE LSPs take precedence over other less-important and less-critical TE LSPs in case of resource contention. (For example, voice TE LSPs should be able to preempt data LSPs in case of contention provoked by unexpected traffic growth or network failure.) Second, this lets you circumvent the effect of bandwidth fragmentation in distributed path computation environments. (When the bandwidth is fragmented, it is always more challenging to find a path for larger TE LSPs; hence, they can be configured with a higher priority.)

It is worth noting that because multiple preemption levels are available, the available bandwidth must be announced on a per-preemption basis. In other words, for each pool of bandwidth, a set of eight available bandwidth values is advertised. For example, consider a link L with 100 Mbps of reservable bandwidth traversed by three TE LSPs:

T1 with a bandwidth of 10 Mbps and a preemption of 1

T2 with a bandwidth of 20 Mbps and a preemption of 3

T3 with a bandwidth of 15 Mbps and a preemption of 5


If you make the assumption of a single pool of bandwidth, the corresponding router advertises the following set of available bandwidth for the link:

Available bandwidth for preemption 0 = 100 Mbps

Available bandwidth for preemption 1 = 90 Mbps

Available bandwidth for preemption 2 = 90 Mbps

Available bandwidth for preemption 3 = 70 Mbps

Available bandwidth for preemption 4 = 70 Mbps

Available bandwidth for preemption 5 = 55 Mbps

Available bandwidth for preemption 6 = 55 Mbps

Available bandwidth for preemption 7 = 55 Mbps


For example, 55 Mbps of bandwidth is available for the TE LSPs having a preemption of 5, 6, or 7, whereas TE LSPs with a preemption of 3 or 4 can have up to 70 Mbps and a TE LSP with a preemption of 0 can get 100 Mbps. Of course, in this example if a headend router decides to signal a TE LSP of preemption 3 for 65 Mbps, the TE LSP T3 is preempted to accommodate the higher-priority TE LSP.
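The per-preemption advertisement above can be reproduced with a short sketch (the encoding of LSPs as tuples is illustrative): at preemption level p, only reservations of numerically lower or equal preemption, that is, of equal or higher priority, count against the link:

```python
# Sketch of the per-preemption available-bandwidth computation: at preemption
# level p, an LSP's reservation counts against the link only if its preemption
# number is <= p (i.e., it has equal or higher priority).

def available_bandwidth(reservable, lsps):
    """lsps: list of (bandwidth_mbps, preemption) tuples.
    Returns the 8 advertised values, indexed by preemption level 0..7."""
    return [reservable - sum(bw for bw, pre in lsps if pre <= p)
            for p in range(8)]
```

Feeding in the example's three LSPs (10 Mbps at preemption 1, 20 Mbps at 3, 15 Mbps at 5) against 100 Mbps of reservable bandwidth reproduces the eight advertised values listed above.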

Protection by Fast Reroute


MPLS Traffic Engineering provides an efficient local protection scheme called Fast Reroute to quickly reroute TE LSPs to a presignaled backup tunnel within tens of milliseconds (see the "Core Network Availability" section). Such a local protection scheme can be used for some TE LSPs requiring fast rerouting on network failure and is signaled as a TE LSP attribute. In other words, it is possible when setting up a TE LSP to explicitly require the protection by Fast Reroute for the TE LSP whenever Fast Reroute is available on the traversed router. This lets you define different classes of recovery in which some TE LSPs are rerouted according to the normal MPLS Traffic Engineering procedures (as discussed later in this section) and other TE LSPs benefit from fast recovery by means of Fast Reroute. Some additional parameters are detailed in the "Core Network Availability" section.

Optimized Metric


The notion of shortest path is always related to a particular metric. Typically, in an IP network, each link has a metric, and the shortest path is the path such that the sum of the link metrics along the path is minimal.

MPLS TE also uses metrics to pick the shortest path for a tunnel that satisfies the constraints specified. MPLS TE has introduced its own metric. When MPLS TE is configured on a link, the router can flood two metrics for a particular link: the IGP and TE metrics (which may or may not be the same).

To illustrate a potential application, consider the case of links having different bandwidth and propagation delay characteristics. Given this, it might be advantageous to reflect each property by means of a different metric. The IGP metric could, for instance, reflect the link bandwidth, whereas the TE metric would be a function of the propagation delay. Consequently, the IETF specification named [SECOND-METRIC] proposed the ability to specify the metric that should be optimized when computing the shortest TE LSP path. On a Cisco router, when a TE LSP is configured, the metric to optimize can also be specified. For instance, the shortest path for a TE LSP carrying voice traffic could be the path offering the shortest propagation delay. Conversely, the path computed for TE LSPs carrying large amounts of data traffic would be determined with the objective of traversing high-speed links. Note that the current Constrained Shortest Path First (CSPF) implementation computes the shortest path based on one of the two metrics (IGP or TE), not both at once: dual-metric optimization is an NP-complete problem that would make the path computation significantly more CPU-intensive.


Hierarchy of Attributes (Set of Ordered Path Options)


As you saw previously, when configuring a TE LSP, a set of attributes are specified that must be satisfied when computing the TE LSP path. But what happens if no path satisfying the set of constraints can be found? The answer is straightforward: the TE LSP is not set up. The solution is to relax the constraint that cannot be satisfied and retry at regular intervals to see whether a path satisfying the preferred set of attributes can be found.

The Cisco MPLS TE implementation provides such functionality in a very granular fashion. For each TE LSP, the network administrator can specify multiple sets of attributes with an order of preference. For instance, the most preferred set of attributes could be to get 50 Mbps of bandwidth, a specific affinity constraint (such as avoiding all the "red" links), and a preemption of 5. If not all of these constraints can be satisfied, the user can decide to simply relax the affinity constraint or reduce the bandwidth, for instance. This provides great flexibility to ensure that the TE LSP can always be established, along with important control over the hierarchy among the set of constraints. (For instance, first relax the affinity constraint and then reduce the bandwidth as a last resort, or try to reduce the bandwidth first.)

A good common practice consists of always explicitly configuring a fallback option with no constraint. In such a case the TE LSP path is identical to the IGP shortest path, and the TE LSP can always be set up, provided that some connectivity exists between the source and the destination. Furthermore, the multiple sets of attributes can be ordered by preference, with the ability for the headend router to regularly try to reoptimize a TE LSP to find a path that satisfies a preferred set of attributes. For instance, suppose that a TE LSP is configured with 10 Mbps of bandwidth with a fallback option of 1 Mbps. The TE LSP is successfully set up along the path R1-R2-R3-R4-R5. If the link R3-R4 fails and no other path offering a bandwidth of 10 Mbps can be found in the network, the headend router tries to find a path with 1 Mbps of available bandwidth. Then, when a reoptimization is triggered, the headend router first tries to evaluate whether a path for 10 Mbps can be found.
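The ordered path-option evaluation can be sketched as follows; `compute_path` is a hypothetical stand-in for the CSPF computation, and the constraint encoding is illustrative:

```python
# Sketch of ordered path-option evaluation: try each constraint set in order
# of preference and use the first one for which a path can be computed.

def select_path(path_options, compute_path):
    """path_options: list of constraint dicts, most preferred first.
    compute_path(constraints) returns a path (list of nodes) or None.
    Returns (chosen_option, path), or (None, None) if every option fails."""
    for option in path_options:
        path = compute_path(option)
        if path is not None:
            return option, path
    return None, None  # LSP stays down; retried at regular intervals
```

A final option with no constraints plays the role of the fallback described above: it degenerates to the IGP shortest path, so the LSP comes up whenever connectivity exists.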


TE LSP Path Computation


There are two options for computing a TE LSP path: offline and online path computation. With offline path computation, an offline tool is used to compute the path of each TE LSP, taking into account the constraints and attributes (such as bandwidth, affinities, optimized metrics, preemption, and so on), the network topology, and resources. Because the computation is simultaneously performed for all the TE LSPs in the network, offline tools try to achieve a global network optimization with multiple criteria such as maximum link utilization, minimized propagation delay, and so on, and with the objective of maximizing the amount of traffic the network can carry. This can be achieved thanks to the global view of the network characteristics and traffic demands. Then the TE LSP paths are downloaded on each corresponding headend router.

The second path computation method relies on distributed path computation, whereby each router is responsible for computing the path(s) of the TE LSP(s) for which it is the headend. No central server computes TE LSP paths in the network. A well-known algorithm for computing TE LSP paths is the CSPF algorithm. Although a multitude of variants exists, it is worth describing the most commonly used one. CSPF computes the shortest path (in a similar fashion as the SPF algorithm used by link-state protocols such as OSPF and IS-IS) that satisfies the set of specified constraints. For instance, all the links that do not satisfy the bandwidth requirement or the affinity constraint (if present) for the TE LSP in question are pruned from the network topology. Then the Dijkstra algorithm computes the shortest path over the resulting subgraph.
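A minimal sketch of this two-step CSPF, prune then Dijkstra, might look as follows (the topology encoding and attribute names are illustrative):

```python
# Minimal CSPF sketch: prune links that fail the bandwidth constraint, then
# run Dijkstra on the remaining subgraph.
import heapq

def cspf(links, src, dst, bw):
    """links: dict {(u, v): {"metric": m, "avail_bw": b}} (directed edges).
    Returns the shortest constraint-satisfying path as a node list, or None."""
    # Step 1: prune links without enough available bandwidth.
    graph = {}
    for (u, v), attrs in links.items():
        if attrs["avail_bw"] >= bw:
            graph.setdefault(u, []).append((v, attrs["metric"]))
    # Step 2: Dijkstra on the pruned subgraph.
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, metric in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(heap, (cost + metric, nbr, path + [nbr]))
    return None
```

On the fish topology of Figure 2-11 (all metrics 1, all links 10 Mbps), an 8-Mbps request from R1 to R5 yields the four-hop path R1-R2-R3-R4-R5, while a request exceeding every link's capacity yields no path at all.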

We could probably devote an entire book to the respective strengths and weaknesses of each approach. In a nutshell, online path computation is more dynamic, more reactive to network and traffic changes, and more robust (it does not rely on a single centralized server) because of its distributed nature; on the other hand, it usually yields less-optimal paths. In contrast, the offline approach usually allows for a higher degree of optimality, at the price of less dynamicity, lower scalability, and increased management overhead.


MPLS TE IGP Routing Extensions


The path computation module needs to learn the available bandwidth on each link to compute a TE LSP path. This is achieved via specific routing protocol extensions, as defined in [OSPF-TE] and [ISIS-TE]. Each link is originally configured with reservable bandwidth (which may or may not be equal to the actual link speed if the network administrator is willing to make any under/oversubscription). As TE LSPs are set up and torn down, the amount of reserved bandwidth varies on each link and is reflected by the IGP. Note that the available bandwidth is provided for each preemption level.

Of course, it would be undesirable to flood a new IGP Link-State Advertisement (LSA) each time the available bandwidth changes. Hence, a nonlinear threshold mechanism is used such that small changes do not trigger the flooding of an IGP LSA update. The downside of such an approach is a potential inaccuracy between the available bandwidth flooded by the IGP and the actual reservation state. Consequently, a headend router may compute a path even though the required bandwidth is not actually available at some hop. In this case, the router that cannot accommodate the request rejects the TE LSP setup and immediately triggers the flooding of an IGP LSA update to inform the headend router (and all the routers in the network) of the actual reservation state. The headend router in turn computes a new TE LSP path, this time taking into account up-to-date information.

The threshold scheme is made nonlinear to ensure more frequent updates (closer thresholds) as the available bandwidth for the link gets closer to 0. Ensuring more accurate bandwidth reservation states allows the operator to reduce the risk of unsuccessful TE LSP setup. In practice, such a scheme works extremely well and provides a very efficient trade-off between IGP flooding frequency and up-to-date reservation state dissemination.
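The nonlinear threshold idea can be sketched as follows; the threshold values are purely illustrative, and real implementations use their own (often configurable) sets:

```python
# Sketch of nonlinear flooding thresholds: an updated LSA is flooded only
# when the reserved fraction of the link crosses one of a set of thresholds
# that get denser as the link fills up. Threshold values are illustrative.

THRESHOLDS = [0.15, 0.30, 0.45, 0.60, 0.75, 0.80, 0.85,
              0.90, 0.95, 0.97, 0.99, 1.0]

def crossed_threshold(old_reserved, new_reserved, capacity):
    """True if the reserved fraction crossed any threshold,
    in either direction (reservation or release)."""
    old_f, new_f = old_reserved / capacity, new_reserved / capacity
    lo, hi = min(old_f, new_f), max(old_f, new_f)
    return any(lo < t <= hi for t in THRESHOLDS)
```

A 10 percent swing on a nearly empty link triggers no update, whereas a small change near full utilization (where the thresholds are packed closely together) does, which is exactly the accuracy trade-off described above.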

Note also that the receipt of an IGP LSA reflecting a bandwidth reservation state never triggers a routing table computation. It is worth mentioning that even in large networks the IGP overhead related to the announcement of the reservable bandwidth along with other TE-related links attributes does not affect the IGP scalability.


Signaling of a Traffic Engineering LSP


TE LSPs are signaled by means of traffic engineering extensions to RSVP (see [RSVP-TE]). RSVP-TE uses the RSVP messages defined in [RSVP] to set up, maintain (refresh), signal an error condition, and tear down a TE LSP. These messages are Path, Resv, Path Error, and Resv Error. Each message contains a variable set of objects. As mentioned earlier, several new objects, in addition to the existing objects defined for IPv4 flows, have been specified for use by MPLS TE. Those objects are related to TE LSP attributes such as the computed path (also called the ERO, or Explicit Route Object), bandwidth, preemption, Fast Reroute requirements, and affinities, to mention a few.

When initiating a TE LSP setup, the headend router sends an RSVP Path message that specifies the TE LSP attributes. The Path message is processed at each hop up to the final destination, with each hop along the path checking whether the TE LSP constraints can effectively be satisfied. If they can, each router then sends, on the reverse path, an RSVP Resv message to its upstream neighbor along the TE LSP path to confirm the successful setup. The Resv message also provides the label to be used for the corresponding TE LSP. This messaging sequence is shown in Figure 2-12.


Figure 2-12. The Steps of Setting Up a TE LSP
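The two-pass signaling sequence above can be illustrated with a toy simulation (label values and function names are made up; admission control and error handling are omitted):

```python
# Toy simulation of TE LSP signaling: the Path message travels
# headend -> tail-end, recording state at each hop; Resv messages then travel
# back toward the headend, with each hop advertising to its upstream neighbor
# the label it expects to receive for this TE LSP. Label values are arbitrary.

def signal_lsp(path_nodes, first_label=100):
    """Returns (hops that processed the Path message, in order,
    {node: label it advertised to its upstream neighbor})."""
    path_state = list(path_nodes)          # Path pass: state installed hop by hop
    labels = {}
    label = first_label
    for node in reversed(path_nodes[1:]):  # Resv pass: every hop except the headend
        labels[node] = label
        label += 1
    return path_state, labels
```

The dictionary makes the direction of label distribution explicit: labels are allocated starting from the tail-end and flow upstream with the Resv messages.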

TE LSP states must be periodically refreshed. To limit the associated signaling overhead, a refresh reduction mechanism has been defined (see [REFRESH-REDUCTION]). The basic principle consists of sending a limited amount of information to refresh a TE LSP state instead of resending the complete Path and Resv messages.

Handling errors is obviously part of the RSVP signaling procedures. Hence, when a TE LSP experiences a failure that triggers a teardown (and consequently the removal of the corresponding control- and data-plane states), a set of RSVP messages is sent both upstream and downstream from the router that detected the condition. Such conditions can be provoked by insufficient resources upon signaling, network element failures (link, node), preemption of a TE LSP by a higher-priority TE LSP, and so on.

TE LSPs are unidirectional. Bidirectional LSPs have been defined in the context of Generalized MPLS (see [GMPLS]) but generally are undesirable and unnecessary in the context of packet networks. Indeed, they require finding a path that satisfies the constraints in both directions, which may not always be possible. Such an additional constraint may also impose a nonoptimal path. Unidirectional TE LSPs are much more suitable for packet networks and are more in line with the fundamentally asymmetrical nature of data networks and IP routing.

No routing adjacency is established over TE LSPs. This confers on MPLS TE significantly higher scalability than other technologies also used for traffic engineering purposes, such as ATM.


Routing onto a Traffic Engineering LSP


After the TE LSP has been successfully set up, the last step is to determine the set of traffic that must be carried over the TE LSP. The first method to route traffic onto a TE LSP is to configure a static route that points to the TE LSP instead of a "regular" physical interface. Note that recursive static routes can also be configured. In this case, a static route points to an IP address, which for instance can be the loopback interface address of a BGP next hop. Consequently, each IPv4 or VPNv4 (in the context of Layer 3 MPLS VPNs) route announced by that BGP next hop would use the TE LSP without having to configure a static route per announced prefix.

Another mechanism, known as Autoroute on Cisco routers (other router vendors have proposed similar mechanisms), allows the headend router to automatically take the TE LSP into account in its routing table computation. It always prefers a TE LSP to the IGP shortest path to reach the TE LSP destination and any destination downstream of the TE LSP's tail-end. Detailed references for the Autoroute feature can be found in [AUTOROUTE].


Solving the Fish Problem


Going back to the previous example illustrated in Figure 2-11, consider the following set of assumptions. All the links have a metric of 1 and a capacity of 10 Mbps. The bandwidth requirements of the flows (R1,R5) and (R6,R5) are 8 Mbps and 7 Mbps, respectively. Suppose that R1 first computes a path for its TE LSP of 8 Mbps. It selects the shortest path satisfying the set of constraints (in this simple example, the only constraint is bandwidth), which is R1-R2-R3-R4-R5. After the TE LSP has been successfully set up, the IGP reflects the available bandwidth on each link. Then R6 computes its TE LSP path requiring 7 Mbps of bandwidth and determines that the shortest path offering 7 Mbps of available bandwidth is R6-R2-R7-R8-R4-R5. This avoids congestion on the links R2-R3 and R3-R4, which cannot accommodate the sum of the two traffic flows. This is illustrated in Figure 2-13.


Figure 2-13. A Path Computed by the Headend Router R6 That Satisfies the Bandwidth Requirement



TE LSP Deployment Scenarios


Figure 2-14 illustrates different scenarios involving TE LSPs.


Figure 2-14. Various Deployment Scenarios Involving MPLS TE


The following explains the two deployment scenarios shown in Figure 2-14:

IP backbone traffic engineered with MPLS TE: In Figure 2-14, consider an IP packet sent from R0 to R7, and observe the different steps that occur in the forwarding path. The IP packet is first IP-routed from R0 to R2, where it is forwarded onto T1, a TE LSP from R3 to R6. In this example, a label provided by RSVP-TE is pushed by R2, the headend of the T1 TE LSP. Then the packet is label-switched according to the labels distributed by RSVP-TE to the penultimate hop, R5, which removes the RSVP-TE label (an operation called Penultimate Hop Popping (PHP)). The IP packet is then finally IP-routed to its final destination, R7.

Layer 3 MPLS VPN backbone traffic engineered with MPLS TE: This scenario is slightly more complicated. In addition to the labels usually required by MPLS VPN (the LDP and BGP labels), an additional label corresponding to the TE LSP is pushed onto the label stack as the traffic is forwarded onto the TE LSP by the TE LSP headend router. In Figure 2-14, an IP packet is sent by the CE router. Upon receiving the IP packet, the PE router determines the VRF and pushes a two-label stack. This label stack corresponds to the BGP label provided by the PE router that advertises the destination prefix for the VPN in question, plus the LDP label to reach the destination PE router. The packet is then label-switched until it reaches P1. At this point, the headend router P1 pushes an additional label corresponding to the TE LSP. It is worth mentioning that the LDP label is swapped and that the remote LDP peer P3 provides the corresponding label. Why is an LDP session required between the headend and the tail-end of the TE LSP (P1 and P3 in this example)? As you saw in the previous example, a PHP operation is performed by the penultimate hop of the TE LSP. Hence, without an LDP session between the headend and tail-end of the TE LSP, the TE LSP tail-end router (P3 in this example) would receive a packet with an unknown BGP label instead of receiving an LDP label and would just drop the packet (or forward it using what it believes to be the correct forwarding entry, thus causing a traffic black hole).
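The sequence of label operations in this scenario can be illustrated with a toy stack simulation (label values are invented; the top of the stack is the end of the list). Note that in practice the LDP label may itself be removed by PHP one hop before it reaches the egress side; it is shown popped in a single step here for simplicity:

```python
# Toy label-stack walk-through for the L3 MPLS VPN over TE scenario.
# Label values (99, 201, 202, 301) are made up for illustration.

stack = []
stack += [99, 201]       # ingress PE: push BGP label (99), then LDP label (201) on top
stack[-1] = 202          # P1 swaps the LDP label (peer P3 provided 202) ...
stack.append(301)        # ... and pushes the TE LSP (RSVP-TE) label on top
stack.pop()              # penultimate hop of the TE LSP pops the TE label (PHP)
ldp_label = stack.pop()  # the LDP label is consumed on the way to the egress PE
bgp_label = stack.pop()  # egress PE forwards into the VRF using the BGP label
```

The key observation is that after the TE label is popped by PHP, the next label exposed is an LDP label that P3 knows (202), not the BGP label, which only the egress PE can interpret.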


Note

In the case of a TE LSP established between PE routers, there is no need for any LDP session. Hence, two labels are imposed on the IP packets received from the CE routers: the BGP label and the RSVP-TE label.


Reoptimizing a Traffic Engineering LSP


Network state keeps changing. Links and nodes fail and recover; new TE LSPs are set up while others may be torn down. Consequently, for any TE LSP, a more optimal path may appear in the network. It is then highly desirable to detect the existence of such a path and reoptimize a TE LSP along a better path when it becomes available. Consider the example shown in Figure 2-15.


Figure 2-15. Reoptimizing a TE LSP


In Figure 2-15, all the links have a cost of 1 and an initial reservable bandwidth of 10 Mbps, except the link R4-R5, which has 15 Mbps of reservable bandwidth. Suppose that the first TE LSP to be signaled is T1 between R1 and R5 (for 3 Mbps), which follows the shortest path, obeying the bandwidth constraint, of R1-R2-R3-R4-R5. Immediately following this event, R6 signals the TE LSP T2 (for 8 Mbps), which follows the shortest path offering 8 Mbps of bandwidth, which is R6-R2-R7-R8-R4-R5. Hence, at time t0, both T1 and T2 are up and running. At time t1, the TE LSP T1 is torn down, which frees up 3 Mbps of bandwidth along the path R1-R2-R3-R4-R5. When a reoptimization evaluation process is triggered by R6, it determines that a more optimal (shorter) path exists between R6 and R5 (the path R6-R2-R3-R4-R5), and it reroutes the TE LSP along this more optimal path. This illustrates the concept of reoptimization. There are several important aspects to highlight:

Nondisruptive reoptimization: A key property of MPLS TE is its ability to reoptimize a TE LSP along a more optimal path without any traffic disruption. Hence, the headend router should not tear down the "old" LSP and then reestablish it along a more optimal path. Instead, the headend router first establishes the new TE LSP (which belongs to the same session as the old TE LSP). As soon as that new TE LSP is successfully set up, the old TE LSP is torn down. This is known as the make-before-break mechanism, a property that was not really available (implemented) with other Layer 2 protocols such as ATM.

Avoid double booking: If you carefully observe Figure 2-15, you will notice that after T1 has been torn down, the available bandwidth on the link R4-R5 is 7 Mbps, which is not enough to accommodate the second occurrence/LSP of T2 (along the more optimal path). So how can R6 set up the new LSP along the path R6-R2-R3-R4-R5 without first tearing down the old one, given that the make-before-break procedure requires both LSPs to be active at the same time? The answer is that T2 is reoptimized with the option of sharing the bandwidth with the old LSP of T2. In other words, when the new LSP of T2 is signaled, the router R4 sees that it shares the bandwidth with the old LSP (thanks to RSVP's Shared Explicit reservation style). Hence, no double booking occurs (R4's call admission control does not count the bandwidth twice). Similarly, when computing the path for the reoptimized TE LSP, the headend (R6 in this example) knows that the available bandwidth to reoptimize T2 is the current network reservation state plus the bandwidth that will be freed up by the current LSP for T2.
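The shared-style admission check can be sketched as follows (session and function names are illustrative): reservations belonging to the same session are counted once, at the larger of the old and new amounts:

```python
# Sketch of call admission control with shared (make-before-break style)
# bandwidth accounting: a new LSP of the same session reuses the bandwidth
# already held by the old LSP instead of being counted a second time.

def admit(link_capacity, reservations, new_session, new_bw):
    """reservations: dict {session_id: reserved_bw}.
    Returns True if the new LSP fits on the link."""
    used_by_others = sum(bw for s, bw in reservations.items()
                         if s != new_session)
    shared = max(reservations.get(new_session, 0), new_bw)
    return used_by_others + shared <= link_capacity
```

With the Figure 2-15 numbers (R4-R5 at 15 Mbps, old T2 holding 8 Mbps), the reoptimized 8-Mbps LSP of T2 is admitted because it shares T2's existing reservation, whereas a distinct 8-Mbps session would be rejected.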


An important aspect of any MPLS TE network design relates to the reoptimization triggers. How often and when should a TE LSP be evaluated for reoptimization? A Cisco router has several reoptimization triggers:

Manual reoptimization: A command is issued on the headend router that triggers a reoptimization evaluation for each TE LSP the router is the headend for. If a better path is found, the TE LSP is reoptimized according to the make-before-break procedure.

Timer-based reoptimization: A reoptimization timer is configured on the headend router. Upon expiration, the headend tries to reoptimize each TE LSP it is the headend for.

Event-driven reoptimization: In some cases, it may be desirable to trigger a reoptimization upon the occurrence of a particular event, such as the restoration of a link in the network. Indeed, if a link is restored in the network, some TE LSPs may benefit from that new link to follow a more optimal path. When configured, each time a link is advertised as newly operational, this triggers a reoptimization evaluation. It is worth noting that this should be handled with some care to avoid network instability. Consider the case of an unstable link: it would constantly attract new TE LSPs, which would then fail and be reoptimized again, creating very undesirable network instability and, consequently, traffic disruption. This explains why such a trigger should always be used in conjunction with a dampening mechanism at the interface or IGP level to avoid network instabilities in case of unstable links or routers.



MPLS Traffic Engineering and Load Balancing


Load balancing is undoubtedly a key aspect of traffic engineering. It refers to the ability to share the traffic load between two routers across multiple paths. In IP routing, those N paths must have equal costs (except in the case of Enhanced Interior Gateway Routing Protocol [EIGRP]), and the share of traffic between those N paths is also equal. More accurately, two methods exist to perform IP load balancing: per-packet and per-destination. With per-packet load balancing, the load-balancing algorithm leads to a strictly equal share across the N paths because the traffic (on a per-packet basis) is balanced in a round-robin fashion. With per-destination load balancing, a hash algorithm over several IP fields (such as the source and destination IP addresses) is used to ensure that all the packets that belong to the same flow always follow the same path (thus avoiding reordering of packets that belong to the same flow). Consequently, the load between the N paths may not be exactly equal.

The reason traffic can be load-shared only between equal-cost paths in IP routing is to avoid the formation of routing loops. Indeed, without further IGP routing enhancements, if packets were distributed among unequal-cost paths, some packets would be sent back to routers they had already traversed.

In contrast, MPLS TE offers a higher degree of flexibility. First, a headend router can set up multiple TE LSPs to a particular destination, which may follow paths with unequal costs. This does not introduce any routing loops because packets are label-switched along all the TE LSP paths in accordance with the path explicitly signaled at LSP setup time. Second, the multiple TE LSPs may have different characteristics. For instance, if two TE LSPs are established between R1 and R2, with bandwidths of 10 Mbps and 20 Mbps respectively, the headend router (R1) shares the load between the two LSPs in proportion to their respective bandwidths (twice as many packets are sent over the 20-Mbps TE LSP). Note that some router implementations allow the operator to override this sharing ratio through configuration. It is worth mentioning that the usual traffic load-balancing techniques available for IP (per-packet and per-destination) apply equally to MPLS TE.
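The bandwidth-proportional sharing described above amounts to a simple computation. The following Python sketch is illustrative only (`lsp_shares` is a hypothetical helper, not a router API): it derives each LSP's load-share ratio from the configured bandwidths.

```python
def lsp_shares(bandwidths):
    """Load share carried by each TE LSP, proportional to its
    reserved bandwidth. bandwidths maps LSP name -> bandwidth."""
    total = float(sum(bandwidths.values()))
    return {lsp: bw / total for lsp, bw in bandwidths.items()}


# The R1-R2 example: one 10-Mbps and one 20-Mbps TE LSP.
shares = lsp_shares({"LSP-A": 10, "LSP-B": 20})
# LSP-B carries twice as much traffic as LSP-A (2/3 versus 1/3).
assert abs(shares["LSP-B"] - 2 * shares["LSP-A"]) < 1e-9
assert abs(sum(shares.values()) - 1.0) < 1e-9
```

An implementation that lets the operator override the ratio would simply replace the bandwidth values with configured weights in the same computation.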


MPLS Traffic Engineering Forwarding Adjacency


By default, a headend router does not advertise any of its TE LSPs within its self-generated LSA/LSPs. This implies that its TE LSP(s) do not influence the routing decisions of other routers. Under some circumstances it is desirable to adopt different behavior and allow a router to advertise in its LSA/LSP the existence of a TE LSP. In other words, the headend advertises a TE LSP as a "physical" link even though no routing adjacency has been established over the TE LSP. Consequently, any router receiving the headend router's LSA/LSP takes the TE LSP into account in its SPF calculation. In this case, we usually say that the TE LSP is advertised as a forwarding adjacency (FA). This is a TE LSP attribute configured just like any other LSP attributes on the headend router. Figure 2-16 illustrates forwarding adjacency.


Figure 2-16. Forwarding Adjacencies

You will see in the "Core Network Availability" section that a local protection mechanism such as MPLS TE Fast Reroute would recover T1 within 50 ms. Hence, if the headend reflected the failure of the TE LSP without waiting, this would lead to advertising two IGP LSA/LSPs in a very short period of time, which would trigger two consecutive routing table updates on each router in the network. This explains why it may be desirable to wait for a period of time before advertising the FA as down. The appropriate delay usually varies according to the network design and the network recovery mechanism in place.

Note

To consider a link during SPF, a router always checks that the link is advertised in the LSA/LSP of both ends. For instance, in Figure 2-16, when E1 computes its SPF, before considering the link R1R9 (a TE LSP configured as a forwarding adjacency), it checks whether the link is advertised in both the R1 and R9 LSA/LSPs. This is known as the double connectivity check. Hence, because TE LSPs are unidirectional, a TE LSP must also be configured from R9 to R1 as a forwarding adjacency.

If a large number of TE LSPs are configured on a headend router as forwarding adjacencies, the size of the router's LSA/LSP may increase significantly. That being said, this is not a real issue in practice.
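The double connectivity check can be sketched as follows. In this illustrative Python fragment (the data structure is a drastic simplification of a real LSDB), a link is considered during SPF only if each endpoint advertises the other:

```python
def link_usable(lsdb, a, b):
    """Double connectivity check: the link a-b (possibly a forwarding
    adjacency) is considered during SPF only if both ends advertise
    it. lsdb maps each router to the set of neighbors it advertises."""
    return b in lsdb.get(a, set()) and a in lsdb.get(b, set())


lsdb = {"R1": {"R9"}, "R9": set()}   # only R1 advertises the FA to R9
assert link_usable(lsdb, "R1", "R9") is False

# Because TE LSPs are unidirectional, a reverse FA must also be
# configured on R9 before SPF can use the link:
lsdb["R9"].add("R1")
assert link_usable(lsdb, "R1", "R9") is True
```

This check is what prevents a half-configured forwarding adjacency (announced by only one headend) from silently attracting traffic in one direction only.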

There are two ways to advertise a TE LSP as a forwarding adjacency:

As an IP link In this case, the TE LSP is seen by other routers as a regular IP link, without any traffic engineering capability.

As a traffic engineering link This implies that the headend router must advertise the TE LSP as a link with traffic engineering characteristics such as reservable bandwidth, affinities, and so on.


In the first case, the other routers may use the TE LSP in their SPF computation to route their IP traffic. In the second case, because the TE LSP is advertised as a TE link, the other routers can also take the forwarding adjacency into account in their CSPF calculation to route their own TE LSPs, because they see the forwarding adjacency as a regular TE link. This introduces the notion of a hierarchical TE LSP; that is, a TE LSP routed within other TE LSPs.

Forwarding adjacencies may be useful in some network designs. A typical example is when load balancing must be achieved between PE routers. Consider the typical case of various POPs interconnected via long-distance links and made up of a set of PE routers dual-attached to two P routers. Between each pair of PE routers are several IGP paths, which can all be used to load-balance the traffic if and only if they have equal IGP costs. Note that this applies both to IP (without MPLS) and to MPLS Traffic Engineering (unless exactly one TE LSP is set up per available path, which is usually hard to achieve). Thus, one solution consists of computing a set of link metrics such that all the paths between each pair of PE routers have an equal cost. However, this is virtually impossible to achieve and is likely to be inefficient in terms of bandwidth usage if the network is made up of links with disparate bandwidths.

Another solution relies on the use of forwarding adjacencies. As shown in Figure 2-16, a full mesh of TE LSPs is set up between the P routers, each configured and announced as a forwarding adjacency (link) with a fixed metric. This way, every PE router (such as E2 in Figure 2-16) sees every other PE router at an equal IGP path cost via every possible path, provided that the link costs between PE routers and P routers are themselves equal.

Of course, such a design requires some analysis in terms of parameter settings (in particular, the timers related to the TE LSP's liveness). It also increases the size of each IS-IS LSP and OSPF (Type 1) LSA, because each headend router now advertises every TE LSP for which it is the headend as a physical link, which in turn increases the Link-State Database (LSDB) size. Moreover, careful analysis of traffic routing in the network is required, because each router's view of the network no longer reflects the actual topology.


Automatic Meshing of TE LSPs


A quite common deployment scheme of MPLS Traffic Engineering, illustrated in several case studies in this book, consists of setting up a mesh of TE LSPs between a set of routers. Depending on the network size and whether MPLS TE is deployed between core or edge routers, the number of routers involved in a mesh can range from a few to tens or even a few hundred. Furthermore, because TE LSPs are unidirectional, a mesh of N routers implies the configuration of N - 1 TE LSPs per headend. The total number of TE LSPs in the network is then N * (N - 1). For instance, a mesh of 50 routers requires the configuration of 2,450 TE LSPs. Such a configuration task can be eased by using scripts but is still undoubtedly cumbersome and subject to configuration errors. Moreover, adding a new member to the mesh requires not only the configuration of N new TE LSPs on the new router (one to each existing router of the mesh) but also one additional TE LSP on each of the N existing routers, terminating on the new member of the mesh. Consequently, mechanisms that automate the creation of TE LSP meshes are extremely useful. In a nutshell, such functionality consists of several components:

Mesh group membership advertisement The IGP extensions defined in [OSPF-TE-CAPS] and [ISIS-TE-CAPS] allow a router to announce its mesh group membership. (As you will see in several case studies, multiple TE meshes might be required; hence, the generalized notion of mesh groups has been introduced.) Thus, a router first announces the set of mesh groups it belongs to (this set may obviously be reduced to one). Note also that mesh groups may or may not overlap. In other words, the set of routers that belong to mesh group M1 may fully or partially intersect with the set of routers that belong to mesh group M2.

TE template For each mesh group a router belongs to, the network administrator configures a set of attributes that apply to all the TE LSPs of the specific mesh group. Note that because bandwidth may vary for each destination, an elegant solution consists of using the auto-bandwidth mechanism described earlier.

Automatic TE LSP setup As soon as a router has discovered all the routers that participate in the mesh group(s) it belongs to, it starts establishing the required TE LSPs.
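The N * (N - 1) scaling discussed above is easy to verify. This short Python sketch (a back-of-the-envelope calculation, not part of any router feature) computes the mesh size and the incremental cost of adding a router:

```python
def full_mesh_lsps(n):
    """Number of unidirectional TE LSPs in a full mesh of n routers:
    each of the n headends originates n - 1 LSPs."""
    return n * (n - 1)


assert full_mesh_lsps(50) == 2450       # the 50-router example
# Adding a 51st router requires 50 LSPs from the new router plus one
# new LSP from each of the 50 existing routers: 100 in total.
assert full_mesh_lsps(51) - full_mesh_lsps(50) == 100
```

The quadratic growth is precisely why automatic meshing (mesh group discovery plus a TE template) becomes attractive as the mesh grows.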


