MPLS VPN Security and Scalability
While designing its MPLS VPN service, Globenet recognized that security was a fundamental requirement that must be met to protect its network infrastructure and its customers' network facilities.

It was clear that the decision to use MPLS as the service-enabling technology provided a number of inherent security capabilities that were built into the technology from its inception:

- IP address space and routing separation: The use of a route distinguisher enables uniqueness of IP addresses in the core of the Globenet network, and the use of VRFs and routing contexts provides routing separation.
- No visibility of the core network: This is enabled using the command no mpls ip propagate-ttl forwarded.
- Resistance to MPLS label spoofing: The default behavior of an incoming IP interface (such as the PE-CE links) is to drop packets that carry MPLS labels.
Although these capabilities provide basic security protection, intrusion or DoS attacks may be possible in a number of other areas. For example, even though the core IP addressing is hidden from attached customers, the same cannot be guaranteed for the subnet used on the PE-CE link. Visibility of this link address might allow an attached customer to intrude or perform a DoS attack on the PE router. For this reason, Globenet does not redistribute the PE-CE link addresses at the CE router into the IGP of a managed customer. For unmanaged customers, the default configuration is to disable the redistribution of connected interfaces (such as the PE-CE link) into MP-BGP for advertisement to other sites of the VPN. For customers that want to receive these addresses, Globenet makes sure that their inbound packet filters at the PE routers are updated to deny the local PE router addresses. (Filters are discussed later in this section.)

To help identify all the areas that should be addressed in the MPLS network design, Globenet broke the problem into the following three functional areas:

- Operational security
- Control plane protection
- Data plane protection
VPN Operational Security
Operational security may be defined as the steps necessary to prevent unwelcome or unauthorized access to the Globenet network resources (such as routers and servers) and to block service exploitation (such as DoS attacks).

Globenet evaluated all the services and protocols it uses at its routers. It noticed that a number of services and protocols it did not actually use in production were enabled by default. Using this information, Globenet built a configuration template to disable all the services and protocols that were deemed unnecessary. This template is shown in Example 5-6.
Example 5-6. Configuration Template to Disable Unused Services and Protocols
no service pad
no ip source-route
no ip bootp server
!
interface interface
 no ip redirects
 no ip directed-broadcast
 no ip unreachables
 no ip proxy-arp
!
no ip http server
!
no cdp run

Note: Disabling IP unreachables is advisable, because a worm may attempt to send packets to random IP addresses, some of which may not exist. When that occurs, the router replies with an "ICMP unreachable" packet. In some cases, replying to a large number of requests for invalid IP addresses may degrade the router's performance. An alternative option is to restrict the number of IP unreachables using the command ip icmp rate-limit unreachable.

A number of other services are enabled on all routers. These are used to enhance troubleshooting effectiveness and fault isolation, restrict access to the routers, and provide authentication/authorization of users. The configuration template shown in Example 5-7 enables these services.
Example 5-7. Configuration Template to Enable Various Security Services
service password-encryption
!
aaa new-model
aaa authentication login default group tacacs+
enable secret password
!
ip ftp username username
ip ftp password password
ip domain-name Globenet.com
exception protocol ftp
exception dump ftp-server-IP-address
!
logging source-interface loopback0
logging syslog-server-IP-address
access-list 20 remark SNMP ACL
access-list 20 permit snmp-host
access-list 20 deny any log
!
tacacs-server host Globenet-tacacs-server-IP-address
tacacs-server key key
!
snmp-server community community RO 20
line vty 0 4
 transport input ssh
 login tacacs

Note: SSH provides an encrypted channel between a remote console and a network router. If SSH is enabled on a router, Telnet access can be disabled to force all administrative sessions to run over the encrypted channel that SSH provides. In this case, attackers cannot find open Telnet ports.

[MPLS-Security] provides a more detailed analysis of security for MPLS-based VPNs and also a set of best-practice guidelines.
VPN Control Plane Protection
Control plane security may be defined as the steps necessary to protect and authenticate the distribution of routing and forwarding information within the Globenet network.

Globenet deploys a number of control plane protection mechanisms. You saw in Chapter 3, "Interexchange Carrier Design Study," the use of the maximum routes command within VRFs. This command is enabled on all Globenet PE routers. It is set to a customer-specific value with a warning threshold set to 75 percent of the route maximum. In addition to this basic protection scheme, Globenet also restricts the number of routes at the routing protocol level.

You saw in Chapter 4 how to protect OSPF by restricting the maximum number of LSAs that a particular process may receive. Globenet uses these protection mechanisms whenever it deploys an OSPF customer. In addition, it uses the maximum-prefix command on a per-session basis for BGP-4 and on a per-process basis for EIGRP.

RIPv2 does not require any additional configuration. The Routing Information Protocol (RIP) database will be populated only by routes that are present in the VRF. Because these are restricted based on the maximum routes command, the RIP database population is also limited.

Globenet also extensively deploys neighbor authentication for routing protocols. Neighbor authentication allows a receiving router to authenticate the source of the routing update using a shared key that only it and the neighboring router know. Globenet chose to use MD5 authentication so that the authentication key is not carried between routers. With MD5, the sender creates a message digest by using the key and the message as input to the MD5 hash function, and the receiver recomputes the digest to verify the update. This prevents Globenet's routers from accepting unauthorized updates from a routing peer. Globenet also uses this mechanism to verify updates it receives from label distribution peers.

Globenet enables routing authentication in three different segments of the network: PE-CE, PE-PE, and PE-P/P-P/P-PE. [MPLS-VPN-Vol2] provides details of how MD5 authentication works with RIPv2, EIGRP, OSPF, BGP-4, and LDP.
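The keyed-digest idea behind MD5 neighbor authentication can be illustrated with a minimal Python sketch. It assumes the digest is simply computed over the message followed by the shared key; each real protocol defines exactly which fields are covered and how the key is padded, so the function names and message layout here are hypothetical and not the format of any particular protocol.

```python
import hashlib
import hmac


def keyed_md5_digest(message: bytes, key: bytes) -> bytes:
    """Return a 16-byte MD5 digest over the message followed by the shared key.

    Illustrative only: the key itself is never transmitted, only the digest.
    """
    return hashlib.md5(message + key).digest()


def verify_update(message: bytes, received_digest: bytes, key: bytes) -> bool:
    """Recompute the digest locally and compare it with the one received."""
    expected = keyed_md5_digest(message, key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, received_digest)


# Sender side: attach the digest to the routing update.
shared_key = b"example-shared-key"      # hypothetical key known to both peers
update = b"route 10.1.0.0/16 metric 5"  # hypothetical update payload
digest = keyed_md5_digest(update, shared_key)

# Receiver side: accept the update only if the digest matches.
accepted = verify_update(update, digest, shared_key)
```

An update that was modified in transit, or that was generated by a peer that does not know the key, produces a different digest and is rejected.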
VPN Data Plane Protection
Data plane protection is concerned with protecting packets that are forwarded across the network. Because the majority of Globenet customers use its managed service, Globenet decided to deploy access list filters at the CE routers. These filters protect the Globenet infrastructure, including the ASBRs, P routers, PE routers, and CE routers.

The filter is applied at the CE routers on ingress from the attached customer site. The configuration template used to apply this policy is shown in Example 5-8; it does the following:

- It prevents the redistribution of the PE-CE link subnet into the attached customer site.
- It prevents ICMP traffic from the CE router toward the PE router except for a designated address (which is the outbound interface address used by the CE router for connectivity with the PE router). This address cannot be reached from the attached customer site and belongs to the Globenet address space.
- It allows only the BGP-4 protocol (in this case) to run on the PE-CE link. This is achieved by denying routing updates from any other routing protocol.
- It blocks any traffic addressed toward the Globenet backbone.
- It allows site-to-site traffic to flow.
Example 5-8. Configuration Template for PE-CE Data Plane Protection
ip access-list extended Globenet-CE-filter
permit icmp host CE-interface-address host PE-interface-address
permit tcp host CE-interface-address host PE-interface-address eq bgp
deny ip any Globenet-backbone-addresses
permit ip any any
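
The filter in Example 5-8 relies on first-match evaluation: entries are checked in order, and the first entry that matches decides the packet's fate. The following Python sketch models that ordering. The addresses are documentation prefixes chosen purely for illustration, and the sketch assumes (as the text suggests) that the PE-CE link addresses fall inside the Globenet backbone range; it is a simplified model, not a description of how the router implements the filter.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addressing, for illustration only.
CE = ip_network("198.51.100.2/32")        # CE-interface-address
PE = ip_network("198.51.100.1/32")        # PE-interface-address
BACKBONE = ip_network("198.51.100.0/24")  # Globenet-backbone-addresses
ANY = ip_network("0.0.0.0/0")

# (action, protocol, source, destination, destination port or None)
RULES = [
    ("permit", "icmp", CE, PE, None),
    ("permit", "tcp", CE, PE, 179),       # "eq bgp" is TCP port 179
    ("deny", "ip", ANY, BACKBONE, None),
    ("permit", "ip", ANY, ANY, None),
]


def evaluate(proto, src, dst, dport=None):
    """Return the action of the first rule that matches the packet."""
    src, dst = ip_address(src), ip_address(dst)
    for action, rule_proto, rule_src, rule_dst, rule_port in RULES:
        if rule_proto not in ("ip", proto):
            continue  # "ip" matches every protocol
        if src not in rule_src or dst not in rule_dst:
            continue
        if rule_port is not None and dport != rule_port:
            continue
        return action
    return "deny"  # implicit deny at the end of every access list


# A customer host pinging the PE falls through to the backbone deny entry.
customer_ping = evaluate("icmp", "203.0.113.10", "198.51.100.1")
```

Because the specific permit entries come before the backbone deny entry, the CE router's own ICMP and BGP traffic toward the PE is allowed, while anything else aimed at the backbone range is dropped before the final permit is reached.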
Scaling and Convergence of the Layer 3 MPLS VPN Service
Globenet investigated what elements would affect the overall scaling of the MPLS VPN service. Although many factors were identified, the main concerns fell into the following broad categories:

- Control plane design
- Edge router capabilities
The control plane design covers all elements of routing and forwarding, including interior routing protocols, exterior routing protocols, label distribution protocols, and so forth. The Layer 3 MPLS VPN service incorporates all these elements and therefore is subject to the scaling limits of each. Many questions and areas of discussion arise when evaluating the scaling properties of each protocol, but in terms of the Layer 3 MPLS VPN service, Globenet was primarily concerned with the following:

- How do the various protocols interact, and how large can each grow? For example, what factors will increase the network deployment size?
- Because MP-BGP is used as the major routing distribution protocol in the backbone network, how many sessions can be successfully deployed? Are tools required to increase the protocol's scalability?
- Because BGP-4 is used at the edge of the network, and MP-BGP is used in the core, do these need to be tuned for optimal performance?
- How is routing and forwarding convergence affected from customer site to customer site when compared to an intranet built over a Frame Relay network?
Protocol Interaction
Globenet deploys a number of protocols to support its service portfolio, including RSVP, LDP, IS-IS, and MP-BGP. Clearly the interaction between these protocols is an important consideration.

IGP-LDP synchronization functionality is deployed (this was discussed in the preceding chapter) to ensure appropriate synchronization between LDP and IS-IS in case of topology changes. IS-IS is also tuned for fast convergence. This is described in the section "IS-IS Routing Design."
MP-BGP Scaling Considerations
All the MP-BGP features (and their tuning) that you have seen described in previous chapters are deployed on the Globenet network. These include update groups, route reflectors, path maximum transmission unit (MTU) discovery, hold queue tuning, Selective Packet Discard (SPD) queue optimization, and so on.

The route reflector design is also the same model described in previous chapters. It consists of a single level of hierarchy in each autonomous system (rather than a more-complex multihierarchy design, which would be unnecessary given the current amount of routing state for Globenet's Layer 3 MPLS VPN service). PE routers are required to maintain an MP-BGP session with at least two separate route reflectors, and all route reflectors are fully meshed. Any routes that need to cross autonomous system boundaries are held by the ASBRs.

In addition to the route reflector design described, Globenet chose to restrict VPNv4 routes from reaching parts of its network where no attached customer sites are interested in those routes. For example, in the North America region, Globenet has several customers that are local to the California region. Therefore, distribution of their routes to the route reflectors in the East Coast region is unnecessary. This is illustrated in Figure 5-25.
Figure 5-25. Regional Customer Attachment Example
Globenet uses the mechanism described in [rt-constrain] to restrict distribution of routes based on extended community attributes.

[rt-constrain] provides a new SAFI (RT filter) that is used to advertise the route target extended community attributes used at the PE routers for import/export policy. For example, consider Figure 5-26. Globenet has a customer called ABC Inc. in California and another customer called XYZ Inc. in New York. Each PE router advertises the route target values used by these VPNs to the route reflectors using the RT filter SAFI. Using this information, the route reflectors can avoid advertising routes to regions of the network that do not need them.
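
The filtering decision can be sketched in a few lines of Python. This is a simplified model that assumes a single route reflector deciding per PE peer; the PE names, route target values, and prefixes are all hypothetical, and the sketch does not model the RT filter SAFI encoding itself.

```python
def rr_advertisements(pe_rt_interest, vpn_routes):
    """Decide which VPNv4 routes a route reflector sends to each PE.

    pe_rt_interest: dict mapping PE name -> set of route targets it imports
                    (learned from the PE's RT filter advertisements).
    vpn_routes:     list of (prefix, route_targets, originating_pe) tuples.
    Returns a dict mapping PE name -> list of prefixes advertised to it.
    """
    advertised = {pe: [] for pe in pe_rt_interest}
    for prefix, route_targets, origin in vpn_routes:
        for pe, wanted in pe_rt_interest.items():
            # Send the route only to other PEs that import at least one
            # of the route targets attached to it.
            if pe != origin and route_targets & wanted:
                advertised[pe].append(prefix)
    return advertised


# Hypothetical data: ABC Inc. uses RT 65000:100, XYZ Inc. uses RT 65000:200.
interest = {
    "pe-california-1": {"65000:100"},
    "pe-california-2": {"65000:100"},
    "pe-newyork-1": {"65000:200"},
}
routes = [
    ("65000:1:10.1.0.0/16", {"65000:100"}, "pe-california-1"),
    ("65000:2:10.2.0.0/16", {"65000:200"}, "pe-newyork-1"),
]
result = rr_advertisements(interest, routes)
```

In this example the ABC Inc. route reaches only the second California PE, and the New York PE never receives it, which is the effect the RT filter is meant to achieve.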
Figure 5-26. RT Filtering for ABC Inc. and XYZ Inc.

Figure 5-27. Regional Customer Attachment with RT Filtering

Globenet Routing Convergence Strategy
Convergence can be defined as the time taken for routers in a particular routing domain to learn about the complete topology and to recompute an alternative path (assuming that one exists) to a particular destination after a network change has occurred. This process involves the routers adapting to these changes by synchronizing their view of the network with other routers in the same domain. Because Globenet uses a number of protocols that all interact, it was important within the design to define a routing convergence strategy that covered both its backbone and its services' convergence.

Broadly speaking, convergence may be split into two subcategories:

- Convergence of the internal infrastructure: This category includes backbone IGP convergence and MP-BGP control plane convergence.
- Convergence of external services: This category includes convergence of routing information between external customer sites, such as between sites using the Layer 3 MPLS VPN service.
Globenet identified that the convergence of its backbone network was dependent on which protection mechanisms it deployed at different layers. The strategy chosen was to combine MPLS fast reroute and fast IGP convergence, as detailed in the "Network Recovery Design" section later in this chapter.
Layer 3 MPLS VPN Service Routing Convergence
The Layer 3 MPLS VPN architecture uses the services of BGP (with multiprotocol extensions) to distribute VPN routing information between the edges of the network. BGP was essentially invented to solve a route distribution problem in a very scalable manner and to provide a mechanism to achieve a loop-free routing topology. Because of its distance-vector-like behavior, and the features implemented to provide stability, BGP by its very nature does not converge as quickly as a link-state IGP. Interaction between MP-BGP and the IGP can have a significant effect on routing convergence. For this reason, additional implementation timers have been added to speed up the default routing convergence times. These timers may be adjusted to provide an optimal deployment of a Layer 3 MPLS VPN service.

From a backbone network perspective, Globenet tunes its IS-IS timers to gain subsecond IGP convergence. However, because the Layer 3 MPLS VPN service relies heavily on the MP-BGP protocol for route distribution, subsecond IGP convergence alone is insufficient to ensure fast detection of a failure that affects a set of VPN routes. By default, next-hop reachability of all routes is checked only every 60 seconds (this is driven in IOS by the "scanner" process and is a configurable timer). If a particular link fails in the backbone network, the IGP can detect this very quickly. However, the MP-BGP process may take up to 60 seconds to determine that a given next hop is no longer available and that the routes previously learned via that next hop are therefore invalid. For this reason, Globenet chose to deploy the next-hop tracking (NHT) feature on all its PE routers.

The NHT feature is enabled by default in the IOS version deployed at the Globenet PE routers. It allows MP-BGP to register next-hop addresses (PE router loopback interface addresses) for MP-BGP routes with the RIB Address Tracking Filter (ATF) feature. This feature provides an efficient routing update notification service and monitors all route changes in the PE router's Routing Information Base (RIB). When a route changes in some way (for example, it becomes unreachable, or the metric changes), the ATF immediately notifies MP-BGP so that it has current routing information and therefore can react to the change. If a next hop for a given set of VPNv4 routes changes, the router can react immediately rather than waiting for the periodic "scanner" process as previously described.

Although backbone convergence clearly was important, Globenet noted that it needed to define convergence characteristics of its Layer 3 MPLS VPN service from its customers' point of view. It defined this as site-to-site convergence. To this end, it wanted to make sure that convergence times (in terms of route distribution from customer site to remote customer site(s)) were as close as possible to those obtained by the customer when Globenet was using its previous Layer 2 overlay technology. To achieve this goal, Globenet realized that a certain amount of protocol tuning would be necessary, primarily within MP-BGP.

To understand each of the components of the overall site-to-site convergence time, Globenet analyzed in the laboratory how a routing update was sent from one site to another (including new and withdrawn routes). Using this information, Globenet defined eight individual convergence points in the total end-to-end convergence time. These are illustrated in Figure 5-28, as highlighted by T1 through T8.
Figure 5-28. MPLS VPN Service Convergence Points

Using these convergence points, Globenet determined that the default theoretical convergence times, depending on the PE-CE protocols, were as detailed in Table 5-3.
PE-CE protocol | Static | BGP-4 | OSPF | EIGRP | RIPv2
---|---|---|---|---|---
Default theoretical convergence time | 25 seconds | 85 seconds | 35 seconds | 25 seconds | 85 seconds
Tuning the BGP Protocol
The main delay in route convergence with the BGP protocol is the time taken to advertise a new or deleted VPN route. This time is primarily driven by the advertisement interval timer. This is set by default to 5 seconds for internal BGP (convergence point T4) and 30 seconds for external BGP (convergence points T1 and T7).

Globenet chose to reduce the internal BGP timer to 1 second and the external BGP timer to 5 seconds. These new timer values allow routes to be distributed across the backbone network more quickly. They also provide a small delay for the advertisement of these routes to external peers to allow a certain amount of packing of routes into the updates.

Using these new timer values, Globenet was able to drop the theoretical maximum convergence time (when BGP-4 is used on the PE-CE links) to 27 seconds. (This is the default theoretical maximum of 85 seconds minus twice a 4-second saving for internal BGP and twice a 25-second saving for external BGP.) This time is more in line with the other routing protocols. Globenet monitors the routers' available resources on a regular basis to make sure that these new timer values do not negatively affect its routers' scalability.
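
The arithmetic behind the 27-second figure can be checked with a short Python sketch. The timer values and the count of two internal and two external advertisement intervals come from the text above; the constant and function names are hypothetical.

```python
# Advertisement interval timers, in seconds, as stated in the text.
DEFAULT_IBGP_INTERVAL = 5
DEFAULT_EBGP_INTERVAL = 30
TUNED_IBGP_INTERVAL = 1
TUNED_EBGP_INTERVAL = 5

# Default theoretical maximum with BGP-4 on the PE-CE links (Table 5-3).
DEFAULT_BGP_CONVERGENCE = 85


def tuned_convergence(default_total, ibgp_intervals=2, ebgp_intervals=2):
    """Subtract the per-interval savings from the default convergence time."""
    ibgp_saving = DEFAULT_IBGP_INTERVAL - TUNED_IBGP_INTERVAL   # 4 seconds
    ebgp_saving = DEFAULT_EBGP_INTERVAL - TUNED_EBGP_INTERVAL   # 25 seconds
    return (default_total
            - ibgp_intervals * ibgp_saving
            - ebgp_intervals * ebgp_saving)


# 85 - (2 * 4) - (2 * 25) = 27 seconds
new_maximum = tuned_convergence(DEFAULT_BGP_CONVERGENCE)
```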
Edge Router Capabilities
Clearly the edge router capabilities are a major factor in scaling the Layer 3 MPLS VPN service. The main components that drive the edge router's scalability are processing power and available memory space.

When reviewing the CPU and memory characteristics, Globenet noted several points that affected the overall scale. In general, it found that CPU usage is driven by a number of factors, including (but not limited to) the following:

- Quality of service: As discussed in the "Quality of Service Design" section, Globenet minimized the QoS features to be supported on the PE routers in the following ways:
  - In the case of managed CE routers, the classification, marking, and policing functions are currently performed entirely by the CE router, completely offloading this task from the PE router. This greatly facilitates scaling. It is much easier for each CE router to support the QoS functions relevant to a single site than for a PE router to support the QoS function for all the sites attached to it.
  - Globenet chose a DSCP/EXP value scheme such that EXP values get automatically marked correctly based on the Differentiated Services Codepoint (DSCP) values via the IOS default mapping so that no DSCP-to-EXP mapping needs to be performed by the PE router.
  - In the case of unmanaged CE routers, a relatively light input policy is applied because traffic classification doesn't involve any fancy packet inspection and is based only on the DSCP field.

  In a nutshell, Globenet ensured that the QoS features on the input side have no, or minimal, impact on the PE router performance. Hence, the QoS features to be considered for PE router performance impact are primarily the ones related to scheduling on the egress (when forwarding packets onto the PE-CE link). These features really cannot be avoided, because they are essential to the enforcement of Globenet's five classes of service over this congestion point. They have also been designed to be as light as possible because they rely only on DSCP-based classification.
- "Managed" or "unmanaged" service: For customers that use the "managed" service (they have managed CE routers), the CPU requirement may be distributed between the PE router and the attached CE routers, as just discussed for the specific aspect of QoS. This helps scale the overall system to a greater extent. This is not possible for unmanaged CEs.
- PE-CE protocol connectivity type: Each routing protocol has different requirements and therefore needs more or fewer processing cycles. For example, OSPF requires the processing of initial LSA generation but may remain relatively quiet assuming no routing changes, except for database refresh every 30 minutes. In contrast, RIP sends periodic updates at regular intervals.
- Number of attached VPNs and associated VRFs: Clearly the number of VRFs at the PE router drives CPU requirements because the amount of routing information to be processed and stored increases with the number of VRFs.
From a memory perspective, Globenet noted that the amount of memory required at the PE router increases based on the following:

- Number of VRFs/routes: Each VRF and every route within the VRF requires memory. As the number of VRFs and routes increases, so does the required memory. Note that a VPNv4 route requires more memory than an IPv4 route.
- Internet routes at the PE routers: Because Globenet stores Internet routes at some of its PE routers, the amount of memory available to store VPN routes is clearly reduced. This may be quite significant on some of its older router platforms that have limited memory.
- OSPF/RIPv2 connectivity: OSPF and RIPv2 both rely on databases to store their routing information. This memory is in addition to that used to store the routes in the VRF and forwarding tables. Globenet therefore restricts the number of these types of customers that may be deployed on any given PE router.
- BGP paths: The number of remote sites for a given VPN increases the number of routes and paths received from remote locations at the local PE router. Given this, Globenet chose to use the maximum routes command in each VRF to restrict the number of routes and to be able to engineer the routers appropriately.
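
The effect of a per-VRF route limit with a warning threshold (75 percent of the maximum, as described in the control plane protection section) can be modeled with a small Python sketch. The class name, return values, and the choice to warn once the route count reaches the threshold are assumptions made for illustration; this is not a description of the router's internal implementation.

```python
class VrfRouteLimit:
    """Toy model of a VRF route table with a hard limit and warning threshold."""

    def __init__(self, maximum, warning_percent=75):
        self.maximum = maximum
        # Integer arithmetic keeps the threshold exact.
        self.warning_at = maximum * warning_percent // 100
        self.routes = set()

    def install(self, prefix):
        """Return 'installed', 'installed-with-warning', or 'rejected'."""
        if prefix in self.routes:
            return "installed"            # already present, nothing changes
        if len(self.routes) >= self.maximum:
            return "rejected"             # hard limit reached
        self.routes.add(prefix)
        if len(self.routes) >= self.warning_at:
            return "installed-with-warning"
        return "installed"


# Hypothetical customer-specific limit of 4 routes (warning at 3).
vrf = VrfRouteLimit(maximum=4)
outcomes = [vrf.install(f"10.0.{i}.0/24") for i in range(5)]
```

With a limit of four routes, the third and fourth routes are installed with a warning and the fifth is rejected, which bounds the memory any single VRF can consume on the PE router.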
Globenet has several different edge router platforms, each of which had its maximum scale characterized during lab verification testing in terms of the maximum number of services it can support (assuming typical service distribution and associated features). The parameters used to determine the maximum scale were based on the typical split of Internet/Layer 3 VPN customer attachments, typical access speeds, typical CoS portfolio, maximum continuous CPU load of 50 percent, and so on. Based on this, Globenet defined provisioning rules for each platform, specified as 70 percent of the maximum scale tested.
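
As a simple illustration of the 70 percent provisioning rule, the following Python sketch derives a provisioning limit from a tested maximum. The platform names and tested figures are invented for the example and do not come from Globenet's lab results.

```python
def provisioning_limit(max_tested_scale, percent=70):
    """Return the provisioning limit as a percentage of the tested maximum."""
    # Integer arithmetic rounds down, so the limit never exceeds the rule.
    return max_tested_scale * percent // 100


# Hypothetical tested maximums (number of services per platform).
tested = {"platform-a": 1000, "platform-b": 1500}
limits = {name: provisioning_limit(scale) for name, scale in tested.items()}
```

The remaining 30 percent acts as headroom for growth in routes, features, or traffic on services that are already provisioned.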