Fast Convergence and the Fast Reroute
Network reliability is an important design aspect for the deployability of time-sensitive and loss-sensitive applications. When a link, node or SRLG failure occurs in a routed network, there is inevitably a period of disruption to the delivery of traffic until the network reconverges on the new topology. For some applications, a fast reaction to the failed element is essential.
There are two approaches to reacting quickly to a failure: fast convergence and fast reroute. Although people often use these terms interchangeably, they are not the same thing. In this post I will explain the definitions and high-level design considerations for fast convergence and fast reroute.
Fast reroute mechanisms in IP and MPLS, their design considerations, and the pros and cons of each will be explained in a separate post.
When a local failure occurs, four steps are necessary for convergence. These steps are completed before traffic continues on the backup/alternate link.
1. Failure detection (protocol hello timers, carrier delay and debounce timers, BFD and so on)
2. Failure propagation (LSA and LSP throttling timers)
3. Processing of the new information (backup/alternate path calculation; SPF wait and run times)
4. Update of the new route in the RIB/FIB (after this step, traffic can continue to flow through the backup link)
For fast convergence, these steps are tuned. Tuning the timers generally means lowering them, because most vendors use higher default timers to be on the safe side. The defaults are conservative for a reason: as you will see later in this post, lowering these timers can create stability issues in the network.
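As a rough way to see why tuning matters, the four steps above can be treated as a simple additive budget. The sketch below is only an illustration under that assumption: the class name, field names and all the numbers are made up for the example and are not vendor defaults or measurements.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Time spent in each of the four convergence steps, in milliseconds."""
    detection_ms: float       # step 1: hello timers, carrier delay, BFD
    propagation_ms: float     # step 2: LSA/LSP throttling and flooding
    computation_ms: float     # step 3: SPF wait plus SPF run time
    rib_fib_update_ms: float  # step 4: hardware dependent, not tuned here

    def total_ms(self) -> float:
        # Simplifying assumption: the steps run strictly one after another.
        return (self.detection_ms + self.propagation_ms
                + self.computation_ms + self.rib_fib_update_ms)


# Illustrative values only (assumptions, not defaults of any product).
conservative = ConvergenceBudget(30_000, 5_000, 5_000, 500)
tuned = ConvergenceBudget(1_000, 50, 10, 500)

print(f"conservative timers: {conservative.total_ms() / 1000:.1f} s")
print(f"tuned timers:        {tuned.total_ms() / 1000:.2f} s")
```

Note that the RIB/FIB update term stays the same in both cases, so once the other timers are lowered it starts to dominate the total.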
When you tune the timers for failure detection, propagation and new path calculation, it is called fast convergence, because with lower timers traffic can continue towards the alternate link faster than with regular convergence. (Instead of a 30-second hello timer, you can use a 1-second hello; instead of a 5-second SPF wait time, you can use 10 ms; and so on.) Although the RIB/FIB update is hardware dependent, the network operator can configure all the other steps.
One thing always needs to be kept in mind: fast convergence and fast reroute can affect network stability. If you configure the timers very low, you might see false positives, where a short-lived glitch is treated as a real failure.
Unlike fast convergence, with fast reroute the backup path is pre-computed and pre-programmed into the router's RIB/FIB.
Pre-programming the backup path increases the memory utilization on the devices.
There are many fast reroute mechanisms available today. The best known are Loop Free Alternate (LFA), Remote Loop Free Alternate (rLFA), MPLS Traffic Engineering Fast Reroute and Segment Routing Fast Reroute. Loop Free Alternate and Remote Loop Free Alternate are also known as IP or IGP fast reroute mechanisms.
The main difference between MPLS Traffic Engineering Fast Reroute and the IP fast reroute mechanisms is coverage. MPLS TE FRR can protect any traffic in any topology. IP FRR mechanisms need the physical topology of the network to be highly connected. Ring and square topologies are hard for the IP FRR mechanisms but not a problem for MPLS TE FRR at all. In other words, finding a backup path is not always possible with IP FRR mechanisms if the physical topology is a ring or a square. The best physical topology from this aspect is a full mesh.
If MPLS is not enabled on the network, adding MPLS and RSVP-TE just for MPLS TE FRR functionality is considered complicated. In that case network designers may want to evaluate their existing physical structure and try to create an alternate/backup path by adding or removing circuits in the network.
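To make the coverage point concrete, here is a minimal Python sketch that checks the basic loop-free condition from RFC 5286: a neighbor N of router S is a loop-free alternate toward destination D if dist(N, D) < dist(N, S) + dist(S, D). The router names, link costs and helper functions are hypothetical, chosen only for illustration. Real implementations also evaluate node protection, SRLGs and other conditions that are left out here.

```python
import heapq
from itertools import combinations


def shortest_distances(graph, source):
    """Dijkstra: cost of the shortest path from source to every node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist


def loop_free_alternates(graph, s, d):
    """Neighbors of s that satisfy the basic loop-free condition toward d."""
    dist = {n: shortest_distances(graph, n) for n in graph}
    # Primary next hops are the neighbors on a shortest path from s to d.
    primary = {n for n, cost in graph[s].items()
               if cost + dist[n][d] == dist[s][d]}
    return sorted(n for n in graph[s]
                  if n not in primary
                  and dist[n][d] < dist[n][s] + dist[s][d])


def make_graph(edges):
    """Build an undirected graph with symmetric link costs."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


# Hypothetical four-router topologies.
square = make_graph([("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("D", "A", 1)])
full_mesh = make_graph([(a, b, 1) for a, b in combinations("ABCD", 2)])
# Same square, but with the A-D link metric raised from 1 to 2.
tuned_square = make_graph([("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("D", "A", 2)])

print("square:      ", loop_free_alternates(square, "A", "B"))        # []
print("full mesh:   ", loop_free_alternates(full_mesh, "A", "B"))     # ['C', 'D']
print("tuned square:", loop_free_alternates(tuned_square, "A", "B"))  # ['D']
```

In the equal-cost square, router A has no loop-free alternate toward B, because its only other neighbor, D, may send the traffic straight back to A. In the full mesh both remaining neighbors qualify. Changing a single metric in the square is enough to make D a valid alternate in this example.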
IGP metric tuning also helps routers find loop-free alternate paths.
Fast reroute mechanisms can be considered a subset of fast convergence. But as you can understand from this post, in fast convergence all of the above steps are taken after the failure, whereas in fast reroute the backup path is already computed and installed before the failure. In fast reroute, traffic can flow through the backup path as soon as the failure is detected.
For fast failure detection, the best thing is to rely on physical detection mechanisms such as carrier delay, debounce timers, Automatic Protection Switching and so on.
Sometimes it is impossible to use these mechanisms (for example, if there is a Layer 1 or Layer 2 device between the routers). In that case the best mechanism for fast failure detection is BFD.
Last but not least, convergence is always faster with fast reroute mechanisms (50 ms is not magic with them) than with fast convergence mechanisms (generally less than a second, but pushing the timers further puts stability at risk).
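For timer-based detection, whether BFD or protocol hellos, the neighbor is declared down after a number of consecutive packets are missed, so the worst-case detection time is roughly the packet interval multiplied by that number. The sketch below only illustrates this arithmetic. The function name and the interval and multiplier values are assumptions for the example, not recommendations.

```python
def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate worst-case time to declare a neighbor down.

    The neighbor is considered failed after `multiplier` consecutive
    packets, sent every `interval_ms`, have been missed.
    """
    return interval_ms * multiplier


# Illustrative values only (assumptions for this example).
bfd_ms = detection_time_ms(interval_ms=50, multiplier=3)
hello_ms = detection_time_ms(interval_ms=1000, multiplier=3)

print(f"BFD, 50 ms x 3:       {bfd_ms} ms")
print(f"IGP hello, 1 s x 3:   {hello_ms} ms")
```

Lowering the interval or the multiplier shortens detection, but it also leaves less room for a few lost packets before a false positive is raised.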