When it comes to fast convergence, the first thing we need to understand is: what is convergence?
Convergence is the time between a failure and the recovery from it. Links, circuits, routers, and switches all eventually fail. As network designers, our job is to understand the topology and, wherever there is a requirement, add a backup link or node. Of course, not every network, and not every place in a network, requires redundancy. But let’s assume we want redundancy, so we add a backup link or node, and we want to recover from the failure as quickly as possible, ideally before the application times out.
But at what point can we say that a network is converging fast? Unfortunately, there is no single numerical value for it. You cannot say that 30 seconds, 10 seconds, or 1 second is fast convergence. Your application’s convergence requirement might be well below 1 second.
Thus, I generally define ‘Fast Convergence’ as a convergence time faster than the default convergence value. Let’s say OSPF on broadcast media converges in 50 seconds by default; any attempt to make OSPF converge faster than that 50-second default is then OSPF Fast Convergence on broadcast media.
In general, there are four steps involved in making convergence faster.
Four necessary steps in fast convergence
1. Failure detection
Layer 1 Failure detection mechanisms:
- Carrier delay
- Debounce Timer
- SONET/SDH APS timers
Layer 3 Failure detection mechanisms:
- Protocol timers (Hello/Dead)
- BFD (Bidirectional Forwarding Detection)
For failure detection, the best practice is to always use a physical-layer down detection mechanism first. Even BFD cannot detect a failure faster than a physical-layer detection mechanism, because BFD is a polling-based mechanism whose messages are sent and received periodically, whereas physical-layer detection is event-driven and is always faster than BFD and protocol hellos.
If physical-layer detection mechanisms cannot be used (for example, because there is a transport element in the path), then BFD should be used instead of tuning protocol hello timers aggressively. A common example is two routers connected through an Ethernet switch: a failure on one side of the switch does not bring down the link on the other side, so the best method is to use BFD.
Compared to protocol hellos, BFD packets are much lighter, so BFD consumes fewer resources and less bandwidth.
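To make the difference between hello-based detection and BFD concrete, here is a minimal Python sketch. It assumes the usual approximation that a timer-based detector declares a failure after `multiplier` consecutive intervals pass without a packet. The `Detector` class is hypothetical, and the BFD intervals are example values, not recommendations; the OSPF row uses the common 10-second hello and 40-second dead interval for broadcast media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detector:
    """Hypothetical model of a timer-based failure detector."""
    name: str
    interval_ms: int  # how often a hello or control packet is sent
    multiplier: int   # consecutive missed packets before declaring failure

    def detection_time_ms(self) -> int:
        # Approximate detection time: interval x multiplier.
        return self.interval_ms * self.multiplier


detectors = [
    Detector("OSPF hello/dead (10 s hello, 40 s dead)", 10_000, 4),
    Detector("BFD (300 ms interval, multiplier 3)", 300, 3),
    Detector("BFD (50 ms interval, multiplier 3)", 50, 3),
]

for d in detectors:
    print(f"{d.name}: ~{d.detection_time_ms()} ms")
```

Physical-layer detection does not fit this model at all, since it is event-driven rather than timer-driven, which is why it remains the first choice when it is available.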
2. Failure propagation
Propagation of the failure information throughout the network.
Here the LSA (OSPF) or LSP (IS-IS) throttling timers come into play. You can tune them for faster information propagation; they can also be used to slow down information processing. In addition, LSP pacing timers can be tuned to send updates much faster.
3. New information processing
Processing of the newly arrived LSA/LSP to find the next best path. SPF throttling timers can be tuned to process the information faster for fast convergence.
4. Update new route into RIB/FIB
For fast convergence, these steps may need to be tuned. Although the RIB/FIB update is hardware dependent, the network operator can configure all the other steps. One thing always needs to be kept in mind: fast convergence and fast reroute can affect network stability.
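As a rough way to reason about the four steps, total convergence time can be treated as the sum of the time spent in each step. The Python sketch below assumes that simple additive model. The function name and every number in it are hypothetical, chosen only to show how the dominant step shifts once the timers are tuned.

```python
def convergence_budget(steps_ms):
    """Return (total time in ms, name of the slowest step)."""
    total = sum(steps_ms.values())
    dominant = max(steps_ms, key=steps_ms.get)
    return total, dominant


# Hypothetical per-step values in milliseconds, for illustration only.
untuned = {
    "detection": 40_000,
    "propagation": 100,
    "processing": 5_000,
    "rib_fib_update": 500,
}
tuned = {
    "detection": 150,
    "propagation": 50,
    "processing": 100,
    "rib_fib_update": 500,
}

for label, steps in (("untuned", untuned), ("tuned", tuned)):
    total, dominant = convergence_budget(steps)
    print(f"{label}: total ~{total} ms, dominated by {dominant}")
```

With these example numbers, detection dominates before tuning, and the hardware-dependent RIB/FIB update becomes the largest remaining term afterwards.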
In both OSPF and IS-IS, an exponential backoff mechanism is used to protect the routing domain from rapid flapping events. It slows down convergence by penalizing the unstable prefixes. The mechanism is very similar to IP event dampening and BGP route dampening.
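The exponential backoff idea can be sketched in a few lines of Python. This is a simplified model, not any vendor's exact algorithm: it assumes an initial delay, a hold time that doubles on every further event while the network stays unstable, and a maximum cap. The function name `spf_delays` and the timer values are hypothetical, and real implementations also reset the hold time after a quiet period, which is left out here.

```python
def spf_delays(start_ms, hold_ms, max_ms, n_events):
    """Delay applied before each of n_events back-to-back SPF runs.

    Simplified model: the first run waits start_ms, later runs wait the
    current hold time, and the hold time doubles after each run up to max_ms.
    """
    delays = []
    hold = hold_ms
    for i in range(n_events):
        if i == 0:
            delays.append(start_ms)          # react fast to the first event
        else:
            delays.append(min(hold, max_ms))
            hold = min(hold * 2, max_ms)     # back off exponentially
    return delays


# Example values only: 50 ms initial delay, 200 ms hold, 5 s maximum.
print(spf_delays(50, 200, 5000, 8))
```

A single failure is handled after the short initial delay, while a continuously flapping link quickly pushes the delay toward the maximum, trading convergence speed for stability.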