Orhan Ergun

BFD is not a fast convergence mechanism. BFD stands for Bidirectional Forwarding Detection. It is an important tool for the IP layer, but there is some confusion about it in the networking community.

 

BFD is a failure detection mechanism. It can detect both link and node failures.

 

Without BFD, failures can be detected at Layer 0, at the physical layer, or at Layer 2. Switches and routers can detect a failure using carrier-delay and debounce timers.

 

Detection can of course be done at Layer 3 as well. In fact, as of 2017, many networks use several protocols at multiple OSI layers to detect link and node failures. I will discuss the problems with that in a separate post.

 

Why is failure detection done at Layer 3?

 

If two Layer 3 devices are connected through a switch and you set up a routing protocol neighborship between them, a failure on one side is not visible to the other device, because its own link to the switch stays up. The routing protocol neighborship therefore stays up until the hold timer expires.

 

In that case, reducing the routing protocol hello and hold timers was the common practice for fast failure detection.
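To see why the timers matter, here is a minimal sketch of how long a hold-timer based detection can take. The function name is made up for illustration, and the 10 s hello / 40 s dead values are the commonly cited OSPF defaults on broadcast links; check what your platform actually uses.

```python
from typing import Tuple


def detection_window_s(hello_s: float, hold_s: float) -> Tuple[float, float]:
    """Return (best case, worst case) detection time in seconds.

    The hold timer restarts every time a hello is received.
    Best case: the neighbor dies just before its next hello was due.
    Worst case: the neighbor dies right after a hello was received.
    """
    if hold_s <= hello_s:
        raise ValueError("hold timer must be longer than the hello interval")
    return (hold_s - hello_s, hold_s)


# Commonly cited OSPF defaults (assumption, platform dependent)
print(detection_window_s(10, 40))  # (30, 40)

# Hypothetical tuned timers: faster, but every hello still hits the CPU
print(detection_window_s(1, 4))    # (3, 4)
```

Even with tuned timers the detection time stays in the range of seconds, and every extra hello has to be processed by the control plane.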

 

But this approach is resource intensive as the routing protocol hello packets are processed by the CPU.

 

BFD offloads the liveness detection task from the CPU to the line card/data plane.

 

So instead of the slow control plane, BFD uses the much faster data plane for fast failure detection.
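How fast is "fast"? The sketch below follows the asynchronous mode rule from RFC 5880: the local detection time is the remote detect multiplier times the negotiated interval, where the negotiated interval is the greater of the local required minimum receive interval and the remote desired minimum transmit interval. The function and parameter names are mine, and the timer values are only examples.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Local detection time for a BFD session in asynchronous mode."""
    # The slower of the two sides decides how often packets actually arrive.
    negotiated_interval_ms = max(local_required_min_rx_ms,
                                 remote_desired_min_tx_ms)
    return negotiated_interval_ms * remote_detect_mult


# Both sides agree on 50 ms, multiplier 3
print(bfd_detection_time_ms(50, 50, 3))   # 150

# The local side can only receive every 300 ms, so detection slows down
print(bfd_detection_time_ms(300, 50, 3))  # 900
```

Compare these milliseconds with the seconds you get from routing protocol hold timers.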

 

Compared to routing protocol hello packets, BFD packets are smaller.

 

But probably one of the most important features of BFD is that it notifies many other overlay protocols about the failure.

 

Wait, let me explain. I don’t want to use complicated words, so let me make it easy.

 

If you have OSPF, BGP and LDP in your network (the classical MPLS VPN network protocols), all of these protocols have their own hello and hold/dead timers.

 

If you want to improve the convergence time by detecting the failure faster, you tune the timers of these protocols. Basically, you reduce the hello and hold/dead timers of each protocol individually.

 

Which means you create an even bigger resource utilization problem.

 

But when BFD is enabled, you don’t need to tune the timers of OSPF, BGP and LDP individually.

 

You just use very aggressive timers on BFD and keep the default timers on the other protocols. That’s how you reduce the overall control plane load.

 

OSPF, BGP and LDP register with BFD as clients. In case of a failure, BFD detects it very fast and notifies the client protocols.
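The client relationship can be sketched as a simple register-and-notify pattern. This is only an illustration of the idea, not how any router operating system implements it; the class, the method names and the 192.0.2.1 peer address are all made up.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class BfdProcess:
    """Toy model: one BFD session per peer, many client protocols."""

    def __init__(self) -> None:
        self._clients: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register(self, peer: str, callback: Callback) -> None:
        """A client protocol asks to be told when the session to peer fails."""
        self._clients[peer].append(callback)

    def session_down(self, peer: str) -> int:
        """Notify every registered client; return how many were notified."""
        callbacks = self._clients.get(peer, [])
        for callback in callbacks:
            callback(peer)
        return len(callbacks)


events: List[str] = []
bfd = BfdProcess()

# OSPF, BGP and LDP all rely on the same BFD session to this peer.
for proto in ("OSPF", "BGP", "LDP"):
    bfd.register(
        "192.0.2.1",
        lambda peer, proto=proto: events.append(f"{proto}: neighbor {peer} down"),
    )

# A single detection event tears down all three adjacencies.
bfd.session_down("192.0.2.1")
print(events)
```

One aggressive timer, one detection, three protocols informed. None of the client protocols had to touch its own hello timers.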

 

But fast convergence is a different thing from BFD.

 

Multiple steps are required to provide fast convergence for your traffic when the failure happens.

 

Fast Convergence Steps

 

  • Fast Failure Detection
  • Failure information propagation
  • Processing and finding an alternate path
  • Installing new path in routing and forwarding table

 

As you can see, multiple steps are required to be able to use alternate/backup paths.

 

Fast failure detection is an important step, but it is not the only one. Installing the new path in the routing and forwarding tables is a time-consuming task as well, especially if there are many prefixes. (I have explained BGP Prefix Independent Convergence before; you may want to check it.)
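To put it in numbers, the total convergence time is the sum of all four steps. The per-step values below are purely hypothetical, chosen only to show how the arithmetic works; real numbers depend on the platform, the topology and the number of prefixes.

```python
from typing import Dict


def convergence_time_ms(steps: Dict[str, int]) -> int:
    """Total convergence time is the sum of every step, not just detection."""
    return sum(steps.values())


# Hypothetical per-step times in milliseconds (assumptions, not measurements)
steps = {
    "failure detection (BFD)": 150,
    "failure propagation": 20,
    "alternate path computation": 50,
    "RIB/FIB installation": 500,
}

total_ms = convergence_time_ms(steps)
print(total_ms)  # 720

# Detection is only part of the total.
for name, ms in steps.items():
    print(f"{name}: {ms} ms ({ms / total_ms:.0%} of total)")
```

With these made-up values, BFD accounts for 150 ms of the 720 ms, and the rest comes from the other steps.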

 

Thus, if you say BFD is a fast convergence mechanism, you would be wrong.

 

It is just one of the steps in fast convergence!

 

 

 
