BGP PIC - Prefix Independent Convergence Fundamentals

BGP PIC Fundamentals: BGP PIC (Prefix Independent Convergence) is a BGP fast reroute mechanism that can provide sub-second convergence, even for the 900K Internet prefixes, by relying on IGP convergence.

BGP PIC uses a hierarchical data plane, in contrast to the flat FIB (Forwarding Information Base) design used by Cisco CEF and many legacy platforms.

In a hierarchical data plane, the FIB used by the packet processing engine reflects recursions between the routes.

I will explain the recursion concept throughout the post, so don't worry about the above sentence; it will make sense.

There are two implementations of the BGP PIC concept, and they can protect network traffic from multiple types of failure.

A link or node failure in the core or at the edge of the network can be recovered from in under a second, and in most cases in under 100 ms (this mostly depends on IGP convergence, so the IGP should be tuned or IGP FRR can be used).

In this article, I will not explain IGP fast convergence or IGP Fast Reroute, but you can read about them in my Fast Reroute mechanisms article.

BGP PIC can be thought of as a BGP Fast Reroute mechanism that relies on IGP convergence for failure detection. (All overlay protocols rely on underlay protocol convergence, e.g. LDP/IGP synchronization, STP/HSRP, IGP/BGP, IP/GRE, and so on.)

As I mentioned above, there are two implementations of BGP PIC, namely BGP PIC Edge and BGP PIC Core. Let's start with BGP PIC Core.

BGP PIC CORE

[Figure: BGP PIC Core]

In the above figure R1, R2, R3, R4, and R5 belong to AS 100, and R6 and R7 belong to AS 200.

There are two EBGP connections between ASBRs of the Service Providers.

So far, everybody has told you that BGP is slow because it is used for scalability in networks, not for fast convergence, right?

But that is not quite right. Or at least it is not enough to understand how BGP converges!

If BGP relies on the control plane to converge, of course it will be slow: the default timers are long (BGP MRAI, BGP Scanner, and so on, although you don't need to rely on them, as I will explain now), and there are too many prefixes and too much path information for the best path selection algorithm to quickly select the second-best path to advertise when the primary path fails.

The default-free zone already has more than 900K prefixes, so we are talking about approximately 100 MB of data from each neighbor, and transferring it takes time too. If you have multiple paths, the amount of data that needs to be sent will be much higher.
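As a rough back-of-the-envelope check of these numbers, the sketch below works out what 100 MB for 900K prefixes means per prefix and how the volume grows with the number of paths. The two base figures are the approximations from the text; the path count of 3 and the assumption that every extra path costs as much as the first are only illustrative.

```python
# Rough arithmetic only; both base figures are the approximations from the text.
PREFIXES = 900_000           # prefixes in the default-free zone
BYTES_PER_NEIGHBOR = 100e6   # about 100 MB received from each neighbor

bytes_per_prefix = BYTES_PER_NEIGHBOR / PREFIXES


def table_size_mb(paths: int) -> float:
    """Estimated volume in MB if every prefix is received over `paths` paths.

    Assumption (illustrative): each extra path costs as much as the first one.
    """
    return paths * PREFIXES * bytes_per_prefix / 1e6


print(f"about {bytes_per_prefix:.0f} bytes per prefix")
print(f"about {table_size_mb(3):.0f} MB with 3 paths per prefix")
```

The point is only the order of magnitude: per-prefix state is small, but multiplied by the size of the table and the number of paths it becomes a lot of data for the control plane to process.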

Let's look at BGP control plane convergence closer...

Imagine that R1 in the above picture learns the 5.5.5.5 prefix from R4 only, so R4 is the next hop. (Maybe you chose R4 as the primary link with BGP Local Preference or MED, or you did nothing and R4 was selected by hot potato routing because of the Route Reflector's position.)

If R4 is the best path, R5 doesn't send the 5.5.5.5 prefix to the IBGP domain unless BGP best external is enabled (I highly recommend enabling it if you want additional path information in an active/standby link design).

How do the IBGP routers learn that R4 has failed?

There are two mechanisms for that. The routers either wait for the BGP Scanner timer (60 seconds in most implementations) to check whether the BGP next hops for the BGP prefixes are still up, or they use the newer approach, BGP Next-Hop Tracking (almost all vendors support it). With BGP next-hop tracking, the BGP next-hop prefixes are registered with the IGP route watch process, so as soon as the IGP detects the BGP next-hop failure, BGP is informed.
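The difference between periodic scanning and next-hop tracking is essentially polling versus event notification. Here is a minimal sketch of the registration idea in Python. The class `RouteWatch` and its methods are hypothetical names, not any vendor's API, and mapping 10.0.0.1 to R4 and 10.0.0.2 to R5 is an assumption based on the addresses used later in this post.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class RouteWatch:
    """Hypothetical stand-in for the IGP route watch process."""

    def __init__(self) -> None:
        self._clients: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, next_hop: str, callback: Callable[[str], None]) -> None:
        # BGP registers interest in one of its next hops.
        self._clients[next_hop].append(callback)

    def igp_route_lost(self, next_hop: str) -> None:
        # Called when the IGP loses the route; registered clients are told right away.
        for callback in self._clients[next_hop]:
            callback(next_hop)


failed_next_hops: List[str] = []   # what BGP has been told so far

watch = RouteWatch()
watch.register("10.0.0.1", failed_next_hops.append)  # assumed to be R4
watch.register("10.0.0.2", failed_next_hops.append)  # assumed to be R5

watch.igp_route_lost("10.0.0.1")   # the IGP detects that R4 is gone
print(failed_next_hops)            # BGP already knows, no 60-second scan needed
```

With the scanner alone, the same failure would only be noticed at the next periodic walk of the table.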

It is similar to BGP, IGP, and LDP registering with BFD, right? Good!

So R1 learned of the R4 failure through the IGP. R1 then has to go and delete all the BGP prefixes that were learned from that next hop. If it is a full Internet routing table, this is a very time-consuming process, as you can imagine. I am talking about minutes here.

In the absence of an already calculated backup path, BGP will rely on this control plane convergence, so of course it will take time. But you don't have to rely on that. I have recommended to many service providers that they start to consider BGP PIC and Egress FRR for their Internet and VPN services.

In the router's routing table, there is always a recursion for the BGP prefixes. So for the 5.5.5.5 prefix, the next hop would be 10.0.0.1 if next-hop-self is enabled.

But in order to forward the traffic, the router needs to resolve the immediate next hop and the Layer 2 encapsulation (the MAC address, if the link is Ethernet).

For the BGP next hop 10.0.0.1, R1 selects either 172.16.0.1 or 172.16.1.1 as the IGP next hop. Alternatively, R1 can do ECMP (Equal Cost Multipath) and use both 172.16.0.1 and 172.16.1.1 to reach 10.0.0.1.
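To make the recursion concrete, here is a minimal Python sketch of the three lookups R1 has to chain together: prefix to BGP next hop, BGP next hop to IGP next hops, and IGP next hop to Layer 2 address. The IP addresses are the ones used in this post; the table names, the `resolve` helper, and the MAC addresses (taken from the documentation range) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical tables on R1, using the addresses from the text.
bgp_rib: Dict[str, str] = {"5.5.5.5": "10.0.0.1"}            # prefix -> BGP next hop
igp_rib: Dict[str, List[str]] = {
    "10.0.0.1": ["172.16.0.1", "172.16.1.1"],                # BGP next hop -> ECMP IGP next hops
}
l2_table: Dict[str, str] = {                                  # made-up documentation MACs
    "172.16.0.1": "00:00:5e:00:53:01",
    "172.16.1.1": "00:00:5e:00:53:02",
}


def resolve(prefix: str) -> List[Tuple[str, str]]:
    """Follow the recursion down to (IGP next hop, MAC address) pairs."""
    bgp_next_hop = bgp_rib[prefix]            # first level: the BGP route
    igp_next_hops = igp_rib[bgp_next_hop]     # second level: the IGP route to the BGP next hop
    return [(hop, l2_table[hop]) for hop in igp_next_hops]  # third level: Layer 2 rewrite


for hop, mac in resolve("5.5.5.5"):
    print(f"5.5.5.5 via 10.0.0.1 -> {hop} ({mac})")
```

Whether this chain is walked once and flattened when the FIB is programmed, or kept as linked levels in the FIB, is exactly the flat versus hierarchical distinction discussed next.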

In many vendors' FIB implementations, BGP prefixes resolve directly to the immediate IGP next hop. Cisco's CEF implementation works this way too. This is not necessarily a bad thing, though.

It provides better throughput since the router doesn't have to do a double/aggregate lookup. But from the fast convergence point of view, we need a hierarchical data plane ( Hierarchical FIB ).

With BGP PIC, in both the PIC Core and PIC Edge solutions, you have a hierarchical data plane, so for 5.5.5.5 the FIB holds 10.0.0.1 or 10.0.0.2 as the next hop (the same as in the RIB).

For 10.0.0.1 and 10.0.0.2, you have another FIB entry that points to the IGP next hops, which are 172.16.0.1 and 172.16.1.1. These IGP next hops can be used in a load-sharing or an active/standby manner.

BGP PIC Core helps to hide IGP failures from the BGP process. If the R1-R2 or R2-R3 link fails, or if R2 or R3 fails, R1 starts to use the backup IGP next hop immediately. Since the BGP next hop didn't change and only the IGP path changed, recovery time is based on IGP convergence.
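The effect of the hierarchy on a core failure can be sketched by counting FIB writes. In the Python sketch below, the flat FIB stores a resolved IGP next hop per prefix, while the hierarchical FIB lets every prefix point at one shared entry for the BGP next hop 10.0.0.1. The class and function names are hypothetical, the prefixes are synthetic, and the table is scaled down from 900K to keep the demo light.

```python
from typing import Dict, List

N_PREFIXES = 100_000  # scaled down from the 900K in the text


class NextHopEntry:
    """Shared FIB object for one BGP next hop (hypothetical name)."""

    def __init__(self, igp_next_hops: List[str]) -> None:
        self.igp_next_hops = igp_next_hops


prefixes = [f"prefix-{i}" for i in range(N_PREFIXES)]  # synthetic prefix names

# Flat FIB: each prefix stores the already resolved IGP next hop.
flat_fib: Dict[str, str] = {p: "172.16.0.1" for p in prefixes}

# Hierarchical FIB: each prefix points at one shared entry for BGP next hop 10.0.0.1.
via_10_0_0_1 = NextHopEntry(["172.16.0.1", "172.16.1.1"])
hier_fib: Dict[str, NextHopEntry] = {p: via_10_0_0_1 for p in prefixes}


def repair_flat(failed: str, backup: str) -> int:
    """Rewrite every prefix that used the failed IGP next hop; return the write count."""
    writes = 0
    for prefix, hop in flat_fib.items():
        if hop == failed:
            flat_fib[prefix] = backup
            writes += 1
    return writes


def repair_hierarchical(entry: NextHopEntry, failed: str) -> int:
    """Drop the failed IGP next hop from the shared entry; one write covers all prefixes."""
    entry.igp_next_hops.remove(failed)
    return 1


flat_writes = repair_flat("172.16.0.1", "172.16.1.1")
hier_writes = repair_hierarchical(via_10_0_0_1, "172.16.0.1")
print(f"flat FIB writes: {flat_writes}, hierarchical FIB writes: {hier_writes}")
```

The number of writes in the hierarchical case does not depend on the number of prefixes, which is where the "prefix independent" in the name comes from.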

For BGP PIC Core, you don't have to have multiple IBGP next hops. BGP PIC Core can handle core IGP link and node failures.

BGP PIC EDGE

Let me explain BGP PIC Edge, which handles edge link or node failures and, in some scenarios, works in a slightly different way than BGP PIC Core.

In order for BGP PIC Edge to work, the edge IBGP devices (ingress PEs and ASBRs) need to support BGP PIC, and they also need to receive a backup BGP next hop.

Unfortunately, a Route Reflector advertises only its best path, so the backup next hop is not sent in IBGP Route Reflector topologies. Another drawback of a Route Reflector is that when it does hot potato routing by calculating the IGP cost to the BGP next hop, it takes only its own cost to the next hop into consideration. The IGP cost from the Route Reflector to the BGP next hops might be different from the IGP cost from the ingress PE to the BGP next hops.

Thus the Route Reflector may not provide an optimal path for all the ingress PEs. The BGP Optimal Route Reflection draft specifies a couple of solutions, which I covered in an earlier article.
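The route reflector problem described above can be illustrated with a small Python sketch. The IGP costs are invented for the example, and the `hot_potato` helper is a hypothetical name; the only point is that the lowest-cost exit depends on where you calculate it from.

```python
from typing import Dict

# Made-up IGP costs to the two exit points (BGP next hops R4 and R5).
igp_cost: Dict[str, Dict[str, int]] = {
    "RR": {"R4": 10, "R5": 30},   # the route reflector sits closer to R4
    "R1": {"R4": 40, "R5": 20},   # this ingress PE sits closer to R5
}


def hot_potato(viewpoint: str) -> str:
    """Pick the exit with the lowest IGP cost as seen from `viewpoint`."""
    costs = igp_cost[viewpoint]
    return min(costs, key=lambda exit_point: costs[exit_point])


rr_choice = hot_potato("RR")    # the single best path the RR reflects to its clients
r1_optimal = hot_potato("R1")   # the exit R1 would have picked for itself

print(f"RR advertises {rr_choice}, but R1's closest exit is {r1_optimal}")
```

With these costs R1 only ever hears about R4, so it has neither its optimal exit nor a backup next hop to pre-install.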

How would you send more than one best path from the Route reflector to the Route reflector clients?

There are many ways to do it, but the two best-known ones are BGP Add-Path and BGP Diverse Path (multiple control plane RRs). I will explain these ideas in a separate article.

Assume now that we have more than one path on R1.

We should cover two edge failure scenarios to show how BGP PIC Edge helps in different cases.

In the first case: we are doing BGP next-hop-self on R4 and R4 fails.

This failure is detected by the IGP, and next-hop tracking removes the failed BGP next hop from the BGP path list on R1.

An alternate backup route can be used immediately. This is BGP data plane convergence, not control plane convergence, so the convergence time depends only on IGP convergence and is prefix independent. If you have a full Internet routing table of 500K prefixes, backup routes for all of them are installed in the FIB before the failure, and when the failure happens, the next BGP next hop is used immediately.
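A minimal Python sketch of this edge node failure case follows. All prefixes share one path list that already contains the primary and the backup BGP next hop, so removing the failed next hop once moves every prefix to the backup. `PathList` and its methods are hypothetical names, the prefixes are synthetic, and mapping 10.0.0.1 to R4 and 10.0.0.2 to R5 is an assumption based on the addresses used earlier.

```python
from typing import Dict, List


class PathList:
    """Shared list of BGP next hops, best first (hypothetical structure)."""

    def __init__(self, next_hops: List[str]) -> None:
        self.next_hops = list(next_hops)

    def active(self) -> str:
        # The first remaining entry is the one the data plane forwards to.
        return self.next_hops[0]

    def remove(self, failed: str) -> None:
        # A single update, no matter how many prefixes share this list.
        self.next_hops.remove(failed)


# Primary 10.0.0.1 (assumed R4) and backup 10.0.0.2 (assumed R5),
# both installed before any failure happens.
path_list = PathList(["10.0.0.1", "10.0.0.2"])
fib: Dict[str, PathList] = {f"prefix-{i}": path_list for i in range(500_000)}

before = fib["prefix-42"].active()
path_list.remove("10.0.0.1")        # next-hop tracking reports that R4 is unreachable
after = fib["prefix-42"].active()

print(f"before: {before}, after: {after}")
```

This only works because the backup next hop was already known on R1, which is why receiving more than one path from the route reflector matters.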

BGP PIC is not necessarily a BGP-only feature; BGP simply takes advantage of the recursion in a hierarchical data plane arrangement. It is also not Cisco proprietary; most vendors implement BGP PIC today.

The second failure scenario might be the failure of the edge link between R4 and R6. R4 is our primary next hop and we are doing next-hop-self on R4 (in MPLS VPN, you always do that!).

If the edge link fails, the BGP next hop doesn't change on R1, so R1 continues to forward the traffic to R4, according to the IBGP best path sent by the RR.

In this case, R4 should redirect the packet to its alternate, second-best path, which is via R5. But in an IP environment without tunneling, intermediate nodes that have not converged yet would send the packet back to R4, since they would think that the destination is still reachable through R4, so there would be a temporary loop. In the case of MPLS or other tunneling mechanisms, intermediate nodes don't need BGP, so they just forward the packets toward the second-best path as R4 requests.
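The temporary loop can be sketched with a tiny hop-by-hop trace in Python. The layout R4 -- R3 -- R5 is an assumption made for this example (the real topology is in the figure), and the tables and the `trace` helper are hypothetical. The first table models plain IP forwarding on the destination 5.5.5.5; the second models forwarding on a tunnel toward R5, where the intermediate node never looks at the BGP destination.

```python
from typing import Dict, List

# Assumed layout: R4 -- R3 -- R5, and the R4-R6 edge link has just failed.

# Plain IP: per-router next hop for the destination 5.5.5.5.
ip_fib: Dict[str, str] = {
    "R4": "R3",     # R4 has switched to its second-best path via R5
    "R3": "R4",     # R3 has not converged yet and still points at R4
    "R5": "EXIT",   # R5 hands the packet to AS 200
}

# Tunneling (for example MPLS): per-router next hop toward the tunnel endpoint R5.
tunnel_fib: Dict[str, str] = {"R4": "R3", "R3": "R5", "R5": "EXIT"}


def trace(fib: Dict[str, str], start: str, max_hops: int = 8) -> List[str]:
    """Follow the next hops from `start`; mark the trace with LOOP if a router repeats."""
    path = [start]
    for _ in range(max_hops):
        next_router = fib[path[-1]]
        if next_router == "EXIT":
            return path + ["EXIT"]
        if next_router in path:
            return path + [next_router, "LOOP"]
        path.append(next_router)
    return path


print("IP:    ", " -> ".join(trace(ip_fib, "R4")))
print("Tunnel:", " -> ".join(trace(tunnel_fib, "R4")))
```

In the IP case the packet bounces between R4 and R3 until R3 converges; in the tunnel case R3 only needs a route to R5, which the IGP already provides.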

Created by
Orhan Ergun


