Orhan Ergun

BGP route reflectors, used as an alternative to full-mesh IBGP, help IBGP scale.

BGP route reflector clustering is used to provide redundancy in a BGP RR design. A route reflector and its clients form a cluster.

In IBGP topologies, every BGP speaker has to be in a logical full mesh. However, route reflection is an exception.

An RR client sets up IBGP sessions only with the route reflectors.

In this article, I will focus on route reflector clusters and their design.

Some terminology first:

The route reflector cluster ID is a four-byte BGP attribute; by default, it is set to the route reflector's BGP router ID.

If two routers share the same BGP cluster ID, they belong to the same cluster.

 

Before reflecting a route, a route reflector appends its cluster ID to the cluster list. If the route originated from the route reflector itself, the route reflector does not create a cluster list.

 

If the route is sent to an EBGP peer, the RR removes the cluster list information.

If the route is received from an EBGP peer, the RR does not create a cluster list attribute.

The cluster list is used for loop prevention, and only by the route reflectors. RR clients do not use the cluster list attribute, so they do not know which cluster they belong to.

 

If an RR receives a route whose cluster list contains its own cluster ID, the route is discarded.

Let’s start with the basic topology.

Figure-1: Route reflectors use the same cluster ID

 

In Figure-1, R1 and R2 are the route reflectors, and R3 and R4 are the RR clients. Both route reflectors use the same cluster ID.

Green lines depict physical connections. Red lines show IBGP connections.

Assume that both route reflectors use cluster ID 1.1.1.1, which is R1's router ID.
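As a minimal sketch, the cluster ID can be set explicitly on both RRs. The commands below are Cisco IOS style; the AS number 65000 and the client loopback addresses 3.3.3.3 (R3) and 4.4.4.4 (R4) are assumptions for illustration:

```
! On both R1 and R2: same cluster ID, and both clients configured as RR clients
router bgp 65000
 bgp cluster-id 1.1.1.1
 neighbor 3.3.3.3 remote-as 65000
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 route-reflector-client
 neighbor 4.4.4.4 remote-as 65000
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 4.4.4.4 route-reflector-client
```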

R1 and R2 receive routes from R4.

R1 and R2 receive routes from R3.

As route reflectors, both R1 and R2 append 1.1.1.1 to the cluster list of the routes they reflect to each other. However, since they use the same cluster ID, each discards the routes received from the other.

That's why, if the RRs use the same cluster ID, each RR client has to peer with both RRs.
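In other words, each client is configured with sessions to both RRs. A Cisco IOS style sketch for R4, assuming RR loopback addresses 1.1.1.1 (R1) and 2.2.2.2 (R2) and AS 65000:

```
! On R4: IBGP sessions to both route reflectors
router bgp 65000
 neighbor 1.1.1.1 remote-as 65000
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 2.2.2.2 remote-as 65000
 neighbor 2.2.2.2 update-source Loopback0
```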

In this topology, R1 learns the routes behind R4 only from its direct IBGP session with R4 (R1 rejects the copies reflected by R2). Of course, the IGP path goes through R1-R2-R4, since there is no direct physical link between R1 and R4.

If the physical link between R2 and R4 goes down, both the R1-R4 and R2-R4 IBGP sessions go down as well. Thus, the networks behind R4 cannot be learned.

Since the routes cannot be learned from R2 (same cluster ID), the networks behind R4 are also unreachable if the physical link stays up but the IBGP session between R1 and R4 goes down. However, if the BGP sessions are built between loopbacks and the physical topology is redundant, an IBGP session failure is very unlikely.

Note: Having redundant physical links in a network design is a common best practice. That's why the topology below is a more realistic one.

 

What if we add physical links between R1-R4 and R2-R3?

Figure-2: Route reflectors use the same cluster ID; physical cross-connections are added between the RRs and the RR clients

 

In Figure-2  physical cross-connections are added between R1-R4 and R2-R3.

Still, we are using the same BGP cluster ID on the route reflectors.

Thus, when R2 reflects R4's routes to R1, R1 will discard them. R1 will instead learn R4's routes through its direct IBGP peering with R4. In this case, the IGP path will change to R1-R4 rather than R1-R2-R4.

If the R1-R4 physical link fails, the IBGP session will not go down as long as the IGP converges to the R1-R2-R4 path faster than the BGP session times out (with default timers, it does).

Thus, having the same cluster ID on the RRs saves a lot of memory and CPU on the route reflectors, and link failures do not cause IBGP session drops as long as there is enough redundancy in the network.

If we used different BGP cluster IDs on R1 and R2, R1 would accept the reflected routes from R2 in addition to the routes from its direct peering with R4.

Orhan Ergun recommends Same BGP Cluster ID for the Route Reflector redundancy.

In that case, each route reflector would keep an extra copy of each prefix.
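For comparison, the different-cluster-ID variant only changes the cluster-id value on each RR (values below are assumptions for illustration); each RR then accepts the routes reflected by the other and stores the extra copies:

```
! R1
router bgp 65000
 bgp cluster-id 1.1.1.1
!
! R2 - different cluster ID, so R2 accepts routes reflected by R1
router bgp 65000
 bgp cluster-id 2.2.2.2
```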

Let me ask you a few questions.

Do you have route reflector in your network?

Do you have more than one for redundancy?

Are you using identical or different cluster ID?

Let’s discuss your network design in the comment section.

 
  • Tom

    We have a MAN consisting of eight routers running iBGP and using two route reflectors within the same cluster. This MAN is connected to two other autonomous systems. Although there are many improvements possible in the rather bad design (I did not design it), I did not yet look into the decision of whether to use a single cluster with two route reflectors or two clusters with one RR each. There is physical redundancy in place, however, so maybe there is no real need for two clusters.

    • @Tom, if there is physical redundancy toward the BGP next hop, having one cluster is sufficient, as I explained in the post.

      Two questions for you:
      1) Why do you use route reflectors for only 8 routers?
      2) What do you mean by improvements for the RR (advertising multiple paths? separating services?)?


  • Haroon

    Juicy article! The cluster list saves us from loops when the RRs share the same cluster ID, but as I understand it, the same cluster ID would be a bad design and resource intensive.

    • @Haroon, the same cluster ID is not a bad design; on the contrary, it is the better design, since you will not need extra memory and CPU to handle the prefixes that would come from the other RR.
      On the other hand, in an IP-only network the IBGP topology should follow the physical topology of the network. Otherwise, if you are lucky you get suboptimal routing; worse, it can create a persistent routing loop.
      In our topology it is not a problem. Do you see why?

      • Michael Kashin

        @Orhan, why did you say “IP-only network” in regard to loops and suboptimal routing? Wouldn’t that be the case for any destination-based forwarding network?
        BTW, it’d be interesting to see a post highlighting bad RR designs and the specific errors they may lead to.
        Cheers


  • Roy Lexmond

    Hi Orhan,

    Nice post!! You cannot have a persistent loop because you follow the physical topology.
    With an MPLS RR design you don’t have this problem: the packets toward the BGP next hops always carry an LDP-generated label for the BGP next hop.

    Cheers,
    Roy

  • djahem

    “If the routes are received with the same cluster ID by the RR, they are discarded.” So why do we need an IBGP session between R1 and R2 if they each discard the routes learned from each other, in this specific topology with only 4 routers?

    • Hi Djahem,

      You need an IBGP session between R1 and R2 so they can exchange their clients’ BGP routes. You can use either the same or different cluster IDs; as per my suggestion, use the same cluster ID for route reflectors at the same tier.

      Cheers,
      Orhan

      • Niko

        Hi Orhan,

        Continuing djahem question…
        From Figure-2, R1 has IBGP sessions with R3 and R4, and R2 also has IBGP sessions with R3 and R4. Why do we still need IBGP between R1 and R2?

        Thanks
        Niko

        • @Niko, thanks for the comment. If the physical connections between an RR and the RR clients fail, how would you send the traffic between the RR clients? That’s why you need it, unless you redistribute BGP into the IGP, which you don’t do except for specific applications.

          Cheers,
          Orhan

          • Niko

            Hi Orhan,

            Thanks for your response. I hope you don’t mind if I discuss this a bit more. In Figure 2, with the IBGP session enabled between RR1 and RR2: if R1 loses its connections with R3 and R4, R1 will not accept any BGP routes from R2, since they have the same cluster ID.

            Thanks,
            Niko

          • @Niko That’s correct. I don’t mind at all 🙂

  • Uday

    Hi Orphan,

    Having a redundant RR is good, but here it is not giving us failover. Like you mentioned, if the link between R2-R4 is up but the IBGP between R1-R4 is not forming, then R1 will not learn the routes from R2 because of loop prevention.

    Is there any way to fix this and provide failover?

    Thanks,
    Uday

    • Thanks for the comment, Uday. If there is no IBGP session between R1 and R4 in the same-cluster-ID case, then there is no fix for failover; you need to use a different cluster ID.
      By the way, it is Orhan 🙂

      • Alex R

        Hello Orhan,

        Nice post. But isn’t the point of having multiple RRs to have redundancy? What good does it do me if the IBGP session between R1-R4 fails and the networks behind R4 are still unreachable because R1 will discard the routes from R2? I might as well just have one RR.



  • Sayed Nada

    Orhan,
    How would you overcome a double failure between the RRs and the RR clients if you use the same cluster ID? If R4 loses its IBGP to R1 and R3 loses its IBGP to R2, how will R3 get the routes that R4 advertises?

  • dbsg7777

    If using the same cluster ID, then there is no point in peering between the RRs, correct?