We discussed LAG (Link Aggregation Group) and ECMP (Equal Cost Multipath) in real network deployments with the Service Provider/Telco engineers in my Slack group.
I thought it was a good discussion, so you can see what others are doing and the reasons behind their deployments.
Three people were involved in this talk: myself, Engineer 1 and Engineer 2. Since the participants may not want to expose their networks publicly and I didn’t ask their permission, I used Engineer 1 and Engineer 2 instead of their names. Engineer 1 started the discussion. I hope you find it informative. Enjoy.
Engineer 1
Hello guys, I hope you are all good.
I wanted to spark a little discussion about using LAGs vs ECMP in your networks. What are you using and what was the driver? Personally I use bundled interfaces.
It’s a bit easier to manage, fewer IP addresses are used, the IGP LSDB is smaller, and if a member link goes out of service it’s hidden from the IGP view.
Load balancing depends on the platform, but I think it’s as good as with ECMP. Obviously, with bundles we are restricted to a single pair of devices.
The downside is that it’s very hard to troubleshoot a malfunctioning LAG member link, because traceroute always looks the same. Also, if we need BFD support it may require micro-BFD, which is not always available. With ECMP links we don’t have that problem. Any other gotchas?
Oh, and QoS is sometimes a pain on bundles.
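Editor's note: the load-balancing point applies to both options, since LAG and ECMP typically pin each flow to one link by hashing packet header fields. The following is a minimal Python sketch of that idea, not any vendor's algorithm. The function name `pick_member`, the use of CRC32, the 5-tuple fields, and the link names are all assumptions made for illustration; real platforms use their own hash functions and field selections.

```python
import zlib
from collections import Counter

def pick_member(flow, members):
    """Map a flow tuple to one member link with a deterministic hash.

    Hypothetical helper: CRC32 stands in for a platform-specific hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]

# Assumed example data: four member links and 1000 flows that differ
# only in source port (src IP, dst IP, protocol, src port, dst port).
members = ["link-1", "link-2", "link-3", "link-4"]
flows = [("10.0.0.1", "10.0.1.1", 6, 1024 + i, 443) for i in range(1000)]

# Count how many flows land on each member link.
usage = Counter(pick_member(flow, members) for flow in flows)
for link in members:
    print(link, usage[link])
```

Because the hash is deterministic, every packet of a given flow stays on the same link, which avoids reordering but also means a few large flows can load one member more than the others.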
Engineer 2
I prefer ECMP rather than LAG wherever possible.
BFD is easier to run, and link issues are easier to handle, with individual links.
And a LAG may cause congestion when one member link is down.
For us, the IGP cost is fixed (manually) on the link. When a member link goes down, the cost remains the same.
Hence the LAG is more prone to congestion.
Engineer 1
Well, with LAG you can rely on min-links, so if too many member links go down the entire LAG will go out of service.
Engineer 2
Yes. But if there are only 2 x 40G in the LAG, we would have a big problem.
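Editor's note: the trade-off in this exchange can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the function `lag_state` is hypothetical, the IGP cost is assumed fixed (so the LAG keeps attracting the same traffic while it is up), and the 60G offered load is an invented figure, not a number from the discussion.

```python
def lag_state(member_gbps, members_up, min_links, offered_gbps):
    """Classify a LAG after member failures (hypothetical model).

    With a manually fixed IGP cost, the offered load does not change
    while the LAG stays up, so losing members only shrinks capacity.
    """
    if members_up < min_links:
        # min-links takes the whole bundle down, so the IGP reroutes.
        return "down"
    capacity_gbps = members_up * member_gbps
    if offered_gbps > capacity_gbps:
        return "congested"
    return "ok"

# 2 x 40G LAG, assumed 60G offered load, one member has failed.
print(lag_state(40, members_up=1, min_links=1, offered_gbps=60))  # congested
print(lag_state(40, members_up=1, min_links=2, offered_gbps=60))  # down
```

With only two members, min-links offers a choice between a congested LAG and no LAG at all, which is the "big problem" Engineer 2 describes.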
Orhan Ergun
It depends on the place in the network as well. In the datacenter, mostly LAG and MLAG are used, even without BFD; failure detection is much faster thanks to physical link detection. And if you run a routing protocol, which is the case in any large-scale DC nowadays, LAG reduces the number of adjacencies, as you stated above. On the WAN, though, mostly ECMP is used.
Those who have large WAN networks may not always have ECMP available. In that case neither LAG nor ECMP is available, but everyone uses traffic engineering.
MLAG might be a solution if the links are terminated on different devices, but again, on the WAN almost no one uses it. I see MLAG on DCI (Datacenter Interconnect) and in campus designs.
Also, business types might dictate certain topologies. For example, retail enterprises mostly have a hub-and-spoke hierarchy. In that case, when a spoke site has two links toward the hub location, which is generally the HQ or datacenter, most people use ECMP regardless of the underlying service, be it MPLS VPN, Internet, P2P Ethernet and so on.
Also, since Engineer 1 is asking about your deployment model, its requirements and its constraints, please share what you have and why. When you have a question yourself, you can then expect the others to do the same.
Engineer 2
We used to have lots of LAGs and frankly, they work fine.
Occasionally there is an issue, like the one I shared above, that leads to congestion.
We counter it by provisioning sufficient bandwidth.
We used LAG mainly because it is so easy to use. Just make sure the link quality is good, add it to the LAG, and it is done.
Orhan Ergun
Do you use it on the WAN?
Engineer 2
Yes, @orhanergun
In the SG context, the WAN is around 70 km.
Orhan Ergun
That’s okay, but what is the topology? I mean the layer 3 topology, not the optical one.
Engineer 2
The core is a partial mesh, while each PE has dual links to two different Ps.
Orhan Ergun
How does LAG work between a PE and two different Ps? Do you have MLAG (Multi-Chassis Link Aggregation Group)?
Engineer 2
Oh no. Each LAG is between a PE and a P. Each PE has two LAGs to two Ps, and we use ECMP if the destination has equal cost.
Engineer 2
We run BFD for fast detection and failover
It is that simple?
Orhan Ergun
Do you have micro-BFD on the LAG bundles?
Engineer 2
Yes.
Orhan Ergun
Ok, I won’t ask about the devices.
Engineer 2
Interestingly, we use two vendors and BFD works flawlessly.
Orhan Ergun
Interoperability is fine?
Engineer 2
Yes.
Orhan Ergun
Couldn’t you do fast failure detection at layer 0 or layer 1?
Engineer 2
In some cases we have DWDM in between, so L1 detection isn’t enough.
Orhan Ergun
Ok
Do you have SDH too?
Engineer 2
No, we do not use SDH
SDH serves as our access/last mile
Orhan Ergun
You might still need fast failure detection at your last mile for business customers.
Are you using APS in that case?
Engineer 2
Oh, I don’t have access to that part of the network
But as far as I know, for SDH they can provide 50 ms switching/protection.
Orhan Ergun
Then they have it.
Engineer 2
We also have a metro-e network for access
And they use TE FRR, if I am not wrong.
Orhan Ergun
Are you using G.8032, REP, ERP etc.?
Engineer 2
But it is not part of my network
Orhan Ergun
Any fast failure mechanism for ME?
You mean you have TE FRR in the access network?
Engineer 2
That’s what I understand