Network complexity plays a very important role in network design, and every network designer tries to find the simplest design.
In today's network designs, decisions are made based on an estimate of network complexity rather than an absolute, solid answer.
If you design networks, you have probably heard of the KISS (Keep It Simple, Stupid) principle many times.
We are told to follow this principle during network design. But as you will see later in the article, if you want a robust network, you need some amount of complexity.
Today I want to throw out a new idea that we should use as a principle for network design.
“SUCK” is an abbreviation of “So Unnecessary Complexity Is Key.”
People refuse to accept network complexity and believe that network complexity is bad. But this is wrong!
Every network needs complexity; network complexity is good!
Let me explain:
In figure-a of the picture above, a router in the middle is connected to a single edge router. Obviously, it is not redundant. If we want to design a resilient network, we add a second router (figure-b), which creates network complexity but provides resiliency through redundancy.
In order to provide resiliency, we needed complexity. But this is necessary complexity.
There is also unnecessary complexity, which we need to separate from the necessary kind, as depicted above. A simple example of unnecessary complexity is adding a third OSPF ABR in picture-1.
Assume that we are running a flat OSPF network as in figures a and b: state information is kept exactly identical on every node in the domain.
Through layering, complexity can be decreased.
In figure-c, area routing is used: multiple areas are created to allow summarization of reachability information. The state held by each device can thus be kept smaller, so complexity can be reduced by limiting control-plane state.
But there are tradeoffs here. In order to reduce control-plane state on those devices, summarization needs to be configured on the ABRs, which increases configuration and management complexity.
Although summarization can be automated through management systems, someone needs to operate those systems, so management complexity is not avoided but shifted from operators to management systems.
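To make the tradeoff concrete, here is a minimal Python sketch with hypothetical prefixes: the routes advertised into the backbone shrink from hundreds to one once the ABR summarizes, at the price of a summary that someone must configure and maintain.

```python
# A minimal sketch (hypothetical prefixes) of the summarization tradeoff:
# 256 intra-area routes collapse into one summary advertised to the backbone.
from ipaddress import ip_network, collapse_addresses

# Hypothetical intra-area prefixes learned by an ABR.
area1_prefixes = [ip_network(f"10.1.{i}.0/24") for i in range(256)]

# Without summarization, every prefix leaks into the backbone.
print(len(area1_prefixes), "routes advertised without summarization")  # 256

# With an "area range"-style summary configured on the ABR, the backbone
# sees a single aggregate instead, at the cost of extra configuration.
summaries = list(collapse_addresses(area1_prefixes))
print(len(summaries), "route advertised with summarization:", summaries[0])
```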
In this example, placing a second router and then creating multiple OSPF areas let us achieve several network design goals: resiliency through redundancy, and scaling through layering/hierarchy. These are the parameters of robustness.
John Doyle, a lead scientist in the network complexity area, states that:
Reliability is robustness to component failures.
Efficiency is robustness to resource scarcity.
Scalability is robustness to changes to the size and complexity of the system as a whole.
Modularity is robustness to structured component rearrangements.
Evolvability is the robustness of lineages to changes on long time scales.
A system can have a property that is robust to one set of perturbations and yet fragile for a different property and/or perturbation.
The Internet is a good example of the robust-yet-fragile paradigm. It is robust to single component failures but fragile to a targeted attack.
Network design follows the Robust Yet Fragile (RYF) paradigm, because RYF captures the fact that all network designs make tradeoffs between different design goals.
In picture-1, creating multiple OSPF areas provides scalability through summarization/aggregation, but it is fragile because it creates the possibility of suboptimal routing.
Look at picture-2. We should stay in the robust domain and try to find Pmax. Robustness definitely needs at least some complexity, thus NETWORK COMPLEXITY IS GOOD.
What are the elements of a network? There are three: the physical network, the network management systems, and the human operators (the three vectors of the complexity cube discussed later in this article).
Complexity is found in each sub-component of these elements.
Let me explain the network elements in detail.
The physical network itself contains hardware (such as the ASICs on line cards), software, and device configuration.
Complexity is in each subcomponent of these three elements, and to understand overall network complexity, we should look at the combination of all the subcomponents.
For example, the ASIC in one switch can contain 10 million logic gates, while a different switch might have 100 million logic gates in the ASIC on its line card.
Or one piece of software might have 1,000 features while another has 10,000. As the features in the software increase, the chance of problems in the code increases due to the increasing complexity.
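As a back-of-the-envelope illustration of that point (every number below is an assumption, not a measurement), here is a sketch assuming a roughly constant latent defect density per thousand lines of code, a common industry rule of thumb:

```python
# Back-of-the-envelope only: all numbers below are illustrative assumptions.
# With a roughly constant latent defect density per KLOC, more features
# mean more code and therefore more latent defects.
DEFECTS_PER_KLOC = 1.0    # assumed latent defect density
KLOC_PER_FEATURE = 5.0    # assumed average code added per feature

for features in (1_000, 10_000):
    kloc = features * KLOC_PER_FEATURE
    defects = kloc * DEFECTS_PER_KLOC
    print(f"{features:>6} features -> ~{kloc:,.0f} KLOC -> ~{defects:,.0f} latent defects")
```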
In the picture above, figure-1 shows the configuration size on the routers of a Tier-2 service provider, and figure-2 shows the size of the code on the routers.
As you can see, things tend to grow, not shrink!
Increasing the lines of configuration or the size of the code comes with a complexity cost.
Over time, there are more features in the software and more configuration on the devices.
Vendors' vulnerability announcements increase every year due to the added features.
If you think about your network: how many people know all the configuration on a router, from top to bottom?
Probably no one, or very few, right?
Security, routing, MPLS, and so on: all that configuration on the router is managed by different sets of people in a company!
By the way, I should say that having different configurations on 10 interfaces of a router is more complex than having the same configuration on 1,000 interfaces of that router. This is known as modularity, and repeatable configurations and deployments are good.
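A minimal sketch of that point, using made-up configs: a reasonable proxy for configuration complexity is how many distinct interface templates an operator must understand, not the raw interface count.

```python
# A sketch of why uniform configuration is simpler: a useful proxy for
# configuration complexity is the number of DISTINCT interface templates an
# operator must understand, not the raw interface count. Configs are made up.
def distinct_templates(interface_configs):
    """Count the unique interface configurations on a device."""
    return len(set(interface_configs))

uniform = ["mtu 9000 + trunk + storm-control"] * 1000   # 1,000 identical ports
snowflakes = [f"custom-config-{i}" for i in range(10)]  # 10 hand-crafted ports

print(distinct_templates(uniform))     # 1 thing to understand
print(distinct_templates(snowflakes))  # 10 things to understand
```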
Protocol interactions also create complexity in your network.
As an example, you run OSPF or IS-IS as a link-state protocol, and for fast reroute you might be running MPLS TE-FRR. To provide it, you need to run not only OSPF or IS-IS but also RSVP, and most probably LDP as well.
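A small sketch of why each added protocol costs more than it looks: the potential pairwise interactions grow roughly quadratically with the size of the stack (the second, larger stack below is hypothetical).

```python
# A sketch of how protocol stacking compounds complexity: potential pairwise
# interactions grow roughly quadratically (n choose 2) with the stack size.
from itertools import combinations

stack = ["IS-IS", "LDP", "RSVP-TE"]      # the MPLS TE-FRR example above
bigger = stack + ["BGP", "PIM", "BFD"]   # a larger, hypothetical stack

for protocols in (stack, bigger):
    pairs = list(combinations(protocols, 2))
    print(f"{len(protocols)} protocols -> {len(pairs)} potential interactions")
# 3 protocols -> 3 potential interactions
# 6 protocols -> 15 potential interactions
```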
A friend of mine, Russ White, one of the lead network designers and architects, said somewhere that a friend of his defined complexity as “what you don't understand is complex.” As I understood from his talk, Russ agreed with him.
I don't agree with this definition.
Complexity is of course relative, but BGP is not a complex protocol for me, and probably not for those who have read this article up to here. Yet policy interaction between BGP peers creates BGP wedgies (RFC 4264) and policy violations due to data-plane vs. control-plane mismatches.
So the complexity here comes from conflicting policy configurations used in two different Autonomous Systems, even though you understand many things about BGP. (A small amount of input, a BGP policy, creates a large amount of output in a complex network.)
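To hint at the mechanism (RFC 4264 describes the real thing), here is a toy Python sketch, not real BGP: two hypothetical nodes each prefer the path through their peer over their own direct path, so the stable outcome depends purely on who moves first.

```python
# Toy illustration of order-dependent routing outcomes, in the spirit of a
# BGP wedgie (RFC 4264) but NOT real BGP. Hypothetical nodes 1 and 2 each
# prefer the path through their peer over their own direct path to node 0.
PREFS = {1: [(1, 2, 0), (1, 0)],   # node 1 ranks "via peer 2" above "direct"
         2: [(2, 1, 0), (2, 0)]}   # node 2 ranks "via peer 1" above "direct"

def converge(order):
    """Let nodes pick their best usable path, one at a time, in 'order'."""
    current = {1: None, 2: None}
    for node in order:
        peer = 2 if node == 1 else 1
        for path in PREFS[node]:
            # An indirect path is usable only if the peer currently
            # reaches 0 directly (otherwise it would loop through us).
            if len(path) == 2 or current[peer] == (peer, 0):
                current[node] = path
                break
    return current

print(converge([1, 2, 1, 2]))  # {1: (1, 0), 2: (2, 1, 0)}
print(converge([2, 1, 2, 1]))  # {1: (1, 2, 0), 2: (2, 0)}
```

Both outcomes are stable, and which one you get depends only on activation order: locally sensible policies, globally unpredictable results.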
Unpredictability: in a complex network, the effect of a local change on the global network is unpredictable.
Don't you have configuration on your routers or firewalls that even you don't know why it is there, but that you can't touch because you cannot predict what will happen if you remove it?
Predictability is critical for security. I will explain this later in the article.
Fragility: in a complex network, a change in one piece of the network can break the entire system.
I think layering is a nice example for explaining fragility. I use the layering terms here for the underlay and overlay networks.
In an MPLS network, you run a routing protocol to create the topology and run the MPLS control and data planes for the services.
The overlay network should follow the underlay network.
Here, the overlay is LDP and the underlay is the IGP.
If a failure happens in the network, a black hole can occur due to protocol convergence timing: the IGP may converge to a new path before LDP has a label binding for the new next hop.
In order to solve this issue, you enable either LDP session protection or LDP-IGP synchronization.
Protocol interactions are a source of complexity; they create fragility, and to make the network more robust you add a new set of features (in this example, LDP-IGP synchronization or session protection). Each added feature increases the overall complexity.
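A tiny sketch of that convergence race, with purely illustrative timings: the window between IGP and LDP convergence is exactly where the black hole lives, and LDP-IGP synchronization exists to close it.

```python
# A sketch of the IGP/LDP convergence race behind the black hole. The
# timings below are illustrative assumptions, not measured values.
IGP_CONVERGENCE_S = 0.2   # assumed: IGP installs the new next hop
LDP_CONVERGENCE_S = 1.5   # assumed: LDP learns a label via the new next hop

# Between these two events, labeled traffic heads to a next hop that has
# no label binding yet, so it is dropped.
blackhole_window = LDP_CONVERGENCE_S - IGP_CONVERGENCE_S
if blackhole_window > 0:
    print(f"~{blackhole_window:.1f}s of black-holed traffic unless LDP-IGP "
          "synchronization holds the IGP on the old path until LDP is ready")
```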
Ideally, many of these issues should be resolved by the front-line (tier 1 or tier 2) engineers.
Michael Behringer, one of the lead engineers in network complexity research, threw out an intelligent idea: visualize network complexity as a cube.
The overall complexity of a network is composed of three vectors: the complexity of the physical network, of the network management, and of the human operators. The volume of the cube represents the complexity of the overall network.
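As a minimal sketch with made-up, unitless scores, the cube model is easy to compute: overall complexity is the product (the volume) of the three vectors.

```python
# A sketch of Behringer's complexity cube: overall complexity as the volume
# spanned by three vectors. The unitless scores below are hypothetical.
def cube_volume(physical, management, operator):
    """Overall complexity = product of the three complexity vectors."""
    return physical * management * operator

early_internet = cube_volume(physical=2, management=1, operator=8)  # human-heavy
modern_sp = cube_volume(physical=9, management=8, operator=3)       # NMS-heavy

print(early_internet, modern_sp)  # 16 vs. 216: the total volume has grown
```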
At the beginning of the Internet, most networks, including enterprises and service providers, had the second complexity model shown below: a small physical network, little network management, and operation mostly by humans.
Michael thinks, and I definitely agree, that:
Large service providers today attempt to lower their dependency on human operators and instead use sophisticated management systems. An example complexity cube could look like the one illustrated in the first figure.
The overall complexity of today’s networks, illustrated by the volume of the cube, has increased over the years.
Today, with the SDN idea, we aim to remove the complexity from the operator and shift it to network management systems.
We also centralize the control plane into a logically centralized, but physically still distributed, place.
In my opinion this is not a bad idea at all, since it provides coherency.
We don’t configure the networks, we configure the routers!
This quote is from Geoff Huston, and I think it is very true, since:
We configure many routers, switches, and so on, and wait for the result to be coherent. But in the end, we face all kinds of loops, micro-loops, broadcast storms, routing churn, and policy violations.
Network management systems reduce the effect of those problems by knowing the entire topology and the intent of the policy, and by pushing the resulting configuration to the entire network.
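A minimal sketch of that idea (device names and fields are hypothetical): one declared intent is rendered into per-device configuration, so every device stays coherent by construction.

```python
# A sketch of "configure the network, not the routers": a single declared
# intent is rendered into per-device configuration by a management system.
# Device names and fields are hypothetical.
intent = {"vlan": 100, "name": "USERS", "devices": ["sw1", "sw2", "sw3"]}

def render(intent):
    """Derive every device's configuration from the one source of intent."""
    return {dev: [f"vlan {intent['vlan']}", f" name {intent['name']}"]
            for dev in intent["devices"]}

for device, config in render(intent).items():
    print(device, "->", config)  # identical, consistent config on every device
```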
I mentioned above that network design is about making tradeoffs between different design goals.
The Network Complexity Research Group published a draft covering some of the design goals. Of course this is not the full list, but it is a good start.
In my opinion, we should add resiliency and fast convergence to the list.
But don't forget that your network doesn't have to provide all of these design goals.
For example, my home network consists of a wireless modem that has one Ethernet port. It is not scalable, but it is very cost-effective.
Cost vs. scalability is the tradeoff here.
I don't need a scalable network in my home; if I did, it would obviously cost me more.
Likewise, the scalability requirements of your company's network are probably not the same as Amazon's. But to have an Amazon-scale network, you need to invest.
Conclusions:
Have you ever seen a catastrophic failure in your network? What was the reason?
Do you remember the “SUCK” principle? Will you use it from now on?
Orhan Ergun, CCIE/CCDE Trainer, Author of Many Networking Books, Network Design Advisor, and Cisco Champion 2019/2020/2021