Network complexity plays a very important role in network design. Every network designer tries to find the simplest design.
What is Network Complexity?
Although there is no standard definition of network complexity yet, there are many subjective definitions.
In today's network designs, decisions are made based on an estimate of network complexity rather than on an absolute, solid answer.
If you design networks, you have probably heard of the KISS (Keep It Simple, Stupid) principle many times.
We are told to follow this principle during network design. But, as you will see later in the article, if you want a robust network, you need some amount of complexity.
Today I am putting forward a new idea that we should use as a principle for network design.
“SUCK” is the acronym for “SO UNNECESSARY COMPLEXITY IS KEY”.
People resist network complexity and believe that it is bad. But this is wrong!
Every network needs some complexity; network complexity is good!
Let me explain:
In figure-a of the picture above, a router in the middle is connected to the edge router. Obviously, this design is not redundant. To design a resilient network, we add a second router (figure-b), which creates network complexity but provides resiliency through redundancy.
To provide resiliency, we needed complexity. But this is necessary complexity.
There is also unnecessary complexity, which we need to separate from the necessary kind depicted above. A simple example of unnecessary complexity is adding a third OSPF ABR in picture-1.
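To put a rough number on why the second router is necessary and a third usually is not, here is a small Python sketch. It assumes that routers fail independently, that each has the same availability, and that one surviving router is enough; these are simplifying assumptions, and the 99% figure is hypothetical rather than taken from the article.

```python
def parallel_availability(a: float, n: int) -> float:
    """Probability that at least one of n redundant routers is up.

    Assumes independent failures and identical per-router availability a.
    """
    return 1 - (1 - a) ** n


# Hypothetical per-router availability, for illustration only.
A = 0.99

previous = 0.0
for n in (1, 2, 3):
    current = parallel_availability(A, n)
    print(f"{n} router(s): availability {current:.6f}, gain {current - previous:.6f}")
    previous = current
```

Under these assumptions, the second router raises availability from 99% to 99.99%, while the third only moves it to 99.9999%, yet it still adds adjacencies, state, and configuration that someone has to manage.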
Assume that we are running a flat OSPF network, as in figure-a and figure-b. The state information (the link-state database) is kept identical on every node in the domain.
Through layering, complexity can be decreased.
In figure-c, area routing is used: multiple areas are created to allow summarization of reachability information. Thus the state in the devices can be kept smaller, so complexity might be reduced by limiting the control-plane state.
But there are tradeoffs here. To reduce the control-plane state on those devices, summarization needs to be configured on the ABRs, which increases configuration and management complexity.
Although this task can be automated through management systems, someone needs to operate the management systems, so management complexity is not avoided but shifted from operators to management systems.
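The state reduction from summarization can be illustrated with Python's standard `ipaddress` module. The addressing plan below is hypothetical: sixteen contiguous /24 subnets inside one non-backbone area. `collapse_addresses` computes the smallest set of prefixes that covers exactly those subnets, which is what a well-chosen summary range on the ABR would advertise into area 0. The `summarize` helper is an illustrative name, not part of any router software.

```python
import ipaddress


def summarize(prefixes):
    """Return the minimal set of prefixes covering exactly the given subnets."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Hypothetical addressing: sixteen /24 subnets inside one non-backbone area.
area1 = [f"10.1.{i}.0/24" for i in range(16)]
summary = summarize(area1)
print(f"{len(area1)} intra-area routes -> {len(summary)} summary route(s): {summary}")
```

Sixteen intra-area routes become a single 10.1.0.0/20 route outside the area. The tradeoff is visible too: the summary hides which /24s exist, so other areas can no longer pick the best ABR per subnet, which is where the suboptimal routing discussed later comes from.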
In this example, placing a second router and then creating multiple OSPF areas allows us to achieve several network design goals: resiliency through redundancy and scalability through layering/hierarchy. These are parameters of robustness.
John Doyle, a leading scientist in the area of network complexity, states that:
- Reliability is robustness to component failures.
- Efficiency is robustness to resource scarcity.
- Scalability is robustness to changes to the size and complexity of the system as a whole.
- Modularity is robustness to structured component rearrangements.
- Evolvability is robustness of lineages to changes on long time scales.
Robust Yet Fragile (RYF) Paradigm
Robust Yet Fragile is a very important paradigm that helps us understand network complexity.
A system can have a property that is robust to one set of perturbations and yet fragile for a different property and/or perturbation.
The Internet is a good example of the Robust Yet Fragile paradigm: it is robust to single-component failures but fragile to targeted attacks.
Network design follows the Robust Yet Fragile paradigm, because RYF captures the fact that every network design makes tradeoffs between different design goals.
In picture-1, creating multiple OSPF areas provides scalability through summarization/aggregation, but the design is fragile because summarization creates the possibility of suboptimal routing.
Look at picture-2. We should stay in the robust domain and try to find Pmax. Robustness definitely needs some complexity; thus, NETWORK COMPLEXITY IS GOOD.
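A toy way to see robust-yet-fragile behaviour is to compare a common failure with a targeted one on the same topology. The Python sketch below uses a hypothetical three-tier tree (one core node, three aggregation nodes, nine access nodes), not a model of the Internet, and measures the largest group of nodes that can still reach each other after some nodes are removed.

```python
from collections import deque


def build_tree():
    """Hypothetical topology: core -> 3 aggregation nodes -> 3 access nodes each."""
    adj = {"core": set()}
    for a in range(1, 4):
        agg = f"agg{a}"
        adj[agg] = {"core"}
        adj["core"].add(agg)
        for e in range(1, 4):
            acc = f"acc{a}{e}"
            adj[acc] = {agg}
            adj[agg].add(acc)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after removing the given nodes (BFS)."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


adj = build_tree()
print("lose one access node:", largest_component(adj, {"acc11"}))
print("targeted core failure:", largest_component(adj, {"core"}))
```

Losing one of the nine access nodes still leaves 12 of the 13 nodes connected, but removing the single core node leaves no group larger than 4: robust to the common failure, fragile to the targeted one.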
What are the elements of a network?
Elements of Computer Networks
Networks have physical elements, external systems, management systems, and operators.
Complexity is found in each sub-component of these elements.
Let me explain the network elements in detail:
The physical network contains:
- Network devices, such as routers, switches, optical equipment, etc. This includes components in those devices, such as CPUs, memory, ASICs, etc.
- Links between devices.
- External links, to customers and other service providers.
- Support hardware, such as power supplies, heating, cooling, etc.
- Operating systems.
- Device configurations.
- Network state tables, such as routing tables, ARP tables, etc.
The management system consists of:
- Hardware used for the management systems, and the network connecting them.
- Operating systems of these management systems
- Software for management, provisioning, etc.
- Operational procedures.
Complexity exists in each subcomponent of these elements, and to understand overall network complexity, we should look at the combination of all the subcomponents.
For example, the ASIC in one switch might contain 10 million logic gates, while the line-card ASIC in a different switch might have 100 million.
Or one software release might have 1,000 features while another has 10,000. As the number of features in the software increases, the chance of problems in the code increases because of the added complexity.
In the picture above, figure-1 shows the configuration size on routers in Tier-2 service providers, and figure-2 shows the size of the code running on the routers.
As you can see, things tend to grow, not shrink!
Increasing the number of configuration lines or the size of the code comes at the cost of complexity.
Over time, software gains more features and devices carry more configuration.
Vendors' vulnerability announcements increase every year due to added features.
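One rough way to see why more features mean more risk is to count the possible pairwise interactions between them. This is a simplifying assumption, not a measured bug model: real features do not all interact, and higher-order interactions are ignored. With n features there are n(n-1)/2 pairs, so the number of pairs grows roughly with the square of the feature count.

```python
def pairwise_interactions(features: int) -> int:
    """Number of distinct feature pairs that could, in principle, interact."""
    return features * (features - 1) // 2


# Feature counts taken from the example above (1,000 vs 10,000 features).
for n in (1_000, 10_000):
    print(f"{n:>6} features -> {pairwise_interactions(n):>10,} possible pairs")
```

Going from 1,000 to 10,000 features multiplies the features by 10 but the possible pairs by roughly 100 (499,500 versus 49,995,000).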
Think about your network: how many people know all of the configuration on a router, from top to bottom?
Probably no one, or very few, right?
Security, routing, MPLS, and the rest of the configuration on a router are often managed by different groups of people within a company!
By the way, having a different configuration on each of 10 interfaces of a router is more complex than having the same configuration on 1,000 interfaces of that router. This is modularity: repeatable configurations and deployments are good.
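A simple proxy for the point about repeatable configuration is to count how many distinct interface configurations a router carries, rather than how many interfaces it has. The `distinct_templates` helper below is hypothetical, the configuration lines are vendor-neutral placeholders rather than real CLI syntax, and "number of unique templates" is an illustration, not an established complexity metric.

```python
def distinct_templates(interface_configs: dict[str, str]) -> int:
    """Count unique interface configurations, ignoring line order and blank lines."""
    normalized = {
        frozenset(line.strip() for line in cfg.splitlines() if line.strip())
        for cfg in interface_configs.values()
    }
    return len(normalized)


# Hypothetical router A: 1,000 interfaces, all built from one template.
uniform = {f"port{i}": "mtu 9000\nospf area 0\nbfd enable" for i in range(1000)}

# Hypothetical router B: 10 interfaces, each hand-tuned differently.
bespoke = {f"port{i}": f"mtu 9000\nospf area {i}\ncost {10 * i}" for i in range(10)}

print(len(uniform), "interfaces ->", distinct_templates(uniform), "template(s)")
print(len(bespoke), "interfaces ->", distinct_templates(bespoke), "template(s)")
```

Router A has a hundred times more interfaces but only one configuration to understand, while router B has ten.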
How can you tell whether a network is complex?
Many protocols and features: Networks run many protocols and have processes for their operation.
The interactions between these protocols create complexity in your network.
For example, you run OSPF or IS-IS as the link-state protocol, and for fast reroute you might run MPLS TE-FRR. To provide it, you need to run not only OSPF or IS-IS but also RSVP, and most probably LDP as well.
Russ White, a friend of mine and one of the leading network designers and architects, once mentioned that a friend of his defined complexity as “What you don’t understand is complex.” As I understood from his talk, Russ agreed with him.
I don’t agree with this definition.
It is of course relative, but BGP is not a complex protocol for me, and probably not for anyone who has read this article this far. However, policy interactions between BGP peers create BGP wedgies (RFC 4264) and policy violations due to mismatches between the data plane and the control plane.
So the complexity here comes from conflicting policy configurations on two different Autonomous Systems, even though you understand BGP well. (In complex networks, a small input, such as a BGP policy, can produce a large output.)
Unpredictability: In a complex network, the effect of a local change on the global network can be unpredictable.
Don’t you have configuration lines on your routers or firewalls whose purpose even you don’t know, but which you can’t touch because you cannot predict what will happen if you remove them?
Predictability is critical for security. I will explain this later in the article.
Fragility: In a complex network, a change in one piece of the network can break the entire system.
I think layering is a good example for explaining fragility. Here I use layering terms for the underlay and overlay networks.
In an MPLS network, you run a routing protocol to build the topology and run the MPLS control and data planes for the services.
The overlay network should follow the underlay network.
The overlay is LDP and the underlay is IGP.
If a failure happens in the network, differences in protocol convergence timing can create a black hole: for example, the IGP may start using a link before the LDP session on that link is up, so labeled traffic is dropped.
To solve this issue, you enable either LDP session protection or LDP-IGP synchronization.
Protocol interactions are a source of complexity: they create fragility, and to make the network more robust you add a new set of features (in this example, LDP-IGP synchronization or session protection). Each added feature increases the overall complexity.
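The LDP-IGP synchronization idea can be sketched in Python. The mechanism assumed here follows RFC 5443: while LDP is not operational on a link, the IGP advertises that link with the maximum metric so that shortest-path traffic avoids it. The four-node topology, the metrics, and the helper names are hypothetical, and timers and session protection are not modelled.

```python
import heapq

MAX_METRIC = 0xFFFF  # maximum 16-bit OSPF link cost, advertised while LDP is down


def effective_metric(link: dict, ldp_sync: bool) -> int:
    """Metric the IGP advertises for a link, with or without LDP-IGP sync."""
    if ldp_sync and not link["ldp_up"]:
        return MAX_METRIC
    return link["metric"]


def shortest_path(links: dict, src: str, dst: str, ldp_sync: bool) -> list:
    """Plain Dijkstra over the IGP view of the topology."""
    graph: dict = {}
    for pair, link in links.items():
        a, b = tuple(pair)
        m = effective_metric(link, ldp_sync)
        graph.setdefault(a, []).append((b, m))
        graph.setdefault(b, []).append((a, m))
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, m in graph.get(node, []):
            if d + m < dist.get(nbr, float("inf")):
                dist[nbr] = d + m
                prev[nbr] = node
                heapq.heappush(heap, (d + m, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def label_switched(links: dict, path: list) -> bool:
    """True if LDP is up on every hop, i.e. labeled traffic is not black-holed."""
    return all(links[frozenset(hop)]["ldp_up"] for hop in zip(path, path[1:]))


# Hypothetical topology: PE1-P1 just came back; the IGP is up but LDP is not yet.
LINKS = {
    frozenset(("PE1", "P1")): {"metric": 10, "ldp_up": False},
    frozenset(("P1", "PE2")): {"metric": 10, "ldp_up": True},
    frozenset(("PE1", "P2")): {"metric": 15, "ldp_up": True},
    frozenset(("P2", "PE2")): {"metric": 15, "ldp_up": True},
}

for sync in (False, True):
    path = shortest_path(LINKS, "PE1", "PE2", ldp_sync=sync)
    print(f"sync={sync}: path {path}, labeled end to end: {label_switched(LINKS, path)}")
```

Without synchronization, the IGP prefers the cheaper path through P1 even though the first hop has no working LDP session yet, so labeled traffic is black-holed. With synchronization, that link is temporarily made unattractive and traffic stays on the fully labeled path through P2. The fix itself is one more mechanism to configure and understand.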
Expertise in Complexity
Expertise: If some failures in your network require a top expert's involvement to resolve, your network is most probably complex.
Ideally, most issues should be resolved by front-line (tier 1 or tier 2) engineers.
Michael Behringer, one of the lead engineers in network complexity research, came up with a clever idea: visualizing network complexity as a cube.
The overall complexity of a network is composed of three vectors: the complexity of the physical network, the network management, and the human operator. The volume of the cube represents the complexity of the overall network.
At the beginning of the Internet, most networks, including enterprise and service provider networks, followed the second complexity model shown below: a small physical network and little network management, with operation done mostly by humans.
Michael thinks, and I definitely agree, that:
Large service providers today attempt to lower their dependence on human operators and instead use sophisticated management systems. An example complexity cube could look like the one illustrated in the first figure.
The overall complexity of today’s networks, illustrated by the volume of the cube, has increased over the years.
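Because the cube's volume stands for overall complexity, it is simply the product of the three dimensions. The scores below are hypothetical, unitless numbers chosen only to show how shifting work from the human operator to the management system changes the volume; there is no agreed scale for these axes, and `ComplexityCube` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class ComplexityCube:
    """Hypothetical scores for the three dimensions of the complexity cube."""
    physical: float
    management: float
    operator: float

    def volume(self) -> float:
        """Overall complexity, represented by the cube's volume."""
        return self.physical * self.management * self.operator


# Illustrative profiles, not measurements.
early_internet = ComplexityCube(physical=2, management=1, operator=5)
large_sp_today = ComplexityCube(physical=8, management=6, operator=2)

for name, cube in (("early Internet", early_internet), ("large SP today", large_sp_today)):
    print(f"{name}: {cube} -> volume {cube.volume()}")
```

With these made-up numbers the volume grows from 10 to 96 even though the operator dimension shrinks, which mirrors the observation above that overall complexity has increased while much of it has moved into management systems.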
Today, with SDN, we aim to remove complexity from the operator and shift it to network management systems.
SDN also moves the control plane to a place that is logically centralized but still physically distributed.
In my opinion, this is not a bad idea, since it provides coherency.
We don’t configure the networks, we configure the routers!
This quote is from Geoff Huston, and I think it is very true:
We configure many routers, switches, etc., and hope the result is coherent. But in the end, we face all kinds of loops, micro-loops, broadcast storms, routing churn, and policy violations.
Network management systems reduce these effects by knowing the entire topology and the intent of the policy, and by pushing the resulting configuration to the entire network.
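The difference between configuring routers one by one and configuring the network can be sketched in Python. Everything here is hypothetical: the intent fields, the three-router triangle, and the two helpers. One function derives every interface's settings from a single network-wide intent; the other uses topology knowledge to flag links whose two ends disagree, which is the kind of incoherence that box-by-box configuration tends to produce.

```python
# Hypothetical network-wide intent: one source of truth for every core link.
INTENT = {"mtu": 9000, "ospf_area": 0, "hello_interval": 10}

# Hypothetical topology: (router, interface) at each end of a link.
LINKS = [
    (("r1", "eth0"), ("r2", "eth0")),
    (("r2", "eth1"), ("r3", "eth0")),
    (("r3", "eth1"), ("r1", "eth1")),
]


def render(intent, links):
    """Derive every router's per-interface settings from the single intent."""
    configs = {}
    for end_a, end_b in links:
        for router, ifname in (end_a, end_b):
            configs.setdefault(router, {})[ifname] = dict(intent)  # independent copy
    return configs


def incoherent_links(configs, links):
    """Return the links whose two ends disagree on any setting."""
    bad = []
    for (ra, ia), (rb, ib) in links:
        if configs[ra][ia] != configs[rb][ib]:
            bad.append(((ra, ia), (rb, ib)))
    return bad


configs = render(INTENT, LINKS)
print("generated from intent:", incoherent_links(configs, LINKS))

# A box-by-box change on one router breaks coherency on its link.
configs["r2"]["eth1"]["mtu"] = 1500
print("after a manual change:", incoherent_links(configs, LINKS))
```

The generated configuration has no mismatched links; the single manual MTU change is caught on the r2-r3 link only because the checker knows the whole topology.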
I mentioned above that network design is about making tradeoffs between different design goals.
The Network Complexity Research Group (NCRG) published a draft that covers some of these design goals. Of course, it is not the full list, but it is a good start.
Network Design Goals
- Cost: How much does the network cost to build (CAPEX) and run (OPEX)?
- Bandwidth / delay / jitter: Traffic characteristics between two points (average, max)
- Configuration complexity: How hard is it to configure and maintain the configuration?
- Susceptibility to Denial-of-Service: How easy is it to attack the service?
- Security (confidentiality / integrity): How easy is it to sniff, modify, or insert into the data flow?
- Scalability: To what size can I grow the network / service?
- Extensibility: Can I use the network for other services in the future?
- Ease of troubleshooting: How hard is it to find and correct problems?
- Predictability: If I change a parameter, what will happen?
- Clean failure: When a problem arises, does the root cause lead to deterministic failure?
In my opinion, we should add resiliency and fast convergence to the list.
But don’t forget that your network doesn’t have to provide all these design goals.
For example, my home network consists of a wireless modem that has one Ethernet port. It is not scalable but very cost-effective.
Cost vs Scalability is the tradeoff here.
I don’t need a scalable network at home; if I did, it would obviously cost me more.
Likewise, the scalability requirement of your company network is probably not the same as Amazon's. But to have an Amazon-scale network, you need to invest.
- If you need a robust network, you need some amount of complexity.
- You should separate necessary complexity from unnecessary complexity. If you need redundancy, dual redundancy is generally good enough; adding a third level of redundancy makes the design unnecessarily complex.
- You can come up with many valid network designs for the given requirements, and eliminate the ones which have unnecessary complexity.
- We don’t have a numeric measure of network complexity. For example, you can’t say that your network complexity is 6 out of 10 and that adding or removing a given feature, protocol, or link would reduce it to 5. We are still searching for a way to produce such numbers.
- Network design is about managing the tradeoffs between different design goals.
- Not every network design needs scalability, fast convergence, maximum resiliency, and so on.
- Complexity can be shifted between the physical network, operators, and network management systems, and overall complexity can be reduced by taking the human factor away. The complexity cube is a good way to understand this. SDN helps reduce overall network complexity by taking some responsibility away from human operators.
- Network design follows the Robust Yet Fragile paradigm. Robustness requires complexity.
- Don’t try the fancy, bleeding-edge technologies just to show that you are smart!
- System complexity is not the same as network complexity. System complexity should be thought of as the combination of the edges (hosts, servers, virtual servers, etc.) and the network core.
What about you?
Have you ever seen catastrophic failure in your network? What was the reason?
Do you remember the “SUCK” principle? Will you use it from now on?