Network complexity plays a central role in network design: every network designer tries to find the simplest design that meets the requirements.
Although there is no standard definition of network complexity yet, there are many subjective ones.
In today's network designs, decisions are based on an estimate of network complexity rather than an absolute, solid answer.
If you are designing networks, you have probably heard the KISS (Keep It Simple, Stupid) principle many times.
We are told to follow this principle during network design. But as you will see later in the article, if you want a robust network, you need some amount of complexity.
Today I propose a new idea which we should use as a principle for network design:
“SUCK” is the abbreviation of “SO UNNECESSARY COMPLEXITY IS KEY”.
People resist network complexity and believe that network complexity is bad. But this is wrong!
Every network needs complexity; network complexity is good!
Let me explain:
In figure-a of the picture above, the router in the middle is connected to the edge router. Obviously there is no redundancy. If we want to design a resilient network, we add a second router (figure-b), which creates network complexity but provides resiliency through redundancy.
To provide resiliency, we needed complexity. But this is necessary complexity. There is also unnecessary complexity, which we need to separate from the necessary kind, as I depicted above. A simple example of unnecessary complexity is adding a third OSPF ABR in picture-1.
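To put a rough number on the value of that second router, here is a back-of-the-envelope sketch, assuming each router is independently up 99% of the time (a hypothetical figure, purely for illustration):

```python
# Availability of N routers in parallel, assuming independent failures.
# The 0.99 per-device availability is a hypothetical figure.

def parallel_availability(per_device: float, n: int) -> float:
    """The path survives as long as at least one of the n devices is up."""
    return 1 - (1 - per_device) ** n

single = parallel_availability(0.99, 1)  # figure-a: one router
dual = parallel_availability(0.99, 2)    # figure-b: redundant routers

print(f"single router: {single:.4f}")  # 0.9900
print(f"dual routers:  {dual:.4f}")    # 0.9999
```

Note that under these assumptions a third router only moves 0.9999 to 0.999999, which is the diminishing return behind calling a third level of redundancy unnecessary complexity.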
Assume we are running a flat OSPF network as in pictures a and b; the state information kept on every node in the domain is exactly identical.
Complexity can be decreased through layering. In figure-c there is area routing: multiple areas are created to allow summarization of reachability information. The state kept on each device can thus be smaller, so complexity might be reduced by limiting the control plane state.
But there are tradeoffs here. To reduce the control plane state on those devices, summarization needs to be configured on the ABRs, which increases configuration and management complexity.
Although this task can be automated through management systems, someone needs to operate those systems, so management complexity is not avoided but shifted from operators to the management systems.
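The effect of summarization on control plane state can be sketched with Python's standard ipaddress module; the 10.1.x.0/24 prefixes below are made up for illustration:

```python
# Four contiguous intra-area prefixes collapse into a single summary,
# which is what an ABR advertises toward the backbone after summarization.
import ipaddress

area1_prefixes = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]
summary = list(ipaddress.collapse_addresses(area1_prefixes))

print(len(area1_prefixes), "->", summary)  # 4 -> [IPv4Network('10.1.0.0/22')]
```

Every router outside the area now carries one route instead of four, which is exactly the state reduction discussed above.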
In this example, placing a second router and then creating multiple OSPF areas allows us to achieve several network design goals: resiliency through redundancy and scaling through layering/hierarchy. These are the parameters of robustness.
John Doyle, a lead scientist in the network complexity area, states that:
Reliability is robustness to component failures.
Efficiency is robustness to resource scarcity.
Scalability is robustness to changes to the size and complexity of the system as a whole.
Modularity is robustness to structured component rearrangements.
Evolvability is robustness of lineages to changes on long time scales.
Robust Yet Fragile (RYF) is a very important paradigm that helps us understand network complexity.
A system can have a property that is robust to one set of perturbations and yet fragile for a different property and/or perturbation.
The Internet is a good example of the robust-yet-fragile paradigm: it is robust to single component failures but fragile against a targeted attack.
Network design follows the Robust Yet Fragile paradigm, because RYF captures the fact that every network design makes tradeoffs between different design goals.
In picture-1, creating multiple OSPF areas provides scalability through summarization/aggregation, but it is fragile because it creates a chance of suboptimal routing.
Look at picture-2. We should stay in the robust domain and try to find Pmax. Robustness definitely needs at least some complexity, thus NETWORK COMPLEXITY IS GOOD.
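The Internet's robust-yet-fragile behavior can be illustrated with a toy hub-and-spoke topology; the topology and node names below are invented for the sketch:

```python
# Robust Yet Fragile, in miniature: a hub-and-spoke network survives a
# random spoke failure but is completely partitioned when the hub is hit.
from collections import deque

def reachable(adj, start, removed):
    """Breadth-first search from start, skipping removed nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

adj = {"hub": ["s0", "s1", "s2", "s3"],
       "s0": ["hub"], "s1": ["hub"], "s2": ["hub"], "s3": ["hub"]}

# Random single component failure: the rest of the network stays connected.
print(len(reachable(adj, "hub", removed={"s3"})))  # 4: survivors still connected

# Targeted attack on the hub: every surviving spoke is isolated.
print(len(reachable(adj, "s0", removed={"hub"})))  # 1: s0 reaches only itself
```

The same topology is robust to one class of perturbation (a random spoke loss) and fragile to another (loss of the hub), which is exactly the RYF point.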
What are the elements of a network?
Networks have physical elements, external systems, management systems, and operators.
Complexity is found in each sub component of these elements.
Let me explain the network elements in detail:
The physical network contains:
- Network devices, such as routers, switches, optical equipment, etc. This includes components in those devices, such as CPUs, memory, ASICs, etc.
- Links between devices.
- External links, to customers and other service providers.
- Support hardware, such as power supplies, heating, cooling, etc.
- Operating systems.
- Device configurations.
- Network state tables, such as routing tables, ARP tables, etc.
The management system consists of:
- Hardware used for the management systems, and the network connecting them.
- Operating systems of these management systems.
- Software for management, provisioning, etc.
- Operational procedures.
The operator is an abstract notion for the combined knowledge required to operate the network.
Complexity is in each subcomponent of these three elements, and to understand the overall network complexity, we should look at the combination of all the subcomponents.
For example, the ASICs in one switch may contain 10 million logic gates, while a different switch might have 100 million logic gates in the ASIC on its line card.
Or one piece of software might have 1,000 features while another has 10,000. As the features of the software increase, the chance of problems in the code increases due to the increasing complexity.
In the picture above, figure-1 shows the configuration size on the routers of a Tier-2 service provider. Figure-2 shows the size of the code on the routers.
As you can see, things tend to grow, not shrink!
Increasing lines of configuration or size of code comes with a complexity cost: more features in the software mean more configuration on the devices over time.
Vendors' vulnerability announcements increase every year due to the added features.
If you think about your network, how many people know all the configuration on a router from top to bottom?
Probably no one, or very few, right?
Security, routing, MPLS, etc.: all that configuration on the router is managed by different sets of people in most companies!
By the way, I should say that having different configurations on 10 interfaces of a router is more complex than having the same configuration on 1000 interfaces of that router. This is modularity at work; repeatable configurations and deployments are good.
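A rough way to quantify that point: configuration complexity tracks the number of distinct interface templates, not the raw interface count. A minimal sketch, with invented template strings:

```python
# Count distinct interface templates: 1000 identical interfaces are
# simpler to reason about than 10 hand-crafted, slightly different ones.
# The config strings here are invented examples, not real device syntax.

def distinct_templates(interface_configs):
    return len(set(interface_configs))

uniform = ["mtu 9000; ip ospf area 0"] * 1000     # one repeatable template
snowflakes = [f"mtu 9000; ip ospf area 0; custom-knob {i}"  # per-interface tweaks
              for i in range(10)]

print(distinct_templates(uniform))     # 1
print(distinct_templates(snowflakes))  # 10
```

By this crude measure, the 1000-interface router is an order of magnitude simpler than the 10-interface one.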
How do you tell whether a network is complex?
Many protocols and features: Networks run many protocols and have processes for their operation. These protocol interactions create complexity in your network.
For example, you run OSPF or IS-IS as a link state protocol, and for fast reroute you might be running MPLS TE-FRR. To provide it, you need to run not only OSPF or IS-IS but also RSVP, and most probably LDP as well.
A friend of mine, the leading network designer and architect Russ White, said somewhere that a friend of his defined complexity as “what you don't understand is complex”. As I understood from his talk, Russ agreed with him.
I don’t agree with this definition.
Complexity is of course relative, but BGP is not a complex protocol for me, and probably not for those who have read this article up to this point. Yet policy interactions between BGP peers create BGP wedgies (RFC 4264) and policy violations due to data plane vs. control plane mismatches.
So the complexity here comes from conflicting policy configurations used in two different Autonomous Systems, even though you understand a great deal about BGP. (A small amount of input, the BGP policy, creates a large amount of output in complex networks.)
Unpredictability: In a complex network, the effect of a local change on the global network is unpredictable.
Don't you have configuration lines on your routers or firewalls whose purpose even you don't know, yet you can't touch them because you cannot predict what will happen if you remove them?
Predictability is critical for security; I will explain this later in the article.
Fragility: In a complex network, a change in one piece of the network can break the entire system.
I think layering is a nice example to explain fragility. I use the term layering here for underlay and overlay networks.
In an MPLS network, you run a routing protocol to create the topology and run the MPLS control and data planes for the services. The overlay network should follow the underlay network: the overlay is LDP and the underlay is the IGP. If a failure happens in the network, a blackhole can occur due to protocol convergence timing. To solve this issue, you enable either LDP session protection or LDP-IGP synchronization.
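The race between IGP and LDP convergence can be sketched as a toy timeline; all the timings below are invented for illustration, not measured values:

```python
# After a link comes back, the IGP may start forwarding over it before
# LDP has a label there, blackholing labeled traffic during the gap.
# LDP-IGP synchronization closes the gap by holding the IGP back
# (advertising max metric) until LDP is ready.

IGP_READY_MS = 50   # hypothetical IGP convergence time on the new link
LDP_READY_MS = 400  # hypothetical time until the LDP label is available

def blackhole_window_ms(igp_ms, ldp_ms, ldp_igp_sync=False):
    if ldp_igp_sync:
        return 0  # IGP waits for LDP, so no window opens
    return max(0, ldp_ms - igp_ms)

print(blackhole_window_ms(IGP_READY_MS, LDP_READY_MS))        # 350
print(blackhole_window_ms(IGP_READY_MS, LDP_READY_MS, True))  # 0
```

The sketch shows why the feature exists: without synchronization, labeled traffic is dropped for the whole gap between the two convergence events.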
Protocol interaction is a source of complexity: it creates fragility, and to make the network more robust you add a new set of features (in this example, LDP-IGP synchronization or session protection). Each added feature increases the overall complexity.
Expertise: If some of the failures in your network require the involvement of top experts to resolve, your network is most probably complex.
Ideally, most issues should be resolved by the front-line (tier 1 or tier 2) engineers.
Michael Behringer, one of the lead engineers in network complexity research, came up with an intelligent idea to visualize network complexity as a cube.
The overall complexity of a network is composed of three vectors: the complexity of the physical network, of the network management, and of the human operator. The volume of the cube represents the complexity of the overall network.
In the beginning of the Internet, most networks, including enterprises and service providers, had the second complexity model shown below: a small physical network and less network management, but mostly operated by humans.
Michael thinks, and I definitely agree, that:
Large service providers today attempt to lower the dependencies of human operators, and instead use sophisticated management systems. An example complexity cube could look like illustrated in the first figure. Overall complexity of today’s networks, illustrated by the volume of the cube, has increased over the years.
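As a sketch, the cube can be reduced to a product of three axis scores, since the volume of the cube represents the overall complexity; the unitless numbers below are invented purely for illustration:

```python
# Behringer's complexity cube: three axes (physical network, management
# systems, human operator); the cube's volume stands for overall complexity.
# The axis scores below are made-up illustrative values.

def cube_volume(physical: float, management: float, operator: float) -> float:
    return physical * management * operator

early_internet = cube_volume(physical=2, management=1, operator=8)
modern_provider = cube_volume(physical=9, management=8, operator=3)

print(early_internet)   # 16
print(modern_provider)  # 216: bigger overall, despite less manual operation
```

Shrinking the operator axis does not shrink the volume when the other two axes grow faster, which matches the observation that overall complexity has increased over the years.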
Today, with the SDN idea, we aim to remove complexity from the operator and shift it to network management systems, and to centralize the control plane into a logically centralized but still physically distributed place.
This is not a totally bad idea in my opinion, since it provides coherency.
We don’t configure the networks, we configure the routers !
This quote is from Geoff Huston. I think it is very true, because:
We configure many routers, switches, etc. and expect the result to be coherent. But in the end we face all kinds of loops, microloops, broadcast storms, routing churn, and policy violations.
Network management systems reduce those effects by knowing the entire topology and the intent of the policy, and by pushing the resulting configuration to the entire network.
I mentioned above that network design is about making tradeoffs between different design goals.
The Network Complexity Research Group published a draft covering some of the design goals. Of course this is not the full list, but it is a good start:
- Cost: How much does the network cost to build (capex) and run (opex)?
- Bandwidth / delay / jitter: Traffic characteristics between two points (average, max)
- Configuration complexity: How hard is it to configure and maintain the configuration?
- Susceptibility to Denial-of-Service: How easy is it to attack the service?
- Security (confidentiality / integrity): How easy is it to sniff / modify / insert into the data flow?
- Scalability: To what size can I grow the network / service?
- Extensibility: Can I use the network for other services in the future?
- Ease of troubleshooting: How hard is it to find and correct problems?
- Predictability: If I change a parameter, what will happen?
- Clean failure: When a problem arises, does the root cause lead to deterministic failure?
In my opinion, we should add resiliency and fast convergence to the list.
But don't forget that your network doesn't have to meet all of these design goals.
For example, my home network consists of a wireless modem with one Ethernet port. It is not scalable, but it is very cost effective.
Cost vs. scalability is the tradeoff here.
I don't need a scalable network in my home; if I did, it would obviously cost me more.
Likewise, the scalability requirement of your company network is probably not the same as Amazon's. But to have an Amazon-scale network, you need to invest.
- If you need a robust network, you need some amount of complexity.
- You should separate necessary complexity from unnecessary complexity. If you need redundancy, dual redundancy is generally good enough; adding a third level of redundancy makes the design unnecessarily complex.
- You can come up with many valid network designs for a given set of requirements; eliminate the ones that have unnecessary complexity.
- We don't have a numeric measure of network complexity; for example, you can't say that your network complexity is 6 out of 10, and that adding or removing a feature, protocol, or link would reduce it to 5. We are still seeking a way to get such numbers.
- Network design is about managing the tradeoffs between different design goals.
- Not every network design has to be scalable, fast converging, maximally resilient, and so on.
- Complexity can be shifted between the physical network, the operators, and the network management systems, and overall complexity is reduced by taking the human factor away. The complexity cube is a good way to understand this. SDN helps reduce overall network complexity by taking some responsibility away from the human operators.
- Network design follows Robust Yet Fragile paradigm. Robustness requires complexity.
- Don't try fancy, bleeding-edge technologies just to show that you are smart!
- System complexity is not the same as network complexity. System complexity should be thought of as the combination of the edges (hosts, servers, virtual servers, etc.) and the network core.
What about you?
What is your definition of network complexity?
Have you ever seen a catastrophic failure in your network? What was the reason?
Do you remember the “SUCK” principle? Will you use it from now on?