Created by - Orhan Ergun
Network complexity plays a very important role in network design, and every network designer tries to find the simplest design. What is network complexity? Although there is no standard definition of network complexity yet, there are many subjective ones. In today's network designs, decisions are made based on an estimate of network complexity rather than an absolute, solid answer. If you are designing a network, you have probably heard of the KISS (Keep It Simple, Stupid) principle many times, and we have been told to follow it during network design. As you will see later in the article, though, if you want a robust network you need some amount of complexity. Today I am proposing a new idea that we should use as a principle for network design: “SUCK”, the abbreviation of “SO UNNECESSARY COMPLEXITY IS KEY”. People resist network complexity and believe that it is bad. But this is wrong! Every network needs complexity; network complexity is good! Let me explain. In figure-a in the above picture, a router in the middle is connected to the edge router. Obviously, it is not redundant. If we want to design a resilient network, we add a second router (figure-b), which creates network complexity but provides resiliency through redundancy. In order to provide resiliency, we needed complexity. But this is necessary complexity. There is also unnecessary complexity, which we need to separate from the necessary kind, as I depicted above. A simple example of unnecessary complexity is adding a third OSPF ABR in picture-1. Assume that we are running a flat OSPF network as in pictures a and b: state information is kept exactly identical on every node in the domain. Through layering, complexity can be decreased. In figure-c, there is area routing, so multiple areas are created to allow summarization of reachability information. Thus, the state in the devices can be kept smaller, and complexity might be reduced by limiting the control plane state.
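Here is a minimal Python sketch that makes the state reduction concrete. It counts the prefixes a router outside the areas has to carry, with and without summarization at the ABRs. The areas and prefixes are hypothetical, and the sketch only counts reachability entries. It ignores the LSDB and every other kind of OSPF state.

```python
import ipaddress

# Hypothetical areas, each with four contiguous /24 prefixes.
AREA_PREFIXES = {
    "area1": [f"10.1.{i}.0/24" for i in range(4)],
    "area2": [f"10.2.{i}.0/24" for i in range(4)],
}


def routes_seen_outside(area_prefixes, summarize):
    """Return the prefixes a router outside the areas would carry.

    With summarize=True, each area's prefixes are collapsed into the fewest
    covering networks, similar to an area range configured on an ABR.
    """
    routes = []
    for prefixes in area_prefixes.values():
        networks = [ipaddress.ip_network(p) for p in prefixes]
        if summarize:
            routes.extend(ipaddress.collapse_addresses(networks))
        else:
            routes.extend(networks)
    return routes


if __name__ == "__main__":
    print(len(routes_seen_outside(AREA_PREFIXES, summarize=False)))
    print([str(n) for n in routes_seen_outside(AREA_PREFIXES, summarize=True)])
```

Without summarization, the router carries 8 entries. With summarization, it carries only 10.1.0.0/22 and 10.2.0.0/22. The cost is that these summaries now have to be configured and maintained on the ABRs.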
But there are tradeoffs here. In order to reduce the control plane state on those devices, summarization needs to be configured on the ABRs, which increases configuration and management complexity. Although this task can be automated through management systems, someone needs to operate the management systems, so management complexity is not avoided but shifted from operators to management systems. In this example, placing a second router and then creating multiple OSPF areas allows us to achieve several network design goals: resiliency through redundancy, and scaling through layering/hierarchy. These are parameters of robustness. John Doyle, who is a lead scientist in the network complexity area, states that:
Reliability is robustness to component failures.
Efficiency is robustness to resource scarcity.
Scalability is robustness to changes to the size and complexity of the system as a whole.
Modularity is robustness to structured component rearrangements.
Evolvability is robustness of lineages to changes on long time scales.
Robust Yet Fragile - RYF Paradigm
Robust Yet Fragile is a very important paradigm that helps us understand network complexity. A system can have a property that is robust to one set of perturbations and yet fragile to a different property and/or perturbation. The Internet is a good example of the Robust Yet Fragile paradigm: it is robust to single component failures but fragile to targeted attacks. Network design follows the Robust Yet Fragile paradigm, because RYF captures the fact that all network designs make tradeoffs between different design goals. In picture-1, creating multiple OSPF areas provides scalability through summarization/aggregation, but it is fragile because it creates the possibility of suboptimal routing. Look at picture-2: we should be in the robust domain and try to find Pmax. Robustness definitely needs some complexity, thus NETWORK COMPLEXITY IS GOOD. What are the elements of networks?
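A small Python sketch can illustrate the fragile side of this tradeoff (suboptimal routing after summarization). A router R1 in area 0 reaches two area-1 prefixes through two ABRs.

- When R1 sees the specific prefixes, it picks the best exit per prefix.
- When R1 sees only a summary, it picks one exit for the whole range, so one prefix ends up on a longer path.

The topology and costs are hypothetical. The summary metric assumes the RFC 2328 rule: a summary's cost is the highest cost among its component routes. Some implementations may behave differently, which would change the numbers but not the idea.

```python
# Hypothetical topology: IGP cost from router R1 (area 0) to each ABR, and from
# each ABR to each specific prefix inside area 1. All numbers are made up.
COST_TO_ABR = {"ABR1": 10, "ABR2": 10}
ABR_TO_PREFIX = {
    "ABR1": {"10.1.1.0/24": 5, "10.1.2.0/24": 40},
    "ABR2": {"10.1.1.0/24": 30, "10.1.2.0/24": 30},
}


def path_cost(abr, prefix):
    """Actual end-to-end cost from R1 to a prefix through a given ABR."""
    return COST_TO_ABR[abr] + ABR_TO_PREFIX[abr][prefix]


def exit_with_specifics(prefix):
    """Best ABR when R1 can see the specific prefix from both ABRs."""
    return min(COST_TO_ABR, key=lambda abr: path_cost(abr, prefix))


def exit_with_summary():
    """Best ABR when R1 only sees one summary from each ABR.

    Assumption: the summary cost is the highest cost of its component routes.
    """
    summary_cost = {abr: max(costs.values()) for abr, costs in ABR_TO_PREFIX.items()}
    return min(COST_TO_ABR, key=lambda abr: COST_TO_ABR[abr] + summary_cost[abr])


if __name__ == "__main__":
    for prefix in ("10.1.1.0/24", "10.1.2.0/24"):
        best = exit_with_specifics(prefix)
        chosen = exit_with_summary()
        print(prefix, "optimal:", best, path_cost(best, prefix),
              "with summary:", chosen, path_cost(chosen, prefix))
```

With these numbers, the summary steers all traffic through ABR2. As a result, 10.1.1.0/24 is reached at cost 40 instead of 15.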
Elements of the Computer Networks
Networks have physical elements, external systems, management systems, and operators. Complexity is found in each sub-component of these elements. Let me explain the network elements in detail.
The physical network contains:
Network devices, such as routers, switches, optical equipment, etc. This includes components in those devices, such as CPUs, memory, ASICs, etc.
Links between devices.
External links, to customers and other service providers.
Support hardware, such as power supplies, heating, cooling, etc.
Operating systems.
Device configurations.
Network state tables, such as routing tables, ARP tables, etc.
The management system consists of:
Hardware used for the management systems, and the network connecting them.
Operating systems of these management systems.
Software for management, provisioning, etc.
Operational procedures.
The operator is an abstract notion for the combined knowledge required to operate the network.
Complexity is in each subcomponent of these three elements, and to understand overall network complexity, we should look at the combination of all the subcomponents. For example, the ASIC of one switch might contain 10 million logic gates, while the ASIC on a line card in a different switch might have 100 million logic gates. Or one software release might have 1,000 features while another has 10,000. As the features in the software increase, the chance of problems in the code increases due to the increased complexity. In the above picture, figure-1 shows the configuration size on routers in Tier-2 service providers, and figure-2 shows the code size on the routers. As you can see, things tend to grow, not shrink! Increasing lines of configuration or code size comes with a complexity cost: over time there are more features in the software and more configuration on the devices. Vendors' vulnerability announcements increase year after year due to added features.
If you think about your network, how many people know all the configuration on a router from top to bottom? Probably no one, or very few, right? Security, routing, MPLS, and all the other configuration on the router are managed by different sets of people in the company! By the way, I should say that having different configurations on 10 interfaces of a router is more complex than having the same configuration on 1,000 interfaces of that router. This is known as modularity, and repeatable configurations and deployments are good.
How do you understand whether the network is complex?
Many protocols and features: Networks run many protocols and have processes for their operation. The interaction of these protocols creates complexity in your network. For example, you run OSPF or IS-IS as a link-state protocol, and for fast reroute you might be running MPLS TE-FRR. To provide it, you need to run not only OSPF or IS-IS but also RSVP, and most probably LDP as well. A friend of mine, Russ White, one of the lead network designers and architects, said somewhere that his friend defined complexity as “What you don’t understand is complex”. As I understood from his talk, Russ agreed with him. I don’t agree with this definition. It is of course relative, but BGP is not a complex protocol for me, and probably not for those who have read this article up to here. But policy interaction between BGP peers creates BGP wedgies (RFC 4264) and policy violations due to data plane vs control plane mismatch. So the complexity here comes from conflicting policy configurations used in two different Autonomous Systems, even though you understand BGP well. (A small amount of input, the BGP policy, creates a large amount of output in complex networks.)
Unpredictable: In a complex network, the effect of a local change on the global network can be unpredictable.
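The point about 10 different interface configurations versus 1,000 identical ones can be made concrete. The idea is to count distinct configurations while ignoring which interface each one is applied to. The Python sketch below uses made-up configuration lines (not any vendor's syntax) and hypothetical port names. It treats the number of distinct variants only as a rough proxy for what an operator has to understand.

```python
# Hypothetical, vendor-neutral interface configuration lines.
BASE_CONFIG = ("mtu 9000", "ospf network point-to-point", "bfd enabled")


def uniform_router(port_count):
    """Every port uses the same repeatable template."""
    return {f"port{i}": BASE_CONFIG for i in range(port_count)}


def hand_tuned_router(port_count):
    """Every port gets its own tweak (here, a different OSPF cost)."""
    return {f"port{i}": BASE_CONFIG + (f"ospf cost {10 + i}",) for i in range(port_count)}


def distinct_configs(router):
    """Count configuration variants, ignoring which port they are applied to."""
    return len(set(router.values()))


if __name__ == "__main__":
    print(distinct_configs(uniform_router(1000)))   # 1
    print(distinct_configs(hand_tuned_router(10)))  # 10
```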
Unpredictability is easy to recognize in practice: don’t you have configuration on your routers or firewalls that even you can't explain, yet you can't touch it because you cannot predict what would happen if you removed it? Predictability is critical for security. I will explain this later in the article.
Fragility: In a complex network, a change in one piece of the network can break the entire system. I think layering is a nice example to explain fragility. I use the layering terms underlay and overlay networks here. In an MPLS network, you run a routing protocol to create a topology and run the MPLS control and data plane for the services. The overlay network should follow the underlay network; here the overlay is LDP and the underlay is the IGP. If a failure happens in the network, a black hole can occur because the protocols converge at different times. In order to solve this issue, you enable either LDP session protection or LDP-IGP synchronization. Protocol interactions are a source of complexity: they create fragility, and to make the network more robust you add a new set of features (in this example, LDP-IGP synchronization or session protection). Each added feature increases the overall complexity.
Expertise in Complexity
Expertise: If some of the failures in your network require the top experts' involvement to resolve, your network is most probably complex. Ideally, most issues should be resolved by the front-line (level 1 or level 2) support engineers. Michael Behringer, one of the lead engineers in network complexity research, came up with an intelligent idea: visualizing network complexity as a cube. The overall complexity of a network is composed of three vectors: the complexity of the physical network, the network management, and the human operator. The volume of the cube represents the complexity of the overall network. At the beginning of the Internet, most networks, including enterprises and service providers, followed the second complexity model, which is shown below.
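The LDP-IGP synchronization fix can be sketched in Python. The idea behind LDP-IGP synchronization (RFC 5443) is that while the LDP session on a link is not up, the IGP advertises that link with the maximum metric. Shortest paths then avoid the link whenever an alternative exists, instead of sending labeled traffic into a black hole.

The sketch models the moment when the IGP adjacency on a link is up but LDP on that link is not yet up. The four-router topology, the link costs, and the value 65535 (taken as the maximum OSPF link metric) are assumptions for illustration. This is not a protocol implementation.

```python
import heapq

MAX_METRIC = 65535  # Assumed max link metric (OSPF) advertised while LDP is down.

# Hypothetical topology: link -> IGP cost, and whether LDP is up on that link.
LINKS = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 15, ("C", "D"): 15}
LDP_UP = {("A", "B"): True, ("B", "D"): False, ("A", "C"): True, ("C", "D"): True}


def effective_costs(links, ldp_up, igp_ldp_sync):
    """Return the link costs the IGP advertises, with or without synchronization."""
    if not igp_ldp_sync:
        return dict(links)
    return {link: cost if ldp_up[link] else MAX_METRIC for link, cost in links.items()}


def shortest_path(links, src, dst):
    """Plain Dijkstra over an undirected graph; returns (cost, path) or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


def black_holed(path, ldp_up):
    """True if the labeled path crosses a link where LDP is not yet up."""
    for a, b in zip(path, path[1:]):
        link = (a, b) if (a, b) in ldp_up else (b, a)
        if not ldp_up[link]:
            return True
    return False


if __name__ == "__main__":
    for sync in (False, True):
        cost, path = shortest_path(effective_costs(LINKS, LDP_UP, sync), "A", "D")
        print("sync" if sync else "no sync", path, cost, "black hole:", black_holed(path, LDP_UP))
```

Without synchronization, A reaches D over A-B-D (cost 20), which crosses the link without LDP. With synchronization, the path moves to A-C-D (cost 30). The fix is one more feature to configure and understand, which is exactly the point about added complexity.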
In the early-Internet model, the physical network was small and there was little network management, but the network was mostly operated by humans. Michael thinks, and I definitely agree, that large service providers today attempt to lower their dependency on human operators and instead use sophisticated management systems. An example complexity cube could look like the one illustrated in the first figure. The overall complexity of today’s networks, illustrated by the volume of the cube, has increased over the years. Today, with the SDN idea, we aim to remove complexity from the operator and shift it to network management systems. We also aim to centralize the control plane in a place that is logically centralized but still physically distributed. In my opinion, this is not a totally bad idea, since it provides coherency. “We don’t configure the networks, we configure the routers!” This quote is from Geoff Huston. I think it is very true: we configure many routers, switches, etc. and wait for the result to be coherent. But in the end, we face all kinds of loops, micro-loops, broadcast storms, routing churn, and policy violations. Network management systems reduce the effect of those problems by knowing the entire topology and the intent of the policy, and by applying the resulting configuration across the entire network. I mentioned above that network design is about making tradeoffs between different design goals. The network complexity research group published a draft that covers some of the design goals. Of course, this is not the full list, but it is a good start.
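Geoff Huston's point, that we configure routers rather than the network, suggests why a management system that sees the whole topology helps. It can check that both ends of every link agree before anything is pushed. Below is a minimal Python sketch of such a check. The data model and the router and interface names are hypothetical. OSPF area and MTU are used because a mismatch in either is a well-known reason for an OSPF adjacency to fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceConfig:
    """Hypothetical per-interface intent held by a management system."""
    ospf_area: int
    mtu: int


# (router, interface) -> intended configuration; names are made up.
CONFIGS = {
    ("R1", "eth0"): InterfaceConfig(ospf_area=0, mtu=9000),
    ("R2", "eth0"): InterfaceConfig(ospf_area=0, mtu=1500),
    ("R2", "eth1"): InterfaceConfig(ospf_area=1, mtu=1500),
    ("R3", "eth0"): InterfaceConfig(ospf_area=1, mtu=1500),
}

# Each link is a pair of (router, interface) endpoints.
LINKS = [(("R1", "eth0"), ("R2", "eth0")), (("R2", "eth1"), ("R3", "eth0"))]

# Parameters that must be identical on both ends of a link.
MUST_MATCH = ("ospf_area", "mtu")


def link_mismatches(links, configs):
    """Return (end_a, end_b, field) for every parameter the two ends disagree on."""
    problems = []
    for end_a, end_b in links:
        for field in MUST_MATCH:
            if getattr(configs[end_a], field) != getattr(configs[end_b], field):
                problems.append((end_a, end_b, field))
    return problems


if __name__ == "__main__":
    for problem in link_mismatches(LINKS, CONFIGS):
        print(problem)
```

Here the check flags only the MTU mismatch on the R1-R2 link. A per-router view would not catch this, because each router's configuration is valid on its own.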
Network Design Goals
Cost: How much does the network cost to build (CAPEX) and run (OPEX)?
Bandwidth / delay / jitter: Traffic characteristics between two points (average, max).
Configuration complexity: How hard is it to configure and maintain the configuration?
Susceptibility to Denial-of-Service: How easy is it to attack the service?
Security (confidentiality / integrity): How easy is it to sniff, modify, or insert into the data flow?
Scalability: To what size can I grow the network / service?
Extensibility: Can I use the network for other services in the future?
Ease of troubleshooting: How hard is it to find and correct problems?
Predictability: If I change a parameter, what will happen?
Clean failure: When a problem arises, does the root cause lead to deterministic failure?
In my opinion, we should add resiliency and fast convergence to the list. But don’t forget that your network doesn’t have to provide all of these design goals. For example, my home network consists of a wireless modem that has one Ethernet port. It is not scalable, but it is very cost-effective. Cost vs. scalability is the tradeoff here. I don’t need a scalable network at home; if I did, it would obviously cost me more. Likewise, the scalability requirement of your company network is probably not the same as Amazon's, and to have an Amazon-scale network, you need to invest.
Conclusions:
If you need a robust network, you need some amount of complexity. You should separate necessary complexity from unnecessary complexity. If you need redundancy, dual redundancy is generally good and enough. You can make the design unnecessarily complex by adding a third level of redundancy. You can come up with many valid network designs for given requirements, and then eliminate the ones that have unnecessary complexity. We don’t have a numeric measure of network complexity. For example, you can’t say that your network complexity is 6 out of 10 and that adding or removing a feature, protocol, link, etc. would reduce it to 5.
We are still seeking a way to assign such numbers to network complexity. Network design is about managing the tradeoffs between different design goals. Not every network design needs scalability, fast convergence, maximum resiliency, and so on. Complexity can be shifted between the physical network, operators, and network management systems, and overall complexity is reduced by taking the human factor away. The complexity cube is a good way to understand this. SDN helps reduce overall network complexity by taking some responsibility away from human operators. Network design follows the Robust Yet Fragile paradigm. Robustness requires complexity. Don’t try fancy, bleeding-edge technologies just to show that you are smart! System complexity is not the same as network complexity. System complexity should be thought of as the combination of the edges (hosts, servers, virtual servers, etc.) and the network core. What about you? What is your definition of network complexity? Have you ever seen a catastrophic failure in your network? What was the reason? Do you remember the “SUCK” principle? Will you use it from now on?
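The conclusion that dual redundancy is generally enough can be illustrated with the textbook availability formula for N redundant components in parallel: A_total = 1 - (1 - A)^N. This formula relies on the strong assumption that failures are independent. Shared power, shared software bugs, and shared fiber paths break this assumption in real networks. The per-router availability of 99.9% used below is a hypothetical value, not a measurement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def parallel_availability(component_availability, copies):
    """Availability of N independent redundant components (any one is enough)."""
    return 1 - (1 - component_availability) ** copies


def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    per_router = 0.999  # Hypothetical availability of a single router.
    for copies in (1, 2, 3):
        a = parallel_availability(per_router, copies)
        print(copies, f"{a:.9f}", f"{downtime_minutes_per_year(a):.4f} min/year")
```

Under these assumptions, expected downtime drops from about 525.6 minutes per year with one router to about half a minute with two. A third router removes only that remaining half minute, while adding another node, more links, and more state.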
Published - Mon, 11 Apr 2022
Created by - Orhan Ergun
EIGRP Stub is actually one of the EIGRP scalability features, but it also helps with many other things in EIGRP. In this post, we will also share a topology that will be used to explain some design caveats of EIGRP network design. Before we explain EIGRP Stub, let me explain some EIGRP convergence behaviors.
EIGRP Convergence
If you are looking for much more detail on EIGRP design and practical labs, have a look at our EIGRP Training. When an EIGRP router loses its connection to a prefix and there is no feasible successor in the EIGRP topology table, the route is marked as active and an EIGRP query is sent to every neighbor. In the above topology, Router D doesn’t know the 192.168.0.0/24 network, because Router C sends only the summary 192.168.0.0/16. That’s why Router D replies without asking Router E. Router B has an alternate path, so Router B replies immediately. Router J doesn’t have any other EIGRP neighbors, so it replies to the query immediately. Router G doesn’t know the 192.168.0.0/24 network because Router F filters 192.168.0.0/24. That’s why Router G replies without asking Router H. So, as you can see, filtering or summarizing the prefixes does not stop the query at the router that filters or summarizes. That router still queries its neighbor one hop further away, which then replies immediately.
EIGRP Stub Feature
If you want to stop EIGRP queries from being sent to a router completely, you can only do it with the EIGRP Stub feature. With the EIGRP Stub feature, basically, you are creating an artificial split horizon for prefix advertisement! Wow, this is such a nerdy sentence, so let me explain with the below topology :) In that figure, suppose the EIGRP Stub feature is enabled on the spoke routers. If one of the routers in spoke site 1 receives prefixes from the hub router, it doesn't advertise them to the other spoke router, even though they are on the same site. So, with the EIGRP Stub feature, when a router receives a routing advertisement, it doesn't advertise it to another EIGRP router, similar to classical split horizon.
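The query behavior described above can be modelled with a short Python sketch. Starting from the router that lost the route, a query is sent to each neighbor, and each neighbor reacts as follows:

- A neighbor that has a feasible successor replies immediately.
- A neighbor that does not know the specific prefix (because of summarization or filtering) also replies immediately.
- A neighbor that has the route but no feasible successor goes active and queries its own neighbors.
- Stub routers are never queried.

The topology, router names, and states are hypothetical and only loosely follow the description above. Real EIGRP also involves split horizon, reply tracking, and stuck-in-active timers, which are ignored here.

```python
# Hypothetical topology: router -> EIGRP neighbors.
NEIGHBORS = {
    "A": ["B", "C", "S"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["C", "E"],
    "E": ["D"],
    "S": ["A"],
}

# What each router knows about the lost prefix (hypothetical):
#   "feasible_successor": has a loop-free backup, replies immediately
#   "no_feasible_successor": goes active and queries its own neighbors
#   "unknown": never learned the specific prefix, replies immediately
ROUTE_STATE = {
    "B": "feasible_successor",
    "C": "no_feasible_successor",
    "D": "unknown",  # C advertises only a summary toward D
    "E": "no_feasible_successor",
}

STUB_ROUTERS = {"S"}


def query_scope(neighbors, origin, route_state, stub_routers):
    """Return the set of routers that receive a query for the lost prefix."""
    queried = set()
    active = [origin]
    while active:
        router = active.pop()
        for neighbor in neighbors[router]:
            if neighbor == origin or neighbor in queried or neighbor in stub_routers:
                continue  # Stub routers are not queried at all.
            queried.add(neighbor)
            if route_state.get(neighbor) == "no_feasible_successor":
                active.append(neighbor)  # This neighbor goes active and propagates.
    return queried


if __name__ == "__main__":
    print(sorted(query_scope(NEIGHBORS, "A", ROUTE_STATE, STUB_ROUTERS)))  # ['B', 'C', 'D']
```

D is still queried even though it only knows the summary, but E is never asked. The stub router S is left out entirely. Removing S from the stub set adds it to the query scope.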
IBGP has a comparable split-horizon rule: a route learned from one IBGP peer is not advertised to another IBGP peer. Because EIGRP stub routers do not re-advertise learned routes, the spoke routers in the above figure can be isolated from the rest of the network when there is a failure. To avoid this, you have three options. You can connect the spokes to both hub routers. You can leak the prefixes between the spokes. Or you can configure a static route with a higher AD on the spoke routers to keep reachability to the rest of the network. This design caveat will be explained in more detail in another post. If you are looking for more EIGRP posts, please refer to our EIGRP category.
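The isolation caveat and the static-route workaround can be sketched as a route-selection problem on the second spoke.

- When Spoke 2's link to the hub fails, its only other path is through Spoke 1.
- As a stub, Spoke 1 does not re-advertise the hub's routes, so Spoke 2 learns nothing from it.
- A static route with a higher administrative distance (a floating static) stays unused while the EIGRP route exists, and takes over only when that route disappears.

The value 90 is the Cisco default AD for internal EIGRP routes. The AD of 250 for the static route and the function names are assumptions for illustration. Route leaking between the spokes is not modelled.

```python
EIGRP_INTERNAL_AD = 90    # Cisco default AD for internal EIGRP routes.
FLOATING_STATIC_AD = 250  # Hypothetical; it only needs to be higher than 90.


def spoke2_candidates(hub_link_up, spoke1_is_stub, floating_static):
    """Candidate routes on Spoke 2 toward the rest of the network.

    Each candidate is (next_hop, administrative_distance).
    """
    candidates = []
    if hub_link_up:
        candidates.append(("hub", EIGRP_INTERNAL_AD))
    if not spoke1_is_stub:
        # A non-stub Spoke 1 re-advertises what it learned from the hub.
        candidates.append(("spoke1", EIGRP_INTERNAL_AD))
    if floating_static:
        candidates.append(("spoke1", FLOATING_STATIC_AD))
    return candidates


def best_next_hop(candidates):
    """Pick the lowest-AD candidate; None means the destination is unreachable.

    Ties on AD are broken by list order here, standing in for the EIGRP metric.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda candidate: candidate[1])[0]


if __name__ == "__main__":
    print(best_next_hop(spoke2_candidates(True, True, False)))   # hub
    print(best_next_hop(spoke2_candidates(False, True, False)))  # None: isolated
    print(best_next_hop(spoke2_candidates(False, True, True)))   # spoke1 via floating static
```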
Published - Sun, 10 Apr 2022
Created by - Orhan Ergun
Flat/Single Level vs. Multi Level IS-IS Design Comparison
Flat routing means there is no hierarchy: the entire topology information of the network is known by each and every device in the network. IS-IS has two levels, Level 1 and Level 2, so for IS-IS, multi-level means two-level IS-IS. When we have two levels, Level 1 routers don't know the topology of Level 2 and vice versa. Scalability is achieved by hiding the topology information of one level from the routers of the other level. The network becomes more scalable because when there is a failure, new information is added, or a metric changes in one level, the routers in the other level don't need to run a full SPF. But what are the design considerations for flat versus multi-level IS-IS networks? Is a multi-level (hierarchical) IS-IS design always good? The answer is no. Although multi-level provides scalability, it comes with extra complexity and increases end-to-end routing convergence time. So, I prepared the comparison charts below to discuss different design aspects of IS-IS single vs. multi-level design. If you like this comparison chart, you can see more of them in my CCIE Enterprise Training.
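A rough way to see what hiding topology buys is to count the router LSPs each router has to store. The Python sketch below assumes a hypothetical network of three areas and treats every router as originating one LSP. It ignores pseudonode LSPs, fragments, and route leaking.

- In the flat design, every router stores every LSP.
- In the two-level design, an L1-only router stores only its own area's LSPs and reaches everything else through the nearest L1/L2 router (ATT bit).
- An L1/L2 router keeps both its area's Level 1 database and the Level 2 database.

```python
# Hypothetical network: number of routers per IS-IS area.
AREA_SIZES = {"49.0001": 120, "49.0002": 80, "49.0003": 100}
# Hypothetical number of routers that participate in Level 2.
LEVEL2_ROUTERS = 12


def flat_lsdb(area_sizes):
    """Single-level design: every router stores one LSP per router in the network."""
    return sum(area_sizes.values())


def l1_only_lsdb(area_sizes, area):
    """Two-level design: an L1-only router stores only the LSPs of its own area."""
    return area_sizes[area]


def l1l2_lsdb(area_sizes, area, level2_routers):
    """An L1/L2 router keeps its area's Level 1 LSDB plus the Level 2 LSDB."""
    return area_sizes[area] + level2_routers


if __name__ == "__main__":
    print("flat, any router:", flat_lsdb(AREA_SIZES))
    print("two-level, L1-only in 49.0002:", l1_only_lsdb(AREA_SIZES, "49.0002"))
    print("two-level, L1/L2 in 49.0001:", l1l2_lsdb(AREA_SIZES, "49.0001", LEVEL2_ROUTERS))
```

With these numbers, every router holds 300 LSPs in the flat design. In the two-level design, the L1-only router holds 80 and the L1/L2 router holds 132. The count does not show the price mentioned above: extra complexity, longer end-to-end convergence, and possibly suboptimal exits when L1 routers follow the default route toward the nearest L1/L2 router.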
Published - Fri, 07 Aug 2020
Created by - Orhan Ergun
Are Inter-AS MPLS VPNs commonly deployed? In real-life deployments, which Inter-AS MPLS VPN option is most common? What are the use cases of Inter-AS MPLS VPNs? This is not a theory post; I will share practical information with you. For those who want to learn the details of Inter-AS MPLS VPNs, I wrote articles on Inter-AS Option A, Option B, Option C, and Option AB earlier, and you should take a look at those. Those who already know the different deployment options of Inter-AS MPLS VPNs can skip those posts and continue reading this one. In fact, Inter-AS MPLS VPNs are more common than you might imagine. You might think that they are only deployed between companies to support common/dual-homed customers, but that is not the main use case. I have seen, and been involved in, many Inter-AS MPLS VPN deployments, and more and more I see the same pattern. Companies that operate in more than one country separate their IGP and BGP, using separate IGP and BGP domains per country. To offer MPLS VPN service to customers that have locations in more than one country, these service providers create Inter-AS MPLS VPNs. So, in their network, they use a different AS number per country and connect the countries by using one of the Inter-AS MPLS VPN design options. When you research Inter-AS MPLS VPN options, you commonly see that Inter-AS MPLS VPN Option C is the most scalable option. Because of security concerns, many people recommend using Option C only between ASes (Autonomous Systems) that belong to the same business, the same company. In real-life deployments, however, this is not what usually happens: even companies that operate in many countries deploy either Inter-AS Option A or B. I would say Option A is the most commonly deployed, due to its simplicity.
One of these companies is planning to move away from this design to a flat network. Instead of having separate IGP and BGP domains, they want to convert their design to a single IGP and a single BGP AS. They don’t have that many routers per country, and configuring and managing so many entries for the Inter-AS customers is just cumbersome for them. Also, they know that a flat network design will give them better control of their network, and traffic engineering will be easier. They could have a problem with IGP scalability but, as I said above, they don’t have that many routers in the IGP. I mentioned earlier that I have a customer who has 200 routers in an OSPF domain, and they don’t have a problem with their OSPF deployment. Careful readers will remember the important OSPF feature that this company deployed to support 200 routers in an OSPF domain. What was it? Share your answer in the comment box below.
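The per-customer configuration burden that pushes such companies away from Option A can be estimated with simple arithmetic.

- In Option A (back-to-back VRFs), each VPN that crosses an ASBR-to-ASBR link needs its own subinterface on each side and its own PE-CE style session.
- In Option B, a single MP-eBGP session per link carries the VPNv4 routes of all VPNs.

The Python sketch below counts these items. The VPN and link counts are hypothetical, and the sketch assumes eBGP is the per-VRF protocol in Option A.

```python
def option_a_items(vpn_count, inter_as_links):
    """Option A: one subinterface per side and one eBGP session per VPN per link."""
    return {
        "ebgp_sessions": vpn_count * inter_as_links,
        "per_vpn_subinterfaces": 2 * vpn_count * inter_as_links,
    }


def option_b_items(vpn_count, inter_as_links):
    """Option B: one MP-eBGP VPNv4 session per link, whatever the VPN count.

    vpn_count is accepted only to keep the same signature as option_a_items.
    """
    return {"ebgp_sessions": inter_as_links, "per_vpn_subinterfaces": 0}


if __name__ == "__main__":
    vpns, links = 50, 2  # Hypothetical: 50 Inter-AS VPN customers, 2 ASBR links.
    print("Option A:", option_a_items(vpns, links))
    print("Option B:", option_b_items(vpns, links))
```

With 50 VPNs over two links, Option A needs 100 eBGP sessions and 200 subinterfaces, while Option B needs 2 sessions. In Option A, every new Inter-AS customer adds more of these items. Option B still requires the ASBRs to carry the VPNv4 routes of every VPN, so the complexity is reduced rather than removed.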
Published - Tue, 26 Nov 2019