BGP route reflector routing loops can arise in IP networks. In this post, I will illustrate a topology that loops IP packets between the routers, describe several possible solutions, and share a best practice for designing BGP route reflectors in an IP network.
BFD is not a fast convergence mechanism. BFD stands for Bidirectional Forwarding Detection. It is an important tool for the IP layer, but there is confusion in the network community about it.
Networks can be really simple, but we insist on making them complex!
The most fundamental network design attribute should be simplicity.
What is the KISS principle? It stands for Keep It Simple and Stupid, but what does it really mean in networking?
OSPF Best Practices
Understanding and using best practices is very important, though it may not be feasible in all networks due to budget, political or other technical constraints.
In this post I will explain the best practices for OSPF networks. These best practices come from my real life design and deployment experience, knowledge and lessons learned over 15 years of Enterprise, Service Provider and Mobile Operator networking.
Before we start, I want to touch briefly on topology and reachability information in OSPF, as I will use these terms many times throughout this post and you will see them whenever you study network design.
Reachability information means the IP addresses and subnets on the devices and the links. Router loopbacks and the links between the routers have IP addresses, and this information is exchanged between the routers in OSPF. This process is known as control plane learning.
Topology information means the connections between the routers and their metrics: which router is connected to which one. With this information, routers compute a shortest path tree in OSPF. Note that IS-IS uses the same process to find the shortest path to each destination, but there is no topology information in EIGRP. In other words, EIGRP neighbors don’t send topology information to each other.
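To make the shortest path tree concrete, here is a minimal Python sketch of Dijkstra’s algorithm, the shortest path first computation that link-state protocols such as OSPF run on the topology information. The router names, the metrics and the `shortest_path_tree` helper are hypothetical and only for illustration; a real implementation works on the link-state database and handles much more (areas, equal cost paths, external routes).

```python
import heapq


def shortest_path_tree(graph, root):
    """Return {router: (cost, parent)} computed with Dijkstra's algorithm."""
    tree = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > tree[node][0]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if neighbor not in tree or new_cost < tree[neighbor][0]:
                tree[neighbor] = (new_cost, node)
                heapq.heappush(queue, (new_cost, neighbor))
    return tree


# Hypothetical topology information: router -> {neighbor: link metric}
topology = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 1},
    "R4": {"R2": 1, "R3": 1},
}

spt = shortest_path_tree(topology, "R1")
for router, (cost, parent) in sorted(spt.items()):
    print(router, "cost", cost, "via", parent)
```

From R1, the direct link to R2 costs 10, so the tree reaches R2 through R3 and R4 at a total cost of 3. Without the topology information (who is connected to whom, and at what metric) this calculation would not be possible.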
Another term which I will use throughout this post is single area design.
Single area OSPF design is also known as flat OSPF design. It generally refers to a deployment with OSPF Area 0 (the backbone area) only. There is no second area; all the nodes are in the backbone area.
- Stub, Totally Stub, NSSA and Totally NSSA areas can create sub optimal routing in the network, because these area types prevent some information from entering the area. Whenever specific information is in the routing table, the optimal path can be found; whenever there is summarization (less reachability information in the routing table), sub optimal routing might occur.
- OSPF areas are used for scalability. If you don’t have a valid reason such as hundreds of routers or resource problems on the routers, don’t use multiple areas.
- OSPF multi area design increases network complexity. Complexity is sometimes necessary and not a bad thing, but be aware that multi area design is more complex than single/flat OSPF area design: you need to place the ABRs in the correct places and deal with multi area design related problems such as MPLS Traffic Engineering and MPLS LSP issues.
- Two is company, three is a crowd in design. Having two OSPF ABRs provides high availability, but three ABRs is not a good idea. Unless you have a capacity requirement, I don’t recommend having three of anything (links, nodes, logical entities and so on) in the network.
- An ABR slows down network convergence. This is important to know: in a single/flat OSPF design there is no ABR, so no Type 3 LSAs are generated from Type 1 and Type 2 LSAs, and no Type 4 LSAs are generated from Type 1 LSAs either.
- Having a separate OSPF area per router is generally considered bad. You should monitor the routers’ resources carefully and place as many routers as you can in one OSPF area.
- Not every router has a powerful CPU and plenty of memory, so you can split up the routers based on their resource availability. Low end devices can be placed in a separate OSPF area, and that area type can be changed to Stub, Totally Stub, NSSA or Totally NSSA.
- Always look for summarization opportunities, but know that summarization can create sub optimal routing. Sub optimal routing may not be a problem for some applications, but others require very low delay, jitter and packet loss. Sub optimal routing increases the chance of added delay (latency).
- A good IP addressing plan is important for OSPF multi area design. It allows OSPF summarization (of reachability information), and thus faster convergence and a smaller routing table.
- A smaller routing table makes troubleshooting easier. Dealing with less information decreases mean time to repair: with fewer prefixes in the routing table and the routing protocol databases, identifying and fixing a problem is faster and is probably manageable by engineers of average skill.
- A smaller routing table reduces convergence time as well. Summarization reduces the routing table size, which is why it provides faster network convergence.
- The OSPF NSSA area is generally used at the Internet edge of the network, since the Internet edge routers don’t need all the OSPF LSAs, yet redistribution of selected BGP prefixes is still common there.
- Topology information is not sent between different OSPF areas. This reduces the flooding domain and allows large scale OSPF deployment. If you have hundreds of routers in your network, you can consider splitting the OSPF domain into multiple OSPF areas, but there are other considerations for multi area design, which are explained in this chapter.
- Use passive interfaces as much as you can. Passive interface should be enabled on any interface where you don’t want to set up an OSPF neighborship.
- For very large scale OSPF designs, transit-only subnets can be hidden so that they are not installed in the routing table. This is defined in RFC 6860, and the feature is known as ‘prefix suppression’ on Cisco routers. Hiding these subnets reduces the routing table size, which speeds up network convergence and makes troubleshooting easier.
- If there will be maintenance on a router that runs OSPF, ‘max-metric router-lsa’ should be enabled to take the router out of the forwarding path without packet loss. The router actually stays in the OSPF topology, but since it advertises the maximum metric in its Type 1 LSA (Router LSA), traffic is not forwarded through it if there is an alternate path. If there is no alternate path, the router still receives network traffic even with ‘max-metric router-lsa’.
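The summarization trade-off mentioned several times in the list above can be sketched in a few lines of Python with the standard `ipaddress` module. This is only an illustration under stated assumptions: the prefixes, the ABR names and the `lookup` helper are hypothetical, and the routing tables are plain dictionaries rather than anything a router actually builds.

```python
import ipaddress

# Hypothetical subnets inside one OSPF area, from a contiguous addressing plan
area_prefixes = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24")
]

# A good addressing plan lets the four subnets collapse into one summary
summary = list(ipaddress.collapse_addresses(area_prefixes))
print(summary)  # [IPv4Network('10.1.0.0/22')]


def lookup(routing_table, destination):
    """Longest-prefix match: return the next hop of the most specific route."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in routing_table if address in network]
    if not matches:
        return None
    return routing_table[max(matches, key=lambda network: network.prefixlen)]


# Assumption: ABR1 is closer to the first two subnets, ABR2 to the last two
specific_table = {
    area_prefixes[0]: "ABR1",
    area_prefixes[1]: "ABR1",
    area_prefixes[2]: "ABR2",
    area_prefixes[3]: "ABR2",
}
# Assumption: with only the summary, the remote router prefers ABR1 for everything
summarized_table = {summary[0]: "ABR1"}

print(lookup(specific_table, "10.1.3.7"))    # ABR2, the closer exit
print(lookup(summarized_table, "10.1.3.7"))  # ABR1, possibly a longer path
```

The summarized table has one entry instead of four, which is the scalability and troubleshooting benefit, but the remote router can no longer tell that ABR2 is the better exit for 10.1.3.0/24, which is where sub optimal routing comes from.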
Quality of Service Best Practices
What is a best practice? Below is the Wikipedia definition of best practice. This applies to education as well.
A best practice is a method or technique that has been generally accepted as superior to any alternatives because it produces results that are superior to those achieved by other means or because it has become a standard way of doing things, e.g., a standard way of complying with legal or ethical requirements.
Although in real life designs we may not be able to follow best practice network design due to technical, budgetary or political constraints, knowing the best practices is critical for network design in real life as well as in the exams.
Below are the generally accepted Quality of Service Best Practices. I covered Quality of Service Best Practices and many other technology best practices in the CCDE In-Depth, which is my latest network design book.
- Always classify and mark applications as close to their sources as possible.
- Classification and marking are usually done in both the ingress and egress directions, but queuing and shaping are usually done on egress.
- Ingress queuing can be done to prevent head-of-line blocking. Otherwise, queuing is done at the egress interface in almost every case.
- Less granular fields such as CoS and MPLS EXP (less granular due to their number of bits) should be mapped to DSCP as close to the traffic source as possible. CoS and EXP are 3 bits each, so you can have a maximum of 8 classes with them. DSCP is 6 bits, so 64 different classes can be used, which is why DSCP is considered more granular. This is important because, when MPLS Layer 3 and Layer 2 VPNs are compared, MPLS Layer 3 VPN provides more granular QoS, as it uses DSCP instead of CoS (the Class of Service bits carried in Layer 2).
- Follow standards based DiffServ PHB markings if possible to ensure interoperability with SP networks and enterprise networks, or when merging networks together. RFC 4594 provides configuration guidelines for DiffServ service classes.
- If there is real time, delay sensitive traffic, LLQ should be enabled, because the LLQ priority queue is always served before any other queue. When the traffic in the priority queue has been sent, the other queues are handled.
- LLQ is the combination of CBWFQ (Class Based Weighted Fair Queuing) and Priority Queuing.
- Enable queuing at every node that has the potential for congestion. For example, at a Wide Area Network edge node, the bandwidth towards the WAN is generally less than the bandwidth towards the local area network or datacenter, so the WAN edge is a common place for QoS queuing mechanisms.
- Limit LLQ to 33% of the link bandwidth capacity. Otherwise, real time traffic such as voice can eat up all the bandwidth and other applications suffer in case of congestion.
- Enable admission control on LLQ. This is very important: if you allocate bandwidth that can accommodate only 10 voice calls, an 11th voice call disrupts all 11 calls, not only the 11th.
- Policing should be done as close to the source as possible, because you don’t want to carry traffic that would be dropped anyway. (This is the same network design suggestion I give my clients for security filters.) This is one of the most important Quality of Service Best Practices.
- Do not enable WRED on LLQ. (WRED is only effective for TCP based applications. Most, if not all, real time applications use UDP, not TCP.)
- Allocate 25% of the capacity to the Best Effort class if there is a large number of applications in the default class.
- For a link carrying a mix of voice, video and data traffic, limit the priority queue to 33% of the link bandwidth.
- Use WRED for congestion avoidance on TCP traffic. WRED is effective only for TCP traffic.
- Use DSCP based WRED wherever possible. This provides more granular implementation.
- Always enable QoS in hardware as opposed to software if possible. In the campus environment, you should enable classification and marking on the switches as opposed to routers. Switches provide hardware based Quality of Service.
- Because the 802.1p bits (CoS bits) are lost when the packet enters the IP or MPLS domain, mapping is needed. Always implement QoS in hardware, if possible, to avoid performance impact.
- Switches support QoS in the hardware, so, for example, in the campus, classify and mark the traffic at the switches.
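The granularity argument for CoS, EXP and DSCP in the list above is simple bit arithmetic, which the following Python sketch spells out. The helper names are hypothetical, and the DSCP to CoS mapping shown (keep the three most significant bits) is only a commonly used default that I am assuming here; check the actual mapping table on your platform.

```python
COS_BITS = 3   # 802.1p CoS and MPLS EXP are 3-bit fields
DSCP_BITS = 6  # DSCP is a 6-bit field in the IP header


def class_count(bits):
    """Number of distinct values a field of the given width can carry."""
    return 2 ** bits


def dscp_to_cos(dscp):
    """Assumed default mapping: keep the three most significant DSCP bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp >> 3


def dscp_to_tos_byte(dscp):
    """DSCP occupies the upper six bits of the former ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


EF = 46    # Expedited Forwarding, typically used for voice
AF41 = 34  # Assured Forwarding class 4, low drop precedence

print(class_count(COS_BITS))   # 8 classes with CoS or EXP
print(class_count(DSCP_BITS))  # 64 classes with DSCP
print(dscp_to_cos(EF))         # 5
print(dscp_to_cos(AF41))       # 4
print(dscp_to_tos_byte(EF))    # 184 (0xB8)
```

Because eight DSCP values share each CoS value under this mapping, detail is lost every time a marking is squeezed into a 3-bit field, which is why the mapping should be planned rather than left to chance.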