Orhan Ergun 6 Comments


OSPF Best Practices

Understanding and using best practices is very important, though they may not be feasible in all networks due to budget, political or other technical constraints.

 

In this post I will explain the best practices for OSPF networks. These best practices come from my real-life design and deployment experience, knowledge and lessons learned over 15 years of Enterprise, Service Provider and Mobile Operator networking.

 

Before we start, I want to touch briefly on Topology and Reachability information in OSPF, as I will use these terms many times throughout this post and you’ll see them whenever you study network design.

Reachability information means the IP addresses and subnets on the devices and the links. Router loopbacks and the links between the routers have IP addresses, and this information is exchanged between the routers in OSPF. This process is known as control plane learning.

Topology information means the connections between the routers: metric information and which router is connected to which one. With this information, each router computes a shortest path tree in OSPF. Note that IS-IS uses the same process to find the shortest path to each destination, but there is no topology information in EIGRP. In other words, EIGRP neighbors don’t send topology information to each other.
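
To make the shortest path tree idea concrete, here is a minimal Python sketch of the calculation a link-state router runs over its topology information (Dijkstra's algorithm). The router names, link costs and the `shortest_path_tree` helper are made up for illustration and are not taken from any real OSPF implementation.

```python
import heapq

# Hypothetical link-state database: router -> {neighbor: link cost}.
# Router names and costs are assumptions made up for this sketch.
LSDB = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 5, "R4": 20},
    "R4": {"R2": 1, "R3": 20},
}

def shortest_path_tree(lsdb, root):
    """Dijkstra's algorithm: return {router: (total cost, parent router)}."""
    tree = {}
    # Heap entries are (cost so far, router, parent on that path).
    candidates = [(0, root, None)]
    while candidates:
        cost, router, parent = heapq.heappop(candidates)
        if router in tree:
            continue  # already placed in the tree over a cheaper or equal path
        tree[router] = (cost, parent)
        for neighbor, link_cost in lsdb[router].items():
            if neighbor not in tree:
                heapq.heappush(candidates, (cost + link_cost, neighbor, router))
    return tree

spf = shortest_path_tree(LSDB, "R1")
for router, (cost, parent) in sorted(spf.items()):
    print(router, cost, parent)
```

In this toy topology R1 reaches R4 through R2 (cost 11) rather than through R3 (cost 25). Every router in the same area holds the same topology information and runs the same calculation with itself as the root.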

 

Another term which I will use throughout this post is single area design.

Single area OSPF design is also known as Flat OSPF design. Generally it refers to an OSPF Area 0 only (Backbone area) deployment. There is no second area; all the nodes are in the backbone area.

 

  • Stub, Totally Stub, NSSA and Totally NSSA Areas can create suboptimal routing in the network, because these area types prevent some information from entering an area. Whenever there is specific information in the routing table, the optimal path can be found; whenever there is summarization (less reachability information in the routing table), suboptimal routing might occur.

 

  • OSPF Areas are used for scalability. If you don’t have a valid reason such as 100s of routers, or resource problems on the routers, don’t use multiple areas.

 

  • OSPF Multi Area design increases the network complexity. Complexity is sometimes necessary and is not a bad thing, but just be aware that Multi Area design is more complex than single/flat OSPF area design: you need to place the ABRs in the correct place and deal with Multi Area design related problems such as MPLS Traffic Engineering and MPLS LSP issues.

 

  • Two is company, three is a crowd in design. Having two OSPF ABRs provides high availability, but three ABRs is not a good idea. Unless you have a capacity requirement, I don’t recommend having three links, nodes, logical entities and so on in the network.

 

  • An ABR slows down the network convergence. Knowing this is important. In a single/flat OSPF design there is no ABR, so no Type 3 LSAs are generated from Type 1 and Type 2 LSAs, and similarly no Type 4 LSAs are generated from Type 1 LSAs.

 

  • Having a separate OSPF area per router is generally considered bad. You should monitor the routers’ resources carefully and place as many routers as you can in one OSPF area.

 

  • Not every router has a powerful CPU and a lot of memory, so you can split up the routers based on their resource availability. Low end devices can be placed in a separate OSPF area, and that area type can be changed to Stub, Totally Stub, NSSA or Totally NSSA.

 

  • Always look for summarization opportunities, but know that summarization can create suboptimal routing. Suboptimal routing may not be a problem for some applications, but some applications require very low delay, jitter and packet loss. Suboptimal routing increases the chance of delay (latency).

 

  • A good IP addressing plan is important for OSPF Multi Area design. It allows OSPF summarization (Reachability), and thus faster convergence and a smaller routing table.

 

  • Having a smaller routing table makes troubleshooting easier. Dealing with less information decreases mean time to repair, because identifying and fixing the problem is faster. There are fewer prefixes in the routing table and the routing protocol databases, so troubleshooting is much easier and is probably manageable by an engineer of average skill.

 

  • Having a smaller routing table reduces convergence time as well. Summarization reduces the routing table size, which is why it provides faster network convergence.

 

  • The OSPF NSSA area is generally used at the Internet Edge of the network, since the Internet routers don’t need to have all the OSPF LSAs, yet redistribution of selected BGP prefixes is still common there.

 

  • Topology information is not sent between different OSPF areas. This reduces the flooding domain and allows large scale OSPF deployment. If you have 100s of routers in your network, you can consider splitting the OSPF domain into multiple OSPF areas. But there are other considerations for Multi Area design, and they will be explained in this chapter.

 

  • Use passive interface as much as you can. Passive interface should be enabled on every interface where you don’t want to set up an OSPF neighborship.

 

  • For very large scale OSPF designs, transit subnets can be removed from the OSPF topology. This has been defined in RFC 6860. This feature is known as ‘prefix suppression’ on Cisco routers. Removing these subnets reduces the routing table size, and thus speeds up network convergence and makes troubleshooting easier.

 

  • If there will be maintenance on a router which runs OSPF, ‘max-metric router-lsa’ should be enabled to remove the router from the topology without packet loss. Actually the router still stays in the OSPF topology, but since it advertises the maximum metric in its Type 1 LSA (Router LSA), traffic is not forwarded through it if there is an alternate path. If there is no alternate path, the router still receives network traffic even with ‘max-metric router-lsa’.
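
The max-metric behaviour in the last bullet can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the routers A, B, C and D, their link costs, and the helpers `apply_max_metric` and `best_path` are hypothetical, and the value 65535 (0xFFFF) stands for the largest cost a Router LSA link can carry.

```python
import heapq

MAX_LINK_METRIC = 0xFFFF  # 65535, the largest link cost in a Router LSA

def apply_max_metric(lsdb, router):
    """Copy the LSDB, with `router` advertising all of its links at max cost."""
    updated = {r: dict(links) for r, links in lsdb.items()}
    updated[router] = {n: MAX_LINK_METRIC for n in lsdb[router]}
    return updated

def best_path(lsdb, src, dst):
    """Dijkstra over advertised link costs; returns (cost, [routers on path])."""
    heap = [(0, [src])]
    visited = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in lsdb[node].items():
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, path + [neighbor]))
    return None  # dst is unreachable

# Hypothetical topology: A reaches D through B (cheap) or through C (expensive).
# lsdb[X][Y] is the cost that router X advertises for its link to Y.
with_alternate = {
    "A": {"B": 10, "C": 20},
    "B": {"A": 10, "D": 10},
    "C": {"A": 20, "D": 20},
    "D": {"B": 10, "C": 20},
}
# Same network without router C, so B is the only way to reach D.
no_alternate = {
    "A": {"B": 10},
    "B": {"A": 10, "D": 10},
    "D": {"B": 10},
}

print(best_path(with_alternate, "A", "D"))
print(best_path(apply_max_metric(with_alternate, "B"), "A", "D"))
print(best_path(apply_max_metric(no_alternate, "B"), "A", "D"))
```

With an alternate path available, transit traffic moves from B to C as soon as B advertises the maximum metric. Without an alternate, B stays on the path, only with a very large cost.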

Similar to these OSPF best practices, you can find other best practices in network design on the website.
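
The link between a good IP addressing plan and summarization, mentioned in the list above, can be illustrated with Python's standard `ipaddress` module. The prefixes and the `summarize` helper below are made-up examples. Note also that `collapse_addresses` only merges prefixes that combine exactly, whereas a summary configured on an ABR may cover more address space than is actually in use.

```python
import ipaddress

def summarize(prefixes):
    """Collapse prefix strings into the fewest networks that cover them exactly."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Hypothetical area with a contiguous addressing plan.
planned_area = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]

# Hypothetical area where subnets were handed out without a plan.
unplanned_area = ["10.1.0.0/24", "10.7.5.0/24", "172.16.9.0/24", "192.168.40.0/24"]

print(summarize(planned_area))    # a single /22 covers the whole area
print(summarize(unplanned_area))  # nothing can be merged, four prefixes remain
```

The planned area can be advertised into the backbone as one prefix, while the unplanned area still needs four, so the other areas carry a larger routing table for the same number of subnets.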

— 6 Comments —

  1. Hi Orhan,
    Trying to understand this statement “without ABR in single/flat OSPF design, there is no Type 1, Type 2 to Type 3 LSA generation, similarly Type 4 LSAs also regenerated from the Type 1 LSAs.”
    My understanding is that Type 1 and Type 2 LSAs are flooded between routers sharing a common area. Your statement says that if I have a flat OSPF Area 0 deployment there wouldn’t be any Type 1/Type 2 to Type 3 LSA generation.

    • Hi, that sentence was probably misleading. What I mean here is that, as there is no ABR in single/flat OSPF design, there is no Type 1 and Type 2 to Type 3 generation. Also there is no Type 1 to Type 4 generation for the ASBR reachability. Thanks for letting me know. Orhan

      • Orhan,

        So you meant LSA transition between areas and not generation.
        As the Type 1 and Type 2 LSA generation will be there for sure (Type 2 depends on the network type, though).

        Thanks,
        Mayank

  2. Hi Orhan

    Truly, it’s a really useful and priceless post, especially from a design point of view. Thank you for sharing.

    I would like to ask about one point in this post that confused me. Perhaps it is not only about OSPF.

    You said that “Dealing with less information increases mean time to repair.”
    What do you mean exactly when you say “less information”?
    The knowledge of the people who are responsible for the network
    or
    the routers which have the routing information?

    I thought that dealing with less information for the routers “decreases” mean time to repair. I do not know, maybe I am wrong.
    Thank you once again.

    • Hi Fatih, thanks for raising this point, as it seems it required more explanation. I added extra information to that statement. Please check again. Yes, it is about the people who are doing the troubleshooting: if the table size is small (maybe just a default route), even an engineer of average skill can manage it, and since the routing table is small, identifying the problems would be much faster.

      When you have fewer prefixes it helps convergence as well, since the prefixes can be placed in the data plane much faster after the failure. This directly affects mean time to repair.

      Last but not least, you were not wrong. It decreases the mean time to repair. I made a typo and corrected it. Cheers, Orhan
