Data Center

Total 4 Blogs

Created by - Orhan Ergun

RIFT - Routing in Fat Trees

RIFT (Routing In Fat Trees) is a routing protocol designed for data center networks that use a fat tree topology. Fat tree is a network topology commonly used in modern data centers, characterized by its ability to provide a high degree of network bandwidth and low-latency communication between servers.

In a fat tree topology, the network is organized in a hierarchical structure with multiple layers of switches. At the top layer of the tree, one or more core switches interconnect the lower layers of the tree. The middle layer is composed of aggregation switches, which interconnect multiple access switches at the bottom layer. Each access switch connects to multiple servers.

RIFT is designed to address the challenges of routing in fat tree networks: efficient use of network bandwidth, low latency, and high network availability. RIFT achieves this with a hybrid, fully distributed design that matches the topology: it behaves like a link-state protocol in the northbound direction (toward the top of the fabric) and like a distance-vector protocol in the southbound direction, where the upper layers advertise aggregated (typically default) routes down to the layers below. Each switch in the network maintains a routing table that specifies the best path to each destination.

RIFT also employs equal-cost multi-path (ECMP) routing, which allows multiple paths to be used for traffic between any two switches in the network. This helps to distribute traffic across the network and provides better network performance. Overall, RIFT is a scalable and efficient routing protocol that is well suited to modern data center networks built on a fat tree topology.

RIFT vs. BGP

RIFT and BGP (Border Gateway Protocol) are both routing protocols used in data center networks, but they differ in their design goals and capabilities. BGP is a widely used routing protocol commonly found in large-scale service provider networks and enterprise networks. It is designed for routing between different autonomous systems (AS) and for exchanging routing information between different networks. In contrast, RIFT is specifically designed for data center networks that use a fat tree topology. RIFT is optimized for high-bandwidth, low-latency communication within the data center network, and it can provide efficient and scalable routing in large data centers.

One of the important differences is that BGP is a more flexible protocol that can handle a wider range of routing scenarios, such as interconnecting different networks and providing transit services. In contrast, RIFT is designed specifically for fat tree topologies and may not be as suitable for other types of network topologies or routing scenarios.

In summary, while both RIFT and BGP are routing protocols used in data center networks, they have different design goals and capabilities. RIFT is optimized for high-bandwidth, low-latency communication within fat tree data center networks, while BGP is more flexible and suitable for a wider range of routing scenarios.
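To make the ECMP behavior described above concrete, here is a minimal Python sketch of per-flow hashing, the mechanism ECMP relies on: a flow's 5-tuple is hashed to pick one of several equal-cost uplinks, so packets of one flow stay on one path while different flows spread across all paths. The spine names and flows are hypothetical, purely for illustration:

    import hashlib

    def pick_next_hop(flow, next_hops):
        # Hash the flow's 5-tuple; the same flow always maps to the
        # same next hop, while different flows spread across all of them.
        key = "|".join(str(field) for field in flow).encode()
        digest = hashlib.sha256(key).digest()
        return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

    # Hypothetical uplinks from a leaf toward four spines.
    spines = ["spine1", "spine2", "spine3", "spine4"]

    # 5-tuple: source IP, destination IP, protocol, source port, destination port.
    flow_a = ("10.0.1.10", "10.0.2.20", "tcp", 49152, 443)
    flow_b = ("10.0.1.11", "10.0.2.20", "tcp", 49153, 443)

    print(pick_next_hop(flow_a, spines))  # always the same spine for flow_a
    print(pick_next_hop(flow_b, spines))  # likely a different spine

Real switches do this in hardware with vendor-specific hash functions; the point is only that the path choice is deterministic per flow, which keeps packets of a flow in order while spreading the load.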

Published - 3 Days Ago

Created by - Orhan Ergun

Edge Computing and Fog Computing

In this post, Edge Computing and Fog Computing will be explained. Network engineers, network designers, and network architects need to know these two important industry terms and their architecture to decide how and where to deploy workloads, where to store data, and many other important architectural decisions. I will start by explaining Edge Computing and Fog Computing, and after that we will compare them and try to understand the architectural differences. Let's start with Edge Computing.

Edge Computing:

Edge computing is a networking philosophy focused on bringing computing as close to the source of data as possible, in order to reduce latency and bandwidth usage. In simpler terms, edge computing means running fewer processes in the cloud and moving those processes to local places, such as on a user's computer, an IoT device, or an edge server. Bringing computation to the network's edge minimizes the amount of long-distance communication that has to happen between a client and a server.

It is important to understand that the edge of the network is geographically close to the device, unlike origin servers and cloud servers, which can be very far from the devices they communicate with. Cloud computing offers a significant amount of resources (e.g., processing, memory, and storage resources) for the computation requirements of mobile applications. However, gathering all the computation resources in a distant cloud environment started to cause issues for applications that are latency sensitive and bandwidth hungry.

Akamai, CloudFront, CloudFlare, and many other Edge Computing providers offer edge services such as WAF, edge applications, serverless computing, DDoS protection, edge firewall, etc.

Fog Computing:

Both Fog and Edge Computing are concerned with performing computation locally rather than pushing it to the cloud. The overall reason for having Fog computing is to reduce delay and the bandwidth requirement on the network. Most Fog Computing use cases came from IoT deployments: industrial automation, intelligent transportation, smart grid, etc. Edge Computing is heavily discussed together with 5G; for real-time applications, having computing resources closer to the source provides faster processing.

The main difference between Fog Computing and Edge Computing is where the data processing takes place: in Edge Computing, data is typically processed on the device itself or on a gateway directly attached to it, whereas in Fog Computing, processing happens on fog nodes at the LAN level, between the edge devices and the cloud.

Figure - Edge vs. Fog Computing
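As a rough illustration of why processing close to the source reduces bandwidth demand on the network, the toy Python model below compares streaming raw IoT sensor readings to a distant cloud against forwarding only summaries aggregated by a fog node. All the numbers are invented purely for the example:

    # Toy model: bandwidth needed with and without local aggregation.
    # All figures below are invented purely for illustration.
    SENSORS = 10_000          # sensors in a plant
    SAMPLE_BYTES = 200        # bytes per sensor reading
    SAMPLES_PER_SEC = 10      # readings per sensor per second
    SUMMARY_RATIO = 0.02      # fog node forwards only 2% (summaries)

    raw_bps = SENSORS * SAMPLE_BYTES * SAMPLES_PER_SEC * 8
    fog_bps = raw_bps * SUMMARY_RATIO

    print(f"Raw stream to the cloud : {raw_bps / 1e6:.1f} Mbps")
    print(f"After fog aggregation   : {fog_bps / 1e6:.1f} Mbps")

With these made-up figures, local aggregation turns a 160 Mbps raw stream into roughly 3 Mbps of WAN traffic, which is the kind of saving that motivates fog deployments in industrial IoT.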

Published - Tue, 03 Mar 2020

Created by - Orhan Ergun

Edge Computing Providers

Edge computing is a networking philosophy focused on bringing computing as close to the source of data as possible, in order to reduce latency and bandwidth usage. In simpler terms, edge computing means running fewer processes in the cloud and moving those processes to local places, such as on a user's computer, an IoT device, or an edge server. This post was first published in the Service Provider Networks Design and Architecture book.

Bringing computation to the network's edge minimizes the amount of long-distance communication that has to happen between a client and a server. For Internet devices, the network edge is where the device, or the local network containing the device, communicates with the Internet. The edge is not a sharply defined term; for example, a user's computer or the processor inside an IoT camera can be considered the network edge, while the user's router, ISP, or local edge servers are also considered the edge. It is important to understand that the edge of the network is geographically close to the device, unlike origin servers and cloud servers, which can be very far from the devices they communicate with.

Cloud computing offers a significant amount of resources (e.g., processing, memory, and storage resources) for the computation requirements of mobile applications. However, gathering all the computation resources in a distant cloud environment started to cause issues for applications that are latency sensitive and bandwidth hungry. The underlying reason is that network traffic has to travel through several routers managed by Internet Service Providers (ISPs) operating at varying tiers. All these routers significantly increase the Round-Trip Time (RTT) that latency-sensitive applications face. In addition, end-to-end routing path delays can change very dynamically due to ISPs and network conditions.

Akamai, CloudFront, CloudFlare, and many other Edge Computing providers offer edge services such as WAF, edge applications, serverless computing, DDoS protection, edge firewall, etc.

Figure - Applications of Cloud and Edge Computing

In the above figure, common use cases of Cloud and Edge Computing services are shown. Many emerging technologies will require Edge computing.
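To observe the RTT difference yourself, a quick Python sketch like the one below times TCP connection setup (a rough proxy for network RTT) to two endpoints. The hostnames are placeholders, not real services: substitute a nearby edge PoP and a distant origin server of your own to compare:

    import socket
    import time

    def tcp_connect_rtt_ms(host, port=443, attempts=3):
        # Average TCP handshake time, a rough proxy for network RTT.
        total = 0.0
        for _ in range(attempts):
            start = time.monotonic()
            with socket.create_connection((host, port), timeout=5):
                total += time.monotonic() - start
        return total / attempts * 1000

    # Placeholder names - replace with a real edge endpoint and a
    # real distant origin to compare.
    for host in ("edge.example.com", "origin-far-away.example.com"):
        try:
            print(f"{host}: {tcp_connect_rtt_ms(host):.1f} ms")
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")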

Published - Mon, 20 Jan 2020

Created by - Orhan Ergun

High Availability inside the Datacenter

In Leaf/Spine, VXLAN-based data centers, everyone wants to provide Active/Active high availability, and there are different ways to do it. There are two types of HA in data centers: Layer 3 and Layer 2. For Layer 3 HA, there is always more than one spine, which provides ECMP and HA at the same time. Layer 2 redundancy for the hosts and L4-L7 services connected to the leafs, however, is not such an easy choice.

Cisco introduced vPC nearly 10 years ago, and for a long time it was the first (and often only) choice of network engineers. Other vendors have their own technologies; for example, Arista provides Multi-chassis Link Aggregation (MLAG) for L2 HA on leafs. But there have always been problems in implementing these features. One example in vPC is the "peer-link", an important component of the vPC feature. It can be a tough one in cases like dynamic Layer 3 routing over vPC, or orphan members that may cause local traffic switching between vPC peers without using the fabric links.

To address the "peer-link" issue, there is a "fabric peering" solution that uses the fabric links instead of the "peer-link", converting it to a "virtual peer-link". With this solution, there is no concern about local switching in those specific cases. This solution works better, but it cannot solve dynamic Layer 3 routing over vPC or other issues (PIP, VIP, virtual rMAC) without enhancements. There is another HA solution, described below.

EVPN Multihoming

With the introduction of EVPN, EVPN Multihoming is a solution that brings HA to Layer 2 links without any "peer-link"-like dependency. EVPN carries a 10-byte Ethernet Segment Identifier (ESI) in the Ethernet Auto-Discovery (EAD) route, route type 1. The ESI is configured under the bundled link that is shared with the multihoming neighbor, and it can be manually configured or auto-derived. To prevent loops caused by packet duplication in multihoming scenarios, the Ethernet Segment route (route type 4) is used mainly for Designated Forwarder (DF) election; the DF is the forwarder elected to handle BUM (broadcast, unknown unicast, multicast) traffic (a sketch of the default election appears after the comparison table below). The split-horizon feature ensures that only remote BUM traffic is allowed to be forwarded to the local site, while BUM traffic from the same ESI group is dropped.

With EVPN multihoming, traffic is balanced between both leafs because they advertise a shared system MAC with the same ESI; this feature is called "Aliasing". In failure scenarios, fast convergence with "Mass Withdrawal" removes the failed leaf from the ECMP path list. LACP can also be turned on to prevent ESI misconfigurations.

Figure - Designated Forwarder, MAC Aliasing, and MAC Mass Withdrawal

Regardless of which HA solution is better, there are some differences between them; some are listed below for Cisco technologies. Keep in mind that this is a data center comparison, because vPC is not supported on Cisco routers while EVPN is.

                          vPC/vPC2                     EVPN Multihoming
    Hardware              All Nexus platforms          Nexus 9300 only (until now)
    FEX supported         Yes                          No
    Same OS version       Yes                          Not mentioned
    Multiple components   Yes                          No
    QoS needed            For vPC2 (fabric peering)    No
    ISSU                  Yes                          No
    Maximum peers         2                            2+
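As promised above, here is a minimal Python sketch of the default DF election procedure ("service carving") from RFC 7432: the PEs attached to an Ethernet segment are sorted by originator IP address, and the DF for VLAN V is the PE at position V mod N. This is a simplified illustration with made-up leaf addresses, not vendor code:

    import ipaddress

    def elect_df(pe_ips, vlan_id):
        # RFC 7432 default DF election: order the PEs on the Ethernet
        # segment by IP address, then pick index (vlan_id mod N).
        ordered = sorted(pe_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
        return ordered[vlan_id % len(ordered)]

    # Two hypothetical leafs sharing one ESI toward a multihomed host.
    leafs = ["10.0.0.2", "10.0.0.1"]

    for vlan in (100, 101, 102, 103):
        print(f"VLAN {vlan}: DF is {elect_df(leafs, vlan)}")

Note how DF duty alternates between the two leafs VLAN by VLAN, which is what spreads BUM-forwarding responsibility across the segment.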
Load Balancing

vPC uses the regular port-channel load-balancing methods. EVPN, on the other hand, provides more modern load balancing. There are three EVPN load-balancing modes available: 1- Single Active, 2- All Active, 3- Port Active (all three are summarized in the sketch at the end of this post).

In Single Active mode, assume two leafs are connected to one host, but only one of them is considered active for a given service. The related MAC address is then reachable via only one of the leafs, which also makes this a per-VLAN (per-service) load-balancing method. This method is suitable for billing certain services or policing specific traffic.

In All Active mode, each leaf can forward traffic, and the load-balancing method is per-flow. This is the regular load-balancing mechanism that shares traffic between the leafs, as in the figure below. This method is better for providing more bandwidth to the end hosts.

Port Active is a mechanism that brings an Active/Standby scenario to EVPN multihoming, where only one leaf forwards traffic. In failure scenarios, fast convergence switches traffic to the standby leaf. This method is the choice when you want to force traffic onto a specific link that is cheaper, or when you want to use only one link.

It is important to note that not every EVPN feature in this post is implemented on all platforms and by all vendors. To recapitulate, both solutions have pros and cons; depending on the data center design and requirements, you can choose one of them. Keep in mind that you cannot enable both features on a switch at the same time. Also, LACP is an additional tool that improves the functionality of these features and helps avoid misconfigurations.

Note: all figures are taken from Cisco Live presentations.
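The three load-balancing modes can be reduced to a few lines of Python (purely illustrative; the leaf names and hash choice are arbitrary): Single Active pins each VLAN to one leaf, All Active hashes each flow across both leafs, and Port Active keeps one leaf forwarding until it fails:

    import zlib

    def single_active(vlan_id, leafs):
        # Per-VLAN (per-service): one leaf carries all traffic for a VLAN.
        return leafs[vlan_id % len(leafs)]

    def all_active(flow, leafs):
        # Per-flow: every leaf forwards a share of the flows.
        key = "|".join(map(str, flow)).encode()
        return leafs[zlib.crc32(key) % len(leafs)]

    def port_active(leafs, failed=frozenset()):
        # Active/standby: the first healthy leaf forwards everything.
        for leaf in leafs:
            if leaf not in failed:
                return leaf
        raise RuntimeError("no healthy leaf available")

    leafs = ["leaf1", "leaf2"]
    print(single_active(100, leafs))                      # VLAN pinned to one leaf
    print(all_active(("10.0.1.10", "10.0.2.20"), leafs))  # flow hashed to a leaf
    print(port_active(leafs, failed={"leaf1"}))           # failover to leaf2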

Published - Tue, 17 Dec 2019