Hello everyone!
Let’s continue our discussion of the site resiliency model offered by NSX, this time through Federation rather than Multisite. What benefits and improvements does it bring? That is what we are going to discuss in this post.
NSX Federation (a Quick Brief)
Unlike the NSX Multisite architecture, NSX Federation does not require the MTU over the WAN, or on the provider side, to be raised from the typical (default) value to 1700+. That is a big change in terms of infrastructure configuration and requirements.
NSX Managers can sit in different geographical locations despite the 10 ms RTT limit, because global objects are pushed down to the Local NSX Managers by the Async Replicator Service through the Application Proxy Hub (APH) offered by the Global NSX Managers. It replicates only from cluster to cluster across sites, not among the nodes of a single cluster.
The distance between Global Manager nodes within the same NSX Manager cluster must not exceed 10 ms RTT, whereas the Active and Standby instances can be up to 500 ms RTT apart. The figures below show each scenario in turn. The first scenario is the stretched NSX Global Manager architecture (Active Global Manager cluster only).
Fig 1.1
The figure above shows the distance, in milliseconds of RTT, between the NSX Manager instances.
The figure below shows the cluster-to-cluster “Async Replicator” activity that keeps the sites synchronised and supports the stretched architecture used in Federation.
Fig 1.2
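To make the latency rules above a bit more concrete, here is a minimal sketch that checks a measured RTT against the two limits quoted in this post (10 ms inside a Global Manager cluster, 500 ms between Active and Standby). The function name and the link-type labels are hypothetical, and the limits are simply the figures from this article, so verify them against the documentation for your NSX release.

```python
# RTT limits in milliseconds, as quoted in this post (assumption: check
# them against the documentation for your own NSX release).
GM_INTRA_CLUSTER_MAX_RTT_MS = 10
ACTIVE_STANDBY_MAX_RTT_MS = 500


def rtt_within_limit(link_type: str, rtt_ms: float) -> bool:
    """Return True if a measured RTT fits the limit for the link type.

    link_type is a hypothetical label used only in this sketch:
      "intra_cluster"  - nodes of the same Global Manager cluster
      "active_standby" - Active instance to Standby instance
    """
    limits = {
        "intra_cluster": GM_INTRA_CLUSTER_MAX_RTT_MS,
        "active_standby": ACTIVE_STANDBY_MAX_RTT_MS,
    }
    if link_type not in limits:
        raise ValueError(f"unknown link type: {link_type}")
    return rtt_ms <= limits[link_type]
```

For example, rtt_within_limit("intra_cluster", 12) fails the check, while the same 12 ms is well within the limit for an "active_standby" link.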
Another major benefit of Federation is that you don’t need to configure a bigger MTU across sites (as required for VXLAN configs); it stays at the default. Within a site (local, not stretched), however, you still need to configure the larger MTU of 9000+.
The same holds between Edges across sites, which see each other as RTEPs. These can even chunk traffic down further to fit an MTU lower than 1500. The picture below gives a high-level overview of a Federation.
Fig 1.3
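The MTU difference between the two architectures can be sketched as a simple check. This is only an illustration under the numbers used in this post (1700+ over the WAN for Multisite, the default 1500 for Federation); the function name and architecture labels are hypothetical, and the case of RTEPs working below 1500 is deliberately not modelled.

```python
# MTU figures taken from this post, not from an official reference.
DEFAULT_WAN_MTU = 1500        # typical default MTU over the WAN
MULTISITE_WAN_MIN_MTU = 1700  # Multisite needs 1700+ over the WAN


def wan_mtu_ok(architecture: str, wan_mtu: int) -> bool:
    """Check a WAN MTU against the requirement described in this post.

    architecture is a hypothetical label: "multisite" or "federation".
    """
    if architecture == "multisite":
        return wan_mtu >= MULTISITE_WAN_MIN_MTU
    if architecture == "federation":
        # The default WAN MTU is enough; smaller MTUs handled by the
        # RTEPs are not modelled in this sketch.
        return wan_mtu >= DEFAULT_WAN_MTU
    raise ValueError(f"unknown architecture: {architecture}")
```

With an unchanged WAN at 1500, wan_mtu_ok("federation", 1500) passes while wan_mtu_ok("multisite", 1500) does not, which is exactly the infrastructure change Federation lets you avoid.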
There are many different options and scenarios to explore when designing a solution with NSX Federation. Below are some that I am going to explain in a bit of detail.
Federation with Stretch Active Global NSX Manager Cluster
This scenario is useful and feasible only where the sites / regions are not far apart and fall within 1 ms to 10 ms RTT of each other (or up to 150 ms, only in the case of NSX 4).
In such a scenario, you can build a Global Manager cluster (Active only) with one GM instance in each site, making it Active per site with GSLB integration, and place the LMs (Local NSX Managers) in the same topological model, as shown in Fig 1.1.
It doesn’t require an additional vCenter Server per site; a single vCenter Server is sufficient in this scenario [Ref: NSX Design Guide 4.1 v.1.3 – Multisite page 27]. It is best suited to metropolitan-network scenarios or intra-city branch / data-center availability zones. There is not even a need for vCenter Server ELM.
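As a rough planning aid, the feasibility condition for this scenario can be sketched as a check over the measured RTT between every pair of sites. The limits are the ones stated in this post (10 ms, or 150 ms in the case of NSX 4); the function names, the site names, and the shape of the input dictionary are assumptions made for illustration only.

```python
def stretch_rtt_limit_ms(nsx_major_version: int) -> int:
    """RTT limit for a stretched Active GM cluster, per this post."""
    # 150 ms is quoted for NSX 4 only; everything else falls back to 10 ms.
    return 150 if nsx_major_version == 4 else 10


def stretched_gm_feasible(site_rtts_ms: dict, nsx_major_version: int) -> bool:
    """Return True if every site pair is within the RTT limit.

    site_rtts_ms maps a (site_a, site_b) tuple to the measured RTT in
    milliseconds; the site names are hypothetical.
    """
    limit = stretch_rtt_limit_ms(nsx_major_version)
    return all(rtt <= limit for rtt in site_rtts_ms.values())


# Example with three hypothetical sites in one metro area.
rtts = {
    ("site-a", "site-b"): 4.0,
    ("site-a", "site-c"): 9.5,
    ("site-b", "site-c"): 40.0,
}
print(stretched_gm_feasible(rtts, 3))  # one pair exceeds 10 ms
print(stretched_gm_feasible(rtts, 4))  # all pairs are within 150 ms
```

The check uses the worst pair on purpose: a single far-away site is enough to rule out the stretched Active cluster design.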