VXLAN EVPN Explained: How Modern Data Center Fabrics Actually Work

By Travis · May 12, 2025 · 9 min read

Why VXLAN EVPN Exists

Traditional data center networks ran on a three-tier hierarchical model of access, aggregation, and core layers, with Spanning Tree Protocol (STP) preventing loops in the Layer 2 domain. STP worked, but it came with significant tradeoffs: blocked redundant links, slow convergence, and hard limits on scale. The 12-bit VLAN ID imposed a ceiling of 4,094 usable network segments, a number that large multi-tenant environments exhaust quickly.

As data centers scaled, workloads became virtualized, and applications needed to move freely across physical infrastructure, these limitations became operational blockers. The industry needed an overlay technology that could extend Layer 2 domains across Layer 3 infrastructure, scale to millions of segments, and operate across geographically distributed sites.

VXLAN EVPN is the answer that emerged and became the industry standard — used by Cisco, Arista, Juniper, Nokia, and virtually every major data center networking vendor.

Two Distinct Layers: Underlay and Overlay

Understanding VXLAN EVPN starts with grasping the fundamental separation between two independent networks operating simultaneously:

The Underlay

The underlay is the physical IP network connecting all the switches. Its only job is to provide IP reachability between VTEP loopback addresses — the logical endpoints that terminate VXLAN tunnels. The underlay doesn’t know anything about tenant traffic, VLANs, or virtual machines. It just routes IP packets.

The underlay typically runs one of three routing protocols:

IS-IS — Cisco’s recommended underlay protocol. Link-state, fast convergence, no route redistribution complexity.
OSPF — Common alternative, also link-state, well understood by most teams.
eBGP — Used in some hyperscaler-style designs (see RFC 7938), though Cisco’s best practice separates underlay (IGP) and overlay (BGP) functions to avoid overloading a single protocol.

The underlay only needs to advertise loopback addresses — it doesn’t carry tenant routes, MAC addresses, or any application-layer information. This simplicity is intentional.

The Overlay

The overlay is the virtual network that tenant traffic actually traverses. It’s built on top of the underlay using VXLAN encapsulation — wrapping Ethernet frames inside UDP packets and routing them across the underlay IP network.

The overlay carries:

Layer 2 tenant traffic (MAC-to-MAC forwarding within the same subnet)
Layer 3 tenant traffic (IP routing between subnets within the same VRF)
Multi-tenancy (multiple isolated VRFs sharing the same physical infrastructure)

VXLAN: The Data Plane

VXLAN (Virtual Extensible LAN) is defined in RFC 7348. It solves the VLAN scalability problem by identifying segments with a 24-bit VXLAN Network Identifier (VNI) instead of the 12-bit VLAN ID, enabling about 16.7 million (2^24) unique network segments.

How Encapsulation Works

When a host sends a frame, the leaf switch it’s connected to (acting as a VTEP) examines the destination MAC address. If the destination is on a remote VTEP, the leaf encapsulates the original Ethernet frame by adding:

[ Outer Ethernet Header ]
[ Outer IP Header       ]  ← Source = local VTEP loopback
[ UDP Header            ]  ← Destination port 4789
[ VXLAN Header          ]  ← Contains the 24-bit VNI
[ Original Ethernet Frame ]
[ Original Payload      ]

The encapsulated packet travels across the underlay as a normal IP packet. When it reaches the destination VTEP (another leaf switch), that VTEP strips the VXLAN headers and delivers the original Ethernet frame to the destination host.

From the host’s perspective, it’s communicating on a normal Ethernet segment. The encapsulation and decapsulation are completely transparent.
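To make the header layout concrete, here is a minimal Python sketch of the 8-byte VXLAN header from RFC 7348: one flags byte with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are illustrative only and do not come from any library; a real VTEP does this in hardware.

```python
import struct

VXLAN_UDP_PORT = 4789          # destination UDP port used by VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # the "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for the given VNI (hypothetical helper)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, the remaining 24 bits are reserved (zero).
    # Word 2: VNI in the top 24 bits, the low byte is reserved (zero).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Recover the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8


header = build_vxlan_header(10010)
print(header.hex())       # 0800000000271a00
print(parse_vni(header))  # 10010
```

The 24-bit range check is the same limit that gives VXLAN its roughly 16.7 million segments.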

The MTU Implication

VXLAN encapsulation adds 50 bytes of overhead (outer Ethernet 14B + outer IP 20B + UDP 8B + VXLAN 8B). If hosts use the standard 1500-byte MTU, the underlay must carry packets that are 50 bytes larger, so 1550 bytes. A common practice is to set the MTU to at least 1600 bytes on all underlay-facing interfaces, which leaves some headroom; 9216 bytes is recommended when the fabric must carry jumbo frames.
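The arithmetic can be checked with a few lines of Python. This is a sketch using the byte counts quoted above (untagged outer Ethernet, IPv4 without options); the helper names are made up for illustration.

```python
OUTER_ETHERNET = 14  # bytes, untagged outer Ethernet header
OUTER_IPV4 = 20      # bytes, IPv4 header without options
OUTER_UDP = 8        # bytes
VXLAN_HEADER = 8     # bytes

VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def required_underlay_mtu(host_mtu: int) -> int:
    """Smallest underlay MTU that carries a host packet without fragmentation."""
    return host_mtu + VXLAN_OVERHEAD


def fits(host_mtu: int, underlay_mtu: int) -> bool:
    """True if encapsulated traffic from hosts fits in the underlay MTU."""
    return required_underlay_mtu(host_mtu) <= underlay_mtu


print(VXLAN_OVERHEAD)               # 50
print(required_underlay_mtu(1500))  # 1550
print(fits(1500, 1500))             # False
print(fits(9000, 9216))             # True
```

The third line is the classic failure mode: an underlay left at 1500 bytes silently drops or fragments full-size tenant packets.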

VTEPs: The VXLAN Endpoints

VTEP (VXLAN Tunnel Endpoint) is the term for any device that performs VXLAN encapsulation and decapsulation. In a data center fabric, leaf switches are the primary VTEPs — they sit at the edge of the fabric where hosts connect.

Key VTEP functions:

Encapsulate outbound tenant traffic in VXLAN headers
Decapsulate inbound VXLAN packets and deliver to local hosts
Maintain a mapping of remote MAC/IP addresses to remote VTEP loopback addresses
Participate in the BGP EVPN control plane to learn and advertise reachability information

On Cisco NX-OS, the VTEP is implemented as an NVE (Network Virtualization Edge) interface:

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    ingress-replication protocol bgp
  member vni 50000 associate-vrf

The source-interface loopback0 is the VTEP’s IP address — the address other VTEPs will use as the destination when encapsulating VXLAN packets destined for this switch.

The Problem with Flood and Learn

Early VXLAN deployments (RFC 7348) used a flood-and-learn model for VTEP discovery and MAC address learning. When a VTEP didn’t know where a destination MAC lived, it flooded the packet to all other VTEPs in the fabric. Remote VTEPs learned source MAC-to-VTEP mappings from the incoming flooded traffic.

This created two significant problems:

Multicast dependency. Flooding required multicast in the underlay to replicate BUM (Broadcast, Unknown unicast, Multicast) traffic to all VTEPs. Not all organizations want to run multicast in their data centers.

Scale limitations. In large fabrics with thousands of hosts, the flood-and-learn behavior generated substantial BUM traffic that consumed bandwidth and CPU resources on every VTEP in the fabric.

BGP EVPN solves both problems.

BGP EVPN: The Control Plane

MP-BGP EVPN (Multiprotocol BGP Ethernet VPN) is the standards-based control plane for VXLAN, defined in RFC 7432 and RFC 8365. Instead of learning MAC and IP addresses through data-plane flooding, VTEPs advertise them proactively through BGP.

When a host connects to a leaf switch:

The leaf learns the host’s MAC and IP address
The leaf advertises this information as a BGP EVPN route to its BGP peers (typically spine route reflectors)
Spine route reflectors distribute the advertisement to all other leaf VTEPs
Remote VTEPs install the mapping: host MAC/IP → remote VTEP loopback address

When traffic needs to reach that host, the sending VTEP already knows exactly which VTEP to send it to — no flooding required.
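The advertise-and-reflect flow above can be modeled as a toy Python simulation. It is a deliberately simplified sketch, not how BGP is implemented: the class names are invented, there are no sessions, attributes, or best-path selection, and the "route" is just the MAC/IP binding plus the advertising VTEP's loopback. The host values are the ones used in the packet walk later in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified stand-in for an EVPN MAC/IP advertisement."""
    mac: str
    ip: str
    l2_vni: int
    next_hop: str  # loopback of the advertising VTEP


@dataclass
class RouteReflector:
    clients: list = field(default_factory=list)

    def advertise(self, sender, route):
        # Reflect the route to every client except the one that sent it.
        for client in self.clients:
            if client is not sender:
                client.install(route)


@dataclass
class Leaf:
    name: str
    loopback: str
    remote_hosts: dict = field(default_factory=dict)  # (vni, mac) -> route

    def learn_local_host(self, mac, ip, l2_vni, reflector):
        # A locally learned host is advertised with this leaf as next hop.
        reflector.advertise(self, MacIpRoute(mac, ip, l2_vni, self.loopback))

    def install(self, route):
        self.remote_hosts[(route.l2_vni, route.mac)] = route


leaf1 = Leaf("Leaf 1", "10.0.0.1")
leaf3 = Leaf("Leaf 3", "10.0.0.3")
spine = RouteReflector([leaf1, leaf3])

leaf3.learn_local_host("aabb.cc00.0200", "10.1.2.10", 10020, spine)
print(leaf1.remote_hosts[(10020, "aabb.cc00.0200")].next_hop)  # 10.0.0.3
```

Leaf 1 ends up knowing which VTEP owns Host B before any traffic is sent, which is the whole point of moving learning into the control plane.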

BGP Peering Structure

The standard Cisco design uses iBGP with route reflectors:

Spines = BGP Route Reflectors (RR)
Leaves = BGP Route Reflector Clients

Each leaf peers with both spines via iBGP in the l2vpn evpn address family. Spines reflect EVPN routes between leaves. Leaves never peer directly with each other (no full mesh required).

! Leaf BGP configuration
router bgp 65000
  router-id 10.0.0.1
  address-family l2vpn evpn
  template peer SPINE
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      soft-reconfiguration inbound
  ! Spine 1
  neighbor 10.0.0.101
    inherit peer SPINE
  ! Spine 2
  neighbor 10.0.0.102
    inherit peer SPINE

ARP Suppression: Eliminating Broadcast at Scale

One of the most impactful features BGP EVPN enables is ARP suppression. When BGP EVPN distributes MAC/IP bindings for every host, VTEPs can answer ARP requests locally rather than flooding them across the fabric.

When Host A sends an ARP request for Host B’s MAC address:

Without ARP suppression: the ARP floods across the fabric to every VTEP
With ARP suppression: the local VTEP checks its BGP EVPN table, already has Host B’s MAC (learned from BGP), and responds locally

This dramatically reduces BUM traffic in the fabric — particularly important at scale where thousands of hosts are constantly sending ARP requests.

! Enable ARP suppression on a VNI
interface nve1
  member vni 10010
    suppress-arp
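The reply-locally-or-flood decision can be sketched in Python as below. This only models the logic: the cache is a plain dictionary standing in for the MAC/IP bindings learned through BGP EVPN, and the function name and the example host 10.1.1.20 are hypothetical.

```python
def handle_arp_request(target_ip: str, suppression_cache: dict) -> tuple:
    """Decide what a VTEP does with an ARP request from a local host.

    suppression_cache maps IP address -> MAC address for hosts whose
    bindings were learned through the control plane.
    """
    mac = suppression_cache.get(target_ip)
    if mac is not None:
        # Binding is known: answer on behalf of the remote host.
        return ("reply-locally", mac)
    # Binding is unknown: fall back to flooding the request.
    return ("flood", None)


# Hypothetical cache entry for a remote host in the same subnet as Host A.
cache = {"10.1.1.20": "aabb.cc00.0300"}

print(handle_arp_request("10.1.1.20", cache))  # ('reply-locally', 'aabb.cc00.0300')
print(handle_arp_request("10.1.1.99", cache))  # ('flood', None)
```

The second call shows why suppression reduces flooding rather than eliminating it: a request for a host the fabric has not learned yet still has to be flooded.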

Symmetric IRB: Layer 3 Routing Across the Fabric

IRB (Integrated Routing and Bridging) is how VXLAN EVPN handles Layer 3 routing between subnets. Cisco NX-OS implements symmetric IRB as its standard model.

In symmetric IRB, routing occurs at both the ingress and egress VTEPs:

Host A (10.1.1.10 in VNI 10010) sends traffic to Host B (10.1.2.10 in VNI 10020)
Ingress VTEP (Host A’s leaf) routes the packet from VNI 10010 into the tenant VRF and encapsulates it with the VRF’s L3 VNI
The packet traverses the fabric encapsulated with the L3 VNI (not the L2 VNI)
Egress VTEP (Host B’s leaf) routes the packet from the L3 VNI into VNI 10020 and delivers to Host B

The key is the L3 VNI — a dedicated VNI assigned per VRF for inter-subnet routing. Every leaf that participates in a given VRF uses the same L3 VNI, enabling routed traffic to traverse the fabric without requiring centralized routing.

vrf context TENANT-A
  vni 50000           ! L3 VNI for this VRF
  rd auto
  address-family ipv4 unicast
    route-target import 65000:50000 evpn
    route-target export 65000:50000 evpn
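One way to internalize symmetric IRB is to ask which VNI appears in the VXLAN header between the two leaves. The Python sketch below encodes that rule using the VNIs from this post. The subnet-to-VNI table and the /24 prefix for 10.1.2.0 are assumptions made for the example, and the function names are illustrative.

```python
import ipaddress

# Assumed mapping of tenant subnets to L2 VNIs (prefix lengths are assumptions).
L2_VNI_BY_SUBNET = {
    ipaddress.ip_network("10.1.1.0/24"): 10010,
    ipaddress.ip_network("10.1.2.0/24"): 10020,
}
L3_VNI = 50000  # L3 VNI of VRF TENANT-A


def l2_vni_for(ip: str) -> int:
    """Return the L2 VNI of the subnet that contains this address."""
    addr = ipaddress.ip_address(ip)
    for subnet, vni in L2_VNI_BY_SUBNET.items():
        if addr in subnet:
            return vni
    raise LookupError(f"no subnet known for {ip}")


def fabric_vni(src_ip: str, dst_ip: str) -> int:
    """VNI carried across the fabric under symmetric IRB."""
    src_vni = l2_vni_for(src_ip)
    dst_vni = l2_vni_for(dst_ip)
    # Same subnet: bridged, so the L2 VNI is used.
    # Different subnets: routed at ingress, so the VRF's L3 VNI is used.
    return src_vni if src_vni == dst_vni else L3_VNI


print(fabric_vni("10.1.1.10", "10.1.1.20"))  # 10010
print(fabric_vni("10.1.1.10", "10.1.2.10"))  # 50000
```

Because routed traffic always crosses the fabric in the L3 VNI, a leaf only needs the L2 VNIs of subnets that have hosts attached to it.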

Distributed Anycast Gateway: Default Gateway at Every Leaf

In a traditional network, hosts point to a default gateway IP address. If that gateway is centralized (a core router or firewall), all inter-subnet traffic must traverse the core — a significant bottleneck.

Distributed Anycast Gateway solves this. Every leaf switch in the fabric is configured with the same MAC address and the same IP address for the default gateway of each subnet. Hosts in the same subnet always send default gateway traffic to their local leaf — no matter which leaf they’re connected to, they hit the same virtual MAC/IP.

fabric forwarding anycast-gateway-mac 0000.2222.3333

interface Vlan10
  no shutdown
  vrf member TENANT-A
  ip address 10.1.1.1/24
  fabric forwarding mode anycast-gateway

This eliminates the need for HSRP or VRRP, removes centralized routing bottlenecks, and enables workload mobility — a host can move from one leaf to another and its default gateway IP/MAC stays the same.

Multi-Tenancy with VRFs

VXLAN EVPN supports full multi-tenancy through VRFs (Virtual Routing and Forwarding instances). Each tenant gets an isolated Layer 3 routing domain:

Each VRF maps to a unique L3 VNI
Each VRF has its own route-target import/export policies that control which routes are shared between VTEPs
Traffic between VRFs (inter-tenant) does not flow inside the fabric by default; it typically exits through a designated external gateway, so isolation is enforced at the fabric level

This is how a single physical fabric can simultaneously host multiple customers, business units, or security zones with complete routing isolation between them.
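The role of route targets in this isolation can be illustrated with a short Python sketch: a VRF accepts a route only if the route carries at least one route target that the VRF imports. The route dictionaries, the TENANT-B route, and its 65000:50001 value are hypothetical and exist only for the example.

```python
def imported_routes(vrf_import_targets: set, routes: list) -> list:
    """Keep only routes that share a route target with the VRF's import list."""
    return [r for r in routes if vrf_import_targets & r["route_targets"]]


received = [
    # Route exported by TENANT-A with the route target shown earlier.
    {"prefix": "10.1.2.10/32", "route_targets": {"65000:50000"}},
    # Hypothetical route from another tenant with a different route target.
    {"prefix": "10.9.9.10/32", "route_targets": {"65000:50001"}},
]

tenant_a = imported_routes({"65000:50000"}, received)
print([r["prefix"] for r in tenant_a])  # ['10.1.2.10/32']
```

Every leaf receives both routes from the route reflectors, but only the matching one reaches TENANT-A's routing table.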

The Leaf-Spine Topology

VXLAN BGP EVPN is almost always deployed in a leaf-spine topology rather than a traditional three-tier hierarchy:

        [Spine 1]  [Spine 2]
       /    |    \/    |    \
      /     |    /\    |     \
 [Leaf1] [Leaf2] [Leaf3] [Leaf4]
    |        |       |       |
 Servers  Servers Servers Servers

Why leaf-spine?

Every leaf-to-leaf path is exactly two hops (leaf → spine → leaf), providing equal-cost, predictable latency
Spines have no host connections — only uplinks to leaves and route-reflector functions
Adding capacity means adding leaves (more host ports) or spines (more fabric bandwidth), without redesigning the existing topology
ECMP across multiple spines provides load balancing and redundancy without STP
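ECMP works well with VXLAN because RFC 7348 recommends deriving the outer UDP source port from a hash of the inner packet, in the range 49152 to 65535, so different tenant flows between the same pair of VTEPs can take different spines. The Python sketch below shows the idea only. The CRC32 hash and the modulo-based spine choice are stand-ins chosen for this example; real switches use their own hardware hash over the outer headers.

```python
import zlib

EPHEMERAL_BASE = 49152  # start of the dynamic/private port range
EPHEMERAL_SIZE = 16384  # 49152 through 65535


def vxlan_source_port(inner_flow: tuple) -> int:
    """Derive an outer UDP source port from the inner flow (illustrative hash)."""
    digest = zlib.crc32(repr(inner_flow).encode("utf-8"))
    return EPHEMERAL_BASE + digest % EPHEMERAL_SIZE


def pick_spine(outer_source_port: int, spines: list) -> str:
    """Toy ECMP decision keyed only on the outer source port."""
    return spines[outer_source_port % len(spines)]


# Inner flow: source IP, destination IP, protocol, source port, destination port.
flow = ("10.1.1.10", "10.1.2.10", 6, 49500, 443)
port = vxlan_source_port(flow)
print(port, pick_spine(port, ["Spine 1", "Spine 2"]))
```

Packets of one flow always hash to the same port and therefore the same spine, which keeps them in order while spreading distinct flows across the fabric.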

Putting It All Together: A Packet Walk

Here’s what happens when Host A (on Leaf 1) sends traffic to Host B (on Leaf 3):

BGP EVPN pre-work: Leaf 3 has advertised Host B’s MAC (aabb.cc00.0200) and IP (10.1.2.10) via BGP. Leaf 1 has received and installed this in its EVPN table: Host B → VTEP 10.0.0.3 (Leaf 3’s loopback).
Host A sends a packet for 10.1.2.10 to its default gateway, the anycast gateway on Leaf 1.
Leaf 1 (ingress VTEP) looks up 10.1.2.10 in its EVPN table. It knows this is in a different subnet (different VNI), so it routes it. It encapsulates the packet with a VXLAN header containing the L3 VNI for the tenant VRF, sets the outer IP destination to Leaf 3’s loopback (10.0.0.3), and sends it into the underlay.
Spine switches route the outer IP packet normally — they see it as a UDP packet to 10.0.0.3 and forward it via ECMP.
Leaf 3 (egress VTEP) receives the VXLAN packet, strips the encapsulation, identifies the L3 VNI, maps it to the tenant VRF, looks up 10.1.2.10 in the local VRF routing table, and forwards the original packet to Host B.

From Host A’s and Host B’s perspective, this is ordinary routed traffic between two subnets through their default gateways. The VXLAN fabric is completely transparent.

Why This Matters for Nexus Dashboard

NDFC (Nexus Dashboard Fabric Controller) automates the provisioning of every component described here — underlay IGP, iBGP EVPN overlay, VNI assignments, route-target configuration, anycast gateway MAC, ARP suppression, and more — through its Easy Fabric workflow. Understanding what NDFC generates underneath is essential for troubleshooting, scaling, and customizing deployments beyond what the automation templates provide.

The fabric NDFC builds is the same VXLAN BGP EVPN fabric described in this post — just deployed consistently and at speed, without manual configuration errors.