Welcome back to the deep dive. We are really getting into the weeds today. Oh yeah. Yeah, we're tackling VXLAN BGP EVPN fabrics. I mean, this is the stuff running modern data centers, the real backbone.
It definitely is. And you know, it's a huge shift from how things used to be done.
Totally. We're talking about moving away from those older, kind of clunky three-tier designs.
Right, exactly. This is the fast-track guide, basically, to understanding that jump to, well, high-speed, really scalable software-defined networks.
So what problem does VXLAN actually solve? Why do we need this shift?
Well, the old way just hit a wall. Traditional campus designs, heavy on layer two spanning tree, they just couldn't scale, not for the kind of density and speed modern applications demand.
Yeah, you think about virtualization, multi-gig traffic everywhere. The old models just choked, didn't they?
They really did. You need predictable performance, active paths everywhere, not just standby links, and that's what this fabric is built for.
Okay, so let's unpack this, starting right at the bottom, the physical setup: the spine-and-leaf architecture.
Right. The keyword here is symmetry. Every leaf switch connects to every spine switch.
Full mesh between layers, basically.
Exactly. And what that gives you is a super predictable traffic pattern. Any device connected to a leaf is always just two hops away from any other device: source leaf...
To a spine, to the destination leaf. Always.
Always. That predictability is golden for performance.
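The two-hop claim can be checked with a small sketch. This is only an illustration, assuming a hypothetical fabric of two spines and four leaves with made-up names; it builds the leaf-to-spine links described above and measures the shortest path between every pair of leaves.

```python
from collections import deque
from itertools import combinations


def build_fabric(num_spines, num_leaves):
    """Return an adjacency map where every leaf links to every spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    adj = {node: set() for node in spines + leaves}
    for leaf in leaves:
        for spine in spines:
            adj[leaf].add(spine)
            adj[spine].add(leaf)
    return adj, spines, leaves


def hop_count(adj, src, dst):
    """Breadth-first search: number of links on a shortest path."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable


# Hypothetical fabric size, chosen only for the example.
adj, spines, leaves = build_fabric(num_spines=2, num_leaves=4)
distances = {hop_count(adj, a, b) for a, b in combinations(leaves, 2)}
print(distances)  # {2}: every leaf pair is exactly two links apart
```

Growing the fabric by adding leaves or spines does not change the result, because no leaf ever links directly to another leaf.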
Makes sense. Yeah, so let's define those roles a bit more clearly. The spine layer: it connects the leaves, aggregates traffic.
It connects the leaves, yes, but it's doing much more than just layer two aggregation now. In these layer three fabrics, the spines are pure routers. They are the high-speed core, and crucially, they act as the BGP EVPN route reflectors.
Ah, reflecting the routes down to the leaves.
Precisely. And they often handle multicast too, acting as rendezvous points, or RPs, for the underlay network.
And the leaf layer. This is where things really change. The traditional core functions move down here.
That's the fundamental shift. The leaves are where your servers, your devices connect, but they're also making the main routing decisions now. They are the layer three cores, distributed across the whole fabric.
So instead of one big core, you have lots of smaller active cores.
Exactly, all working together.
Okay, quick question then. If the spines are central for communication, like those route reflectors, what happens if one fails? Doesn't that break things?
That's where the design shines. You always have at least two spines, since every leaf connects to all of them. If one spine goes down, traffic just instantly moves over the links to the other active spine or spines.
Okay, built-in redundancy.
Totally built in. And you get redundancy at the leaf level too, for servers connecting to multiple leaves using things like vPC. The physical resilience is just inherent.
Okay, physical structure makes sense. Now, this is where my brain starts to hurt a little: the underlay and the overlay.
Ha. Yeah, let's use that roller coaster analogy. It actually works pretty well.
Okay, hit me with it.
So the underlay, think of it as the physical roller coaster track, the motors, the brakes. It's the foundation.
The physical links between spines and leaves.
Exactly. Its only job is basic IP reachability, making sure all the key interfaces on the leaves and spines can talk to each other. It usually runs a simple routing protocol like OSPF or IS-IS.
Just to build that basic connectivity map.
Right. And because of that spine-and-leaf connection pattern, we get to use equal-cost multipath routing, ECMP. All those links between a leaf and the multiple spines, they're all active, all forwarding traffic at the same time.
So it's like layer three link aggregation, super fast.
Super fast, and super resilient. That's the underlay, the solid fast foundation.
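As a rough sketch of how ECMP keeps every uplink busy: a switch hashes each flow's header fields and uses the result to pick one of the equal-cost next hops, so one flow stays on one path while different flows spread out. The hash function (SHA-256), the addresses, and the spine names below are assumptions made for illustration; real switches use their own hardware hash.

```python
import hashlib


def pick_uplink(flow, uplinks):
    """Hash the flow's 5-tuple and map it onto one of the equal-cost uplinks."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(uplinks)
    return uplinks[index]


# Hypothetical uplinks from one leaf to two spines.
uplinks = ["spine1", "spine2"]

# 5-tuple: source IP, destination IP, IP protocol, source port, destination port.
flows = [("10.0.0.1", "10.0.0.2", 17, 49152 + i, 4789) for i in range(100)]

# The same flow always hashes to the same uplink, which keeps its packets in order.
first_choice = pick_uplink(flows[0], uplinks)
second_choice = pick_uplink(flows[0], uplinks)

# Across many flows, both spines end up carrying traffic.
used = {pick_uplink(flow, uplinks) for flow in flows}

# If one spine fails, every flow maps onto whatever is left.
after_failure = {pick_uplink(flow, ["spine2"]) for flow in flows}
```

The failure case is the same function called with a shorter list, which is roughly why losing a spine costs bandwidth but not reachability.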
Okay, foundation laid. Now the overlay: the roller coaster cars and the riders.
That's it. This is where the VXLAN magic happens. The overlay runs on top of that physical underlay. VXLAN takes your normal layer two frame, like an Ethernet frame from a server, and wraps it up inside a layer three UDP packet. It tunnels L2 over L3.
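To make the wrapping concrete, here is a minimal sketch of the 8-byte VXLAN header as laid out in RFC 7348, plus the arithmetic for what the outer headers add to each frame. It assumes an IPv4 underlay and no outer VLAN tag; the VNI value 10010 is just an example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, remaining 24 bits reserved.
    # Word 2: VNI in the top 24 bits, low byte reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def vni_of(header):
    """Recover the VNI from a packed VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


header = vxlan_header(10010)  # example VNI, not taken from any real fabric

# Bytes added in front of the original frame (IPv4 underlay, untagged).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN = 8
overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN

print(len(header), vni_of(header), overhead)  # 8 10010 50
print(2**24)  # 16777216 possible VNIs
```

A full-size 1500-byte payload from a server therefore needs more room than a default 1500-byte underlay link offers, and the 24-bit VNI field is where the "millions of segments" figure comes from.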
Okay, and it uses BGP EVPN to manage all this.
Exactly. EVPN is the control plane for this overlay. But the real game changer VXLAN brings is multi-tenancy.
Ah, separating different customers or departments. So in the analogy, each roller coaster car is a tenant, like a VRF.
Perfect. Each car is a VRF, a separate routing world, and the riders in that car are the VLANs belonging to that tenant.
And riders in different cars can't just talk to each other.
Nope, completely isolated by default, unless you specifically build bridges, that is, configure route leaking between them.
Okay, that isolation is huge. Now, a practical point. That wrapping, that encapsulation, adds overhead, right?
Yeah, big time, about fifty bytes or so. So you absolutely must enable jumbo frames on all those underlay links. An MTU of nine thousand, maybe higher.
Right, because if you don't, what happens?
Fragmentation city. Small pings might work, but try moving a real file. Packets get fragmented or dropped, performance tanks. It's like the number one deployment gotcha.
Good tip, don't forget the MTU. Okay, so moving up to the control plane. This is really the heart of EVPN, isn't it? Shifting from data plane learning.
Like flooding and praying with spanning tree.
Yeah, that mess. Shifting that intelligence to the control plane with BGP EVPN, what's the big win there, operationally?
Oh, operationally it's night and day. Think about troubleshooting layer two loops before. It was awful.
Tell me about it.
With EVPN, the local leaf sees a MAC address, learns its IP, and bang, it advertises that MAC/IP pair as a BGP route to the route reflectors.
The spines, right.
The spines reflect it to other leaves that need it. No more network-wide flooding to learn MACs.
So finding where a device is becomes a routing lookup, not a frantic MAC table search across dozens of switches.
Exactly. Troubleshooting L2 problems becomes essentially L3 troubleshooting. Much, much easier.
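The learning flow just described can be sketched as a toy model. Everything here is a simplification for illustration: the class names, MAC and IP addresses, and VNI are invented, and a real route reflector speaks BGP and applies import policy rather than calling methods on leaf objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified stand-in for an EVPN MAC/IP advertisement."""
    mac: str
    ip: str
    vni: int
    next_hop_vtep: str


class Leaf:
    def __init__(self, name, vtep_ip):
        self.name = name
        self.vtep_ip = vtep_ip
        self.remote_hosts = {}  # (vni, mac) -> route learned from the reflector

    def learn_local_host(self, mac, ip, vni, reflector):
        """A host shows up on a local port: advertise it instead of flooding."""
        route = MacIpRoute(mac, ip, vni, self.vtep_ip)
        reflector.advertise(self, route)

    def receive(self, route):
        self.remote_hosts[(route.vni, route.mac)] = route


class RouteReflector:
    """Toy spine: passes each client's routes on to the other clients."""

    def __init__(self):
        self.clients = []

    def add_client(self, leaf):
        self.clients.append(leaf)

    def advertise(self, sender, route):
        for leaf in self.clients:
            if leaf is not sender:
                leaf.receive(route)


spine = RouteReflector()
leaf1 = Leaf("leaf1", "10.1.1.1")
leaf2 = Leaf("leaf2", "10.1.1.2")
spine.add_client(leaf1)
spine.add_client(leaf2)

leaf1.learn_local_host("aa:bb:cc:00:00:01", "192.168.10.11", 10010, spine)

# On leaf2, locating the host is now a table lookup, not a flood.
where = leaf2.remote_hosts[(10010, "aa:bb:cc:00:00:01")].next_hop_vtep
print(where)  # 10.1.1.1
```

The lookup on leaf2 answers "which VTEP is this host behind" directly, which is the sense in which an L2 question turns into an L3 one.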
Okay, so we've killed most broadcast issues, but what about necessary evils like, you know, ARP? Broadcast, unknown unicast, multicast: the BUM traffic.
Right, you still need some way to handle that. VXLAN uses multicast in the underlay for this. Basically, BUM traffic for a specific VXLAN segment, a VNI, gets mapped to a specific multicast group in the underlay.
And the spines act as the RPs for those multicast groups.
Usually, yeah. You'd configure the spines as the rendezvous points, and you'd want redundancy there too, using something like anycast RP, so both spines can handle it actively.
Makes sense. Let's nail down a couple of interface terms, NVE and VTEP. They sound similar, they work together.
The NVE, the Network Virtualization Edge interface, is kind of the logical engine on the leaf switch that does the actual VXLAN encapsulation and decapsulation.
The thing doing the wrapping and unwrapping.
Right. And the VTEP, the VXLAN Tunnel Endpoint, is the endpoint whose IP address is used as the source and destination for those VXLAN tunnels.
And that VTEP IP, it's usually a special loopback interface?
Almost always, typically loopback one. And this is important: it's different from the loopback you might use for the underlay BGP peering, which is often loopback zero.
Why the separation? Why two loopbacks?
Stability, mainly. Loopback zero is for the underlay routing protocol itself, rock-solid reachability. Loopback one, the VTEP address, is what the overlay tunnels use. Keeping them separate means if something weird happens with your BGP peering, it doesn't necessarily break your established VXLAN tunnels. It isolates the planes.
Ah, good design practice. Okay, and we mentioned the spines are route reflectors. Why is that so vital for scaling?
Imagine if they weren't. Every leaf switch would need a direct BGP peering with every other leaf switch.
The full-mesh nightmare.
Total nightmare. Ten leaves, forty-five BGP sessions. Twenty leaves, one hundred and ninety sessions. With route reflectors on the spines, you add a new leaf, it just peers with the, say, two spines. Two new sessions. That's it.
That makes adding capacity way, way easier.
Massively easier. It's built for scale.
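The session counts quoted above follow from simple arithmetic: a full mesh of n leaves needs n(n-1)/2 sessions, while with route reflectors each leaf peers only with the spines. A quick sketch, assuming two spines and ignoring any sessions between the spines themselves:

```python
def full_mesh_sessions(leaves):
    """Every leaf peers with every other leaf: n * (n - 1) / 2 sessions."""
    return leaves * (leaves - 1) // 2


def route_reflector_sessions(leaves, spines):
    """Every leaf peers only with each spine acting as route reflector."""
    return leaves * spines


for n in (10, 20):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, spines=2))
# 10 45 20
# 20 190 40
```

Adding one leaf to a 20-leaf full mesh would mean 20 new sessions touching every existing leaf; with two route reflectors it is 2 new sessions and no change on the other leaves.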
All right. Let's dig into that multi-tenancy piece more: separation and routing between tenants. You mentioned route targets, RTs.
Yeah. Route targets are basically tags we attach to the EVPN routes. Think of them like labels. They usually look like AS number, colon, ID, maybe six five five zero one, colon, one zero zero one or something.
And these tags control who gets which routes.
Exactly. Each VRF, each tenant, has specific route targets it exports, meaning tags its routes with, and imports, meaning accepts routes tagged with. It's the BGP way of controlling visibility between virtual networks.
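Export and import by route target amounts to tagging on the way out and set matching on the way in. The sketch below is a toy model under that assumption; the VRF names, prefix, and route target values are made up for illustration and the functions are not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    export_rts: set
    import_rts: set
    routes: list = field(default_factory=list)


def export_route(vrf, prefix):
    """Tag a route with every export route target of its VRF."""
    return {"prefix": prefix, "rts": set(vrf.export_rts)}


def import_route(vrf, route):
    """Accept the route only if it carries at least one RT this VRF imports."""
    if vrf.import_rts & route["rts"]:
        vrf.routes.append(route["prefix"])
        return True
    return False


# Tenant A as configured on two different leaves, and tenant B on one of them.
tenant_a_leaf1 = Vrf("tenant-a", {"65501:1001"}, {"65501:1001"})
tenant_a_leaf2 = Vrf("tenant-a", {"65501:1001"}, {"65501:1001"})
tenant_b_leaf2 = Vrf("tenant-b", {"65501:1002"}, {"65501:1002"})

route = export_route(tenant_a_leaf1, "192.168.10.0/24")

print(import_route(tenant_a_leaf2, route))  # True: same tenant, matching RT
print(import_route(tenant_b_leaf2, route))  # False: isolated by default

# Route leaking: tenant B explicitly imports tenant A's route target.
tenant_b_leaf2.import_rts.add("65501:1001")
print(import_route(tenant_b_leaf2, route))  # True
```

Isolation is the default simply because the two tenants' tag sets do not overlap; leaking is a deliberate change to one side's import list.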
Okay, and we have two kinds of VNIs involved, the L2 VNI and the L3 VNI. Back to the roller coaster?
Let's do it. The L2 VNI, think of that as the identifier for a specific group of riders within one car. It maps directly to a traditional VLAN.
So VLAN ten might become L2 VNI one zero ten, something like that.
Yeah, it carries that layer two traffic across the fabric. And the big plus: you can have over sixteen million VNIs, way past the old four thousand VLAN limit.
Huge scale increase. Okay, so that's the L2 VNI, for L2 traffic within a tenant. What about the L3 VNI?
The L3 VNI is different. It's not for carrying VLAN traffic directly. It's the dedicated routing interface for the entire VRF, the whole tenant car.
Okay, so what's its job?
Its only job is layer three routing: inter-VLAN routing for devices within that tenant, especially when they live on different leaf switches.
Ah. So if a server on leaf one in VLAN ten needs to talk to a server on leaf five in VLAN twenty, but they're in the same tenant VRF...
The traffic goes from the source server to its leaf, leaf one, gets routed using the shared gateway, gets encapsulated using the L3 VNI tunnel, is sent across the underlay to leaf five, decapsulated, and then routed to the destination server in VLAN twenty.
Got it. The L3 VNI is the dedicated inter-VLAN highway for that tenant across the fabric.
Perfect analogy.
And this ties into that fabric anycast gateway feature, right? Making every leaf an active router.
Absolutely. Forget old HSRP or VRRP, where one router was active and the other just sat there waiting to fail over.
Wasting half your capacity.
Right. With anycast gateway, all the leaf switches share the exact same virtual MAC address and the same default gateway IP address for a given VLAN.
So a server just sends traffic to its gateway IP...
And whichever leaf receives it can immediately route it. No tromboning traffic to a specific active core switch. Every leaf is an active layer three core for the VLANs it serves. It distributes the routing load beautifully.
Very cool. Okay, final big section: getting traffic in and out of this fancy fabric, external connectivity. We need a border leaf for that?
Yep. One or more leaves need to be designated as border leaves. They're the ones physically connected to the outside world: the WAN, maybe a firewall cluster, the rest of the campus network.
And they need some extra configuration, obviously.
For sure, because now you're bridging the BGP EVPN world with, you know, potentially OSPF, static routes, whatever is running externally.
So when that border leaf learns an external route, say from OSPF, how does it tell the rest of the fabric about it inside EVPN?
It advertises those external prefixes as EVPN Type 5 routes. That's the key type for external reachability.
Okay, Type 5. And what does that route carry?
It carries the external prefix, like your WAN subnet. And critically, the next hop for that route is set to the VTEP IP address of the border leaf itself.
Ah. So if leaf seven needs to send traffic to the WAN...
It sees the Type 5 route, sees the next hop is the border leaf's VTEP, encapsulates the packet using the L3 VNI tunnel pointed at the border leaf, and fires it off.
Slick. Okay, but now the tricky part: integrating OSPF or something externally. Isn't there a big risk of asymmetric routing?
Huge risk. This is probably the second biggest deployment headache after MTU.
Traffic goes out border leaf A, tries to come back in via border leaf B, and the firewall freaks out because it didn't see the outbound flow.
Exactly that scenario. State mismatch, broken connections. So the goal has to be symmetric routing.
How do you force that?
You have to play with the routing protocol metrics or administrative distances. You need to make sure the path back into the fabric from the external network prefers the same border leaf the traffic went out on.
So maybe making the EVPN route learned via iBGP more preferred than the OSPF route.
That's a very common way. The default iBGP AD is two hundred, but you can lower it, say below OSPF's one ten. That usually ensures return traffic prefers the EVPN path back through the correct border leaf. You've got to make sure forward and return paths match.
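The administrative distance comparison can be sketched in a few lines: when two protocols offer the same prefix, the router installs the one with the lower distance. The values 200 for iBGP and 110 for OSPF are the defaults quoted above; the prefix, next-hop labels, and the lowered value of 100 are assumptions chosen only to show the effect.

```python
DEFAULT_AD = {"ibgp": 200, "ospf": 110}


def best_route(candidates, ad_table):
    """Pick the candidate whose source protocol has the lowest distance."""
    return min(candidates, key=lambda route: ad_table[route["protocol"]])


# Two ways to reach the same prefix (hypothetical values).
candidates = [
    {"prefix": "192.168.10.0/24", "protocol": "ibgp", "via": "EVPN, border leaf A"},
    {"prefix": "192.168.10.0/24", "protocol": "ospf", "via": "external OSPF neighbor"},
]

print(best_route(candidates, DEFAULT_AD)["protocol"])  # ospf

# Lower the iBGP distance below OSPF's so the EVPN path is preferred.
tuned_ad = dict(DEFAULT_AD, ibgp=100)
print(best_route(candidates, tuned_ad)["protocol"])  # ibgp
```

Distance only decides between protocols on one router, so the same preference has to hold on every border leaf for the forward and return paths to line up.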
Gotcha. Last detail: the default route, getting 0.0.0.0/0 advertised across the fabric so everything can reach the Internet, presumably via the border leaf. You mentioned needing two BGP commands for this.
Yeah, people trip over this sometimes. You need default-information originate under the BGP VRF config. That tells BGP, hey, I want to advertise a default route.
Okay, step one.
But BGP won't advertise a route unless it's actually in its BGP table. So you also need the network 0.0.0.0/0 command in that same VRF address-family configuration.
To actually inject the route into BGP so it can be originated.
Exactly. You need both. default-information originate enables it, network 0.0.0.0/0 provides the route itself. Then the border leaf advertises it as a Type 5, and the whole fabric knows how to get out.
Makes perfect sense. Wow. Okay, we covered a lot of ground. We really did. But the summary feels clear now. VXLAN BGP EVPN: it smashes the old VLAN limits for massive scale, gives you amazing performance with ECMP, and crucially uses that smart control plane to ditch the nightmare of layer two loops and flooding.
Yeah, the big picture is this blend of layer two VPN technology running over a solid layer three foundation. It's all about flexibility, speed and making network deployments, especially using templates, much faster and more reliable.
And for you listening, if you want to go even deeper, the source material mentioned things like multi-pod and multi-site, extending this fabric idea across different data centers.
Right. Making multiple physical sites look like one giant logical fabric, that's multi-pod, or connecting separate fabrics together, that's multi-site.
Which leads to our final thought.
If you're stretching one logical fabric across multiple physical locations, maybe even miles apart, what's the absolute most critical routing principle you have to maintain everywhere across all sites to keep things working smoothly.
It's not just about can packets get there?
No, it comes back to symmetry and filtering, ensuring traffic flows follow the same path out and back globally and making sure your route filtering is consistent everywhere. That becomes absolutely paramount. Get that wrong in a multi site setup and you're in for a world of pain.
Something to definitely keep in mind. Okay, that was a fantastic deep dive. Thanks for bringing this complex topic.
Glad we could unpack it.
And thank you for joining us. We'll catch you next time on the Deep Dive.
