Welcome back to another deep dive. Today we're opening up a topic that I think a lot of us in the trenches have a bit of a love-hate relationship with.
Oh, yeah.
We are talking about vSphere 7.x, and I know the immediate reaction for you listening is probably: great, another update, another version number to track. But we've been poring over the official cert guide for exam 2V0-21.20. Yeah, that's by Davis, Baca, and Thomas. And honestly, this feels different. It really doesn't feel like just a service pack.
It really isn't. I mean, when you actually dig into the architecture changes laid out in this guide, vSphere 7 represents a massive pivot.
Yeah.
It's the exact moment where VMware stopped just virtualizing servers and really started enforcing the software-defined data center, the SDDC. We're moving from a world where we manage individual boxes to a world where we manage policies and desired states.
Desired state, that is the buzzword, right? Yeah. But looking at the source material here, there is some serious engineering behind that marketing term. Absolutely. We've got the death of the external Platform Services Controller, finally. Long
overdue. The complete overhaul of how storage is handled with vSAN and vVols, and some pretty scary warnings about where you can and cannot install ESXi anymore.
Yeah, it's a lot to unpack. The guide is dense because the changes are so fundamental. They're basically ripping out legacy code that has been there for a decade and replacing it with this modern, container-aware architecture.
So let's start with the brain of the beast: vCenter Server. Now, if you're listening and you were running vSphere 6.0 or 6.5, you probably have battle scars from dealing with the Platform Services Controller, the PSC.
Oh, the topologies.
I remember having to design these incredibly complex topologies with external PSCs sitting behind third-party load balancers just to get single sign-on to work across sites. It was frankly a nightmare.
It was incredibly complex. You had to manage replication agreements between those PSCs. You had to manage the certificates for each individual one.
The certificates were the worst. Right, and if the load balancer misbehaved, your admins literally couldn't log in. But the good news from the cert guide is that vSphere 7 effectively kills the external PSC.
So it's dead? Like, I don't have to build them anymore?
It is dead. The entire architecture has collapsed back into a simplified single-appliance model. Okay. All those services, single sign-on, the license service, the VMware Certificate Authority, they're all now running natively inside the vCenter Server Appliance, the VCSA.
Okay, that sounds great for a greenfield deployment, but what about the poor admin out there who has that complex 6.7 topology with external PSCs? Is the upgrade path going to be a complete rip-and-replace scenario?
Surprisingly, no. The guide highlights that the upgrade tool is actually a converge tool. Oh, interesting. When you run the vSphere 7 installer against your existing environment, it actually detects those external PSCs, migrates their data and identity into the new vCenter appliance, and then essentially decommissions the old nodes. Ah, it converges the topology automatically for you.
That is a huge relief to hear. But there's another obituary in here. The guide is pretty explicit that vCenter Server for Windows is.
Gone. Correct, done. No more installing vCenter on top of Windows Server. We are strictly in the world of the Photon OS appliance now. Right. Which is great for security and patching, but it does mean if you are relying on Windows-specific scripts or agents running locally on your vCenter box.
Which a lot of people did. Exactly.
That workflow is completely broken now. You have to adapt.
Okay, so we have this single monolithic appliance now handling everything. It's the brain, the heart, and the nervous system. But if that appliance dies, we are flying blind. I know vCenter High Availability existed before, but the guide makes it sound like the architecture under the hood has really changed. How are we keeping this thing alive?
So, vCenter HA in version 7 is very slick, but you have to understand it requires specific networking. It uses a three-node cluster. Okay. You've got an active node, a passive node, and a witness node. Walk
us through the replication there. Is this just doing a standard storage mirror?
No, no, it's much more intelligent than that. The active node, the one you are actually logged into and using, is replicating data to the passive node through two distinct channels. Two channels. Right. It uses native PostgreSQL replication for the database, ensuring that all your inventory and events are synced transactionally. And then it uses a separate file-level replication, basically rsync, to keep the configuration files in sync.
And the witness, is that just a third copy of the database?
Not at all. The witness is just a tiebreaker. It's a very lightweight clone. It doesn't hold the database or the files.
So what does it do?
Its only job is to provide quorum. If your network hiccups and the active and passive nodes lose sight of each other, you run the massive risk of a split-brain scenario where both think they are the master.
Right, which corrupts everything. Exactly.
The witness basically casts the deciding vote on who actually
owns the cluster. And the guide mentions some strict requirements for this, right? You can't just throw these nodes anywhere on your network.
No, you really can't. You need a dedicated vCenter HA network interface on each node, and the latency requirement is strict. How strict? Less than ten milliseconds between the active and passive nodes. If you try to stretch this across a laggy WAN link, the PostgreSQL replication will time out and the cluster will just fail.
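The tiebreaker idea described above can be sketched in a few lines of Python. This is only an illustration of majority voting under a clean network partition, not how vCenter HA is actually implemented; the names `CLUSTER`, `majority_side`, and `owner_after_partition` are hypothetical.

```python
from typing import List, Optional, Set

# The three vCenter HA roles from the discussion above.
CLUSTER = {"active", "passive", "witness"}


def majority_side(partitions: List[Set[str]]) -> Optional[Set[str]]:
    """Return the group of nodes that holds a strict majority, if any.

    Assumption: the network has split into disjoint groups, and every
    node in a group can see every other node in that group.
    """
    for group in partitions:
        if len(group & CLUSTER) > len(CLUSTER) // 2:
            return group
    return None


def owner_after_partition(partitions: List[Set[str]]) -> Optional[str]:
    """Decide which data-bearing node may own the cluster after a split.

    The witness never serves traffic; it only adds a vote to one side.
    """
    side = majority_side(partitions)
    if side is None:
        return None  # nobody has quorum, so nobody serves
    # Prefer the node that was already active if it is on the winning side.
    return "active" if "active" in side else "passive"


if __name__ == "__main__":
    # Active and witness can still see each other; passive is isolated.
    print(owner_after_partition([{"active", "witness"}, {"passive"}]))
    # Active is isolated; passive plus witness form the majority.
    print(owner_after_partition([{"passive", "witness"}, {"active"}]))
```

With three voters, any two nodes that can still see each other form the majority, which is why a node isolated from both of its peers can never claim ownership in this model.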
Good to know. So we've bulletproofed the brain, but the brain is useless if the body, meaning the ESXi hosts themselves, are crumbling.
That's true, and.
That brings us to a somewhat controversial change in the guide regarding how we actually boot these servers.
Oh yeah, the boot media.
For years in my home lab, and honestly even in some production environments, I just slapped ESXi on a generic eight-gig USB stick or an SD card. It was cheap, it worked. But reading this guide, it sounds like VMware is declaring war on USB boot drives.
War might be a strong word, but they are definitely waving a massive red flag for you. The issue isn't just the capacity, it's the partition structure.
Okay, break that down.
In previous versions, that boot drive really just held the hypervisor image, which is tiny. It loads into memory and it's done. But vSphere 7 introduces a completely new partition layout. The big one you need to know about is ESX-OSData.
ESX-OSData. That sounds ominous. What goes in there?
It's consolidation. It takes the old scratch partition, the locker for VMware Tools, and the core dump location and puts them in one place. But here is the kicker. It is formatted with VMFS-L.
VMFS-L, like the filesystem used in vSAN?
Exactly like that. It's a high performance filesystem designed specifically for frequent reads and writes. This partition stores system logs, traces, and live database entries for the host itself.
Okay, I see where this is going.
Right. If you put that kind of heavy I/O load on a cheap consumer-grade USB stick or an SD card, you're going to burn out those NAND flash cells in a matter of months, maybe even weeks.
So the hypervisor is literally
writing the drive to death. Yes. And because of this, if the installer detects you are booting from a low-quality USB device, it creates the ESX-OSData partition, but it runs it in
degraded mode. Degraded mode?
Yeah, it tries to limit the writes to save the drive, but you lose functionality and your logs are seriously at risk.
So what is the actual recommendation from the cert guide? Are we going back to spinning rust for boot drives?
The guide firmly recommends a local persistent disk. That means an HDD or an SSD of at least thirty-two gigabytes. That gives you enough room for the boot banks and a fully functional ESX-OSData partition.
And if you absolutely have to use USB?
If you must use USB, you have to pair it with a local disk to offload that scratch partition, or you are living on borrowed time.
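As a rough sketch, the boot media guidance above can be written down as a pre-flight check. The device labels, the `BootDevice` type, and the verdict strings are hypothetical, and the thirty-two gigabyte threshold is simply the figure quoted in the conversation; this does not reproduce the real installer's logic.

```python
from dataclasses import dataclass

# Minimum size for a local persistent boot disk, as quoted above.
MIN_PERSISTENT_GB = 32


@dataclass
class BootDevice:
    kind: str      # hypothetical labels: "hdd", "ssd", "usb", or "sd"
    size_gb: int


def assess_boot_plan(boot: BootDevice, has_local_disk_for_scratch: bool = False) -> str:
    """Classify a planned ESXi boot device according to the guidance above."""
    if boot.kind in {"hdd", "ssd"}:
        # Persistent media: only the capacity matters in this sketch.
        return "ok" if boot.size_gb >= MIN_PERSISTENT_GB else "too-small"
    if boot.kind in {"usb", "sd"}:
        # Flash sticks and SD cards wear out under constant log writes,
        # so they need a local disk to take the scratch and log traffic.
        return "ok-with-offload" if has_local_disk_for_scratch else "degraded"
    raise ValueError(f"unknown boot device kind: {boot.kind!r}")


if __name__ == "__main__":
    print(assess_boot_plan(BootDevice("ssd", 120)))
    print(assess_boot_plan(BootDevice("usb", 8)))
    print(assess_boot_plan(BootDevice("usb", 8), has_local_disk_for_scratch=True))
```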
That is going to catch a lot of people off guard during a hardware refresh. Speaking of things that catch people off guard: DNS and NTP, the classics. I feel like we talk about this every year, but the guide is just relentless about it.
This time it has to be. In vSphere 7, the dependencies are hard-coded. Take DNS, for example. When you are deploying that new VCSA, the installer pauses and actually performs a reverse lookup on the IP address you provided. If it cannot resolve that IP back to the fully qualified domain name, the FQDN, the installation literally fails. It doesn't warn you, it just stops.
Zero tolerance.
Zero tolerance. And NTP is even more critical because of SSO. How so? The authentication tokens, the SAML tokens used between vCenter and the hosts, they're all timestamped. If your host drifts more than a few minutes away from the vCenter time, those tokens are rejected. Oh man. Suddenly your backups fail, vMotion fails, and you literally can't log into the host.
So for everyone listening, check your PTR records and your time servers before you even download the ISO.
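For anyone who wants to script that pre-check, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the five-minute skew limit is an assumption standing in for "a few minutes"; the real tolerance depends on the product and its configuration. The reverse lookup is injectable so the logic can be exercised without touching a real DNS server.

```python
import socket
from typing import Callable

# Assumption: stands in for "a few minutes" from the discussion above.
MAX_SKEW_SECONDS = 300


def system_reverse_lookup(ip: str) -> str:
    """Resolve an IP to a hostname through the system resolver (PTR lookup)."""
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def ptr_matches(ip: str, expected_fqdn: str,
                reverse_lookup: Callable[[str], str] = system_reverse_lookup) -> bool:
    """True if the PTR record for `ip` resolves back to `expected_fqdn`."""
    try:
        name = reverse_lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
    # DNS names are case-insensitive and may carry a trailing dot.
    return name.rstrip(".").lower() == expected_fqdn.rstrip(".").lower()


def clock_skew_ok(host_epoch: float, vcenter_epoch: float,
                  max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """True if the two clocks are within `max_skew` seconds of each other."""
    return abs(host_epoch - vcenter_epoch) <= max_skew


if __name__ == "__main__":
    # Example with a stubbed resolver; the address and name are placeholders.
    stub = lambda ip: "vcsa.example.com."
    print(ptr_matches("192.0.2.10", "vcsa.example.com", reverse_lookup=stub))
    print(clock_skew_ok(1_700_000_000, 1_700_000_420))
```

Passing the resolver in as a parameter keeps the check testable in a lab before the real PTR records exist.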
The unsexy work that saves the deployment. Let's
shift gears to something a little sexier: storage. This seems to be where the software-defined part of SDDC really kicks into overdrive. Absolutely. We still have the classics, VMFS and NFS. Are they just legacy support now, or have they actually improved in this version?
Oh, they've definitely evolved. VMFS 6 is the absolute standard now, and the big thing it handles is automatic UNMAP.
UNMAP? How does that work?
In the old days, if you deleted one hundred gigs of data inside a Windows VM, the underlying storage array had no idea that space was free. It stayed marked as used. VMFS 6 automatically sends commands down to the array to reclaim that space.
That's a huge space saver. And what about NFS? I usually associate NFS with holding ISO files, not running high-performance workloads.
Right, but vSphere 7 pushes NFS 4.1, which is a massive leap over NFS 3. The two big features you get are multipathing and Kerberos.
Okay, multipathing makes sense.
Yeah, NFS 3 relied on a single TCP session. If that link got saturated, you were bottlenecked. NFS 4.1 supports true multipathing across multiple links, and Kerberos means we can finally encrypt that storage traffic on the
wire, which is a huge compliance requirement for a lot of organizations now. Exactly. Now, I saw an acronym in the guide that was new to me: HPP, the High-Performance Plug-in. For the last decade, we've relied on NMP, the Native Multipathing Plug-in. Why do we need a new one now?
This is entirely driven by the rise of NVMe, Non-Volatile Memory Express. Think about NMP as a traffic cop that was designed for the traffic of the nineteen nineties: spinning disks.
Right.
It has these complex locks and queues that made total sense when drives were slow, but modern NVMe flash is so incredibly fast that the software stack, NMP itself, actually became the bottleneck.
The software couldn't click the send button fast enough for the hardware.
Precisely. So VMware wrote the HPP specifically for NVMe and NVMe over Fabrics. It removes those legacy locks and optimizes the whole I/O path to handle millions of IOPS without killing your CPU.
So, if you're buying an all-flash NVMe array today, you need to be using HPP.
If you aren't, you're just wasting the money you spent on that array.
This moves us nicely into the "no more LUNs" conversation: vVols, or Virtual Volumes. The concept has been around for a while, but the guide really treats it as a primary citizen in version 7. For those who haven't deployed it, what is the actual mechanism here?
So vVols changes the entire relationship between vCenter and the storage array. It introduces a component called the VASA
provider. vSphere APIs for Storage Awareness, right?
This acts as a translator. Instead of the array presenting a dumb ten-terabyte block of space, a LUN, the VASA provider tells vCenter: hey, I can do replication, I can do deduplication, and I can do encryption. And then vCenter pushes that policy down to the individual VM level. Exactly. When you create a VM, the array creates a specific virtual volume just for that VM's disk. If you need to snapshot that VM, the array snapshots only that volume.
You aren't snapshotting a whole LUN with twenty other VMs on it.
It gives you granular control that matches the application, not the hardware limitations. Exactly. But if we really want to talk about true software-defined storage, we have to talk about vSAN. This really feels like the heart of the modern VMware stack. Conceptually, we're taking local disks in the servers and pooling them across the network. But the devil is always in the details, specifically regarding disk groups.
The disk group is the fundamental building block of vSAN. Each host participating needs at least one, and the strict rule laid out in the guide is one cache device and one or more capacity devices
per group. And that cache device is non-
negotiable? Completely non-negotiable. It must be flash. Even in a hybrid cluster where your capacity tier is made of cheap spinning disks, that cache tier has to be high-performance SSD. Why is that? Because it absorbs one hundred percent of the write operations. It acts as a buffer. And here's the critical part: if that cache drive fails, the entire disk group goes offline.
Wow. Now the guide gets into some math that I think is really important for anyone doing capacity planning. It talks about RAID 5 and RAID 6 erasure coding. Traditionally, if I wanted redundancy, I'd use RAID 1 mirroring. I have one hundred gigs of data, I need two hundred gigs of disk space.
Right, a 2x overhead. That gets incredibly expensive when you're buying enterprise flash drives. Erasure coding changes the algorithm. It stripes the data across the hosts with parity
bits. Like old-school hardware RAID 5?
Very similar logic, but distributed across the network instead of a backplane. With RAID 5 erasure coding, you need a minimum of four hosts. It uses a three-plus-one calculation, so your overhead drops from 2x down to about 1.33x. That's significant. You get the exact same level of protection, but you save a massive amount of raw storage capacity.
That is a huge cost difference at scale. But it's only available on all-flash configurations, right?
Correct. The parity calculation requires significant CPU and random I/O performance. Spinning disks simply cannot keep up with the read-modify-write penalty of erasure coding without totally tanking your VM performance.
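The capacity math from this exchange is easy to check with a small Python helper. The function name is hypothetical. The RAID 1 and RAID 5 figures come straight from the conversation; the four-plus-two layout for RAID 6 is not stated above and is included here as the commonly documented vSAN layout.

```python
def raw_needed_gb(usable_gb: float, data_parts: int, redundancy_parts: int) -> float:
    """Raw capacity consumed to store `usable_gb` of data.

    Each stripe holds `data_parts` data fragments plus `redundancy_parts`
    extra fragments (a mirror copy or parity), so the multiplier is
    (data + redundancy) / data.
    """
    return usable_gb * (data_parts + redundancy_parts) / data_parts


if __name__ == "__main__":
    schemes = {
        "RAID 1 mirroring (1 + 1 copy)": (1, 1),
        "RAID 5 erasure coding (3 + 1)": (3, 1),
        "RAID 6 erasure coding (4 + 2)": (4, 2),  # layout assumed, see lead-in
    }
    for name, (data, redundancy) in schemes.items():
        raw = raw_needed_gb(100, data, redundancy)
        print(f"{name}: {raw:.1f} GB raw for 100 GB usable ({raw / 100:.2f}x)")
```

For one hundred gigabytes of data, that works out to two hundred raw for mirroring and roughly one hundred thirty-three for the three-plus-one layout, which is the 2x versus 1.33x comparison made above.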
Got it. Now, one of the most fascinating topologies in the guide is the stretched cluster. This is the scenario where you have two data centers, Site A and Site B, and you want them to essentially act as one giant cluster. But I've always wondered about the split-brain problem here. If the fiber line between the buildings gets cut, how do you stop them from fighting over who is the active site?
That's the classic two generals problem, and vSAN solves this with a witness host. You place this witness in a third location, a totally different fault domain from Site A and Site B. Okay. And it holds the metadata components of the vSAN
objects. So it acts as the referee.
Exactly. Imagine Site A and Site B lose connection. Site A looks at the witness and says, hey, can you see me? The witness says yes. Site A then knows it has quorum. It has two out of three votes, so it stays online. And Site B? Site B can't see the witness or Site A, so it mathematically knows it has lost the vote. It immediately shuts down its VMs to prevent any data corruption.
And since the witness only holds metadata, it can run on a pretty small connection, right? Yeah.
It's tiny. You can run it inside a small cloud instance or just an old server in a remote office. It's not storing the actual VMDK data, just the state of the data.
This all leads to the overarching philosophy of vSphere 7, which is SPBM, Storage Policy-Based Management. It feels like we are finally moving away from the old gold, silver, bronze LUN mindset.
That is absolutely the goal. In the past, the infrastructure dictated the policy. You'd say, I have a fast LUN, so put the database there. With SPBM, the application dictates the infrastructure.
How does that look in practice?
You create a policy in vCenter, say, "Mission Critical": encryption enabled, RAID 1 protection. Yeah. You just assign that policy directly to
the VM. And vCenter acts as the broker? Exactly.
vCenter looks at your vSAN datastore or your vVols and checks: can I satisfy this requirement? If yes, it places the VM. If, say, six months later, a drive fails and the VM is no longer protected, vCenter flags it as noncompliant.
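The broker behavior described above boils down to a set comparison: does the datastore offer everything the policy demands? Here is a minimal Python sketch of that idea. The `Policy` and `Datastore` types and the capability strings are hypothetical and do not mirror the real SPBM API, and modeling a drive failure as a lost capability is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Policy:
    name: str
    required: FrozenSet[str]  # capabilities the VM's storage must provide


@dataclass
class Datastore:
    name: str
    capabilities: Set[str]    # what the datastore can currently deliver


def is_compliant(policy: Policy, datastore: Datastore) -> bool:
    """True if every required capability is currently offered."""
    return policy.required <= datastore.capabilities


def place(policy: Policy, datastores: List[Datastore]) -> Optional[Datastore]:
    """Pick the first datastore that can satisfy the policy, if any."""
    for ds in datastores:
        if is_compliant(policy, ds):
            return ds
    return None


if __name__ == "__main__":
    mission_critical = Policy("Mission Critical", frozenset({"encryption", "raid1"}))
    stores = [
        Datastore("archive-nfs", {"thin"}),
        Datastore("vsan-prod", {"encryption", "raid1", "dedup"}),
    ]
    home = place(mission_critical, stores)
    print(home.name if home else "no compatible datastore")

    # Months later a failure removes the mirror; the same check now fails.
    stores[1].capabilities.discard("raid1")
    print(is_compliant(mission_critical, stores[1]))
```

The same function serves both placement and ongoing monitoring, which is the shift from provisioning to compliance checking.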
That's a big shift. It changes the admin's job from provisioning storage to monitoring compliance.
It's all about desired state.
Before we wrap up, we have to touch on the integration of modern apps. The guide mentions First Class Disks and Kubernetes support. I think a lot of infrastructure admins hear Kubernetes and just tune out, thinking it's purely a developer problem. But vSphere 7 makes it an infrastructure problem.
It really does. The concept of the First Class Disk, or FCD, is crucial here. In the past, a virtual disk was always a child of a virtual machine. If you deleted the VM, the disk died
with it. Which is fine for a traditional server, but really bad for a container. Exactly.
Containers are ephemeral. They spin up and die in seconds, but the data they generate, like a database file, needs to persist. A First Class Disk is a managed storage object that exists completely independently of any
VM. So it just floats out there?
Yes, and Kubernetes can request storage. vSphere creates an FCD, and that disk can be attached to and detached from different container worker nodes as needed. It allows you to run stateful apps on the exact same vSphere platform you use for your traditional Windows servers.
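To make the lifecycle difference concrete, here is a toy Python model of a disk that outlives the node it is attached to. The `Inventory` class and its methods are hypothetical and are not the vSphere or Kubernetes API; the point is only that deleting a worker node detaches the disk rather than destroying it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class FirstClassDisk:
    disk_id: str
    size_gb: int
    attached_to: Optional[str] = None  # name of the worker node, if any


class Inventory:
    """Toy inventory where disks are tracked separately from nodes."""

    def __init__(self) -> None:
        self.disks: Dict[str, FirstClassDisk] = {}
        self.nodes: Set[str] = set()

    def create_disk(self, disk_id: str, size_gb: int) -> FirstClassDisk:
        disk = FirstClassDisk(disk_id, size_gb)
        self.disks[disk_id] = disk
        return disk

    def attach(self, disk_id: str, node: str) -> None:
        disk = self.disks[disk_id]
        if node not in self.nodes:
            raise KeyError(f"unknown node: {node}")
        if disk.attached_to is not None:
            raise RuntimeError(f"{disk_id} is already attached to {disk.attached_to}")
        disk.attached_to = node

    def delete_node(self, node: str) -> None:
        # Deleting the node only detaches its disks; the data survives.
        self.nodes.discard(node)
        for disk in self.disks.values():
            if disk.attached_to == node:
                disk.attached_to = None


if __name__ == "__main__":
    inv = Inventory()
    inv.nodes.update({"worker-a", "worker-b"})
    inv.create_disk("pg-data", 50)
    inv.attach("pg-data", "worker-a")
    inv.delete_node("worker-a")        # the container host goes away
    inv.attach("pg-data", "worker-b")  # the same disk moves to another node
    print(inv.disks["pg-data"])
```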
So bringing it all together, vSphere 7 isn't just a facelift. It is a fundamental re-architecture. We've killed the external PSC and simplified the management plane. We've tightened the screws on hardware reliability with VMFS-L boot partitions and those strict DNS requirements. And we've moved storage from a static hardware mapping to a dynamic, policy-driven engine with vSAN and SPBM. It's really
about vSphere becoming the universal platform. Whether it's a legacy SQL server or a cloud-native microservice, the goal is to manage them with the exact same policies, the same availability, and the same security.
I want to leave you with one final thought from the guide, something called Lifecycle Manager. We didn't have time to deep dive into it today, but it applies that same desired state logic to the hardware firmware itself. It can actually push BIOS and HBA driver updates to the physical servers to match a policy you set.
It's an absolute game changer. It means you aren't just patching ESXi anymore. You are patching the actual metal underneath it all, directly from vCenter.
Right, and if vSphere is managing the physical firmware, the storage controller, and the application policy, are we looking at the end of the traditional hardware maintenance window as we know it? It feels like we are getting closer to the dream of a truly fluid infrastructure.
We are definitely getting there. The hardware is basically just becoming code at this point.
Plenty to think about before your next upgrade. Check those boot partitions, verify your DNS, and maybe start playing with SPBM in your lab. Thanks for joining us on this deep dive into vSphere 7.
So glad to be here.
