Fixing vSAN driver compatibility on Dell R7515

A while back, we purchased some vSAN Ready nodes for a new cluster. The all-NVMe machines came with ESXi preinstalled, but when I set up vSAN, Skyline Health kept complaining that the driver used for the write-intensive cache drives wasn’t certified for that purpose.

I opened support cases with both VMware and Dell, as I was in a hurry to get the machines running but didn’t know where the problem lay – we had an identically specced cluster, manually installed with vSphere 7 earlier, where this issue did not occur. Unfortunately, neither support case ended with a viable resolution: I seem to have gotten stuck with first-line support both times and didn’t have time to nag my way to higher levels of support – the shibboleet code word never seems to work in real life.

I finally compared which drivers were actually in use on the new servers versus the old ones, and realized that the cache disks on the new servers erroneously used the intel-nvme-vmd driver, while on the older hosts all disks used VMware’s own nvme-pcie driver. The solution, then, was very simple:

For each host, I first put the machine into Maintenance Mode, enabled the SSH service, and logged in.

I then verified my suspicion:

esxcli software vib list | grep nvme
(...)
intel-nvme-vmd                 2.5.0.1066-1OEM.700.1.0.15843807     INT      VMwareCertified   2021-04-19
nvme-pcie                      1.2.3.11-1vmw.702.0.0.17630552       VMW      VMwareCertified   2021-05-29
(...)
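
When repeating this check on every host in a cluster, matching the VIB name exactly is a little more robust than reading through grep output. The following is only a sketch: it assumes awk is available in the ESXi shell, and has_vib is a made-up helper name, not something esxcli provides.

```shell
# Exit status 0 if a VIB with exactly the given name appears in the
# `esxcli software vib list` output supplied on stdin, 1 otherwise.
# has_vib is a hypothetical helper name, not part of esxcli.
has_vib() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Intended use on an ESXi host (left commented out, since esxcli only exists there):
# esxcli software vib list | has_vib intel-nvme-vmd && echo "intel-nvme-vmd is installed"
```

Comparing the first column for equality avoids false matches on VIBs whose names merely contain the search string.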

I removed the erroneously used driver:

esxcli software vib remove -n intel-nvme-vmd

And finally I rebooted the server. Rinse and repeat for each machine in the cluster.
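
For a larger cluster, the removal and reboot can be wrapped in a small script. This is a sketch under a few assumptions, not a record of what was actually run: the host is already in Maintenance Mode, the script runs in the ESXi shell, and the names DRY_RUN, run and remove_vmd_and_reboot are purely illustrative. It defaults to printing the commands rather than executing them.

```shell
# Sketch only: intended for the ESXi shell on a host already in Maintenance Mode.
# DRY_RUN, run and remove_vmd_and_reboot are illustrative names, not part of esxcli.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_vmd_and_reboot() {
    # Step 1: remove the Intel VMD driver VIB (same command as above).
    # Step 2: reboot, but only if the removal succeeded. esxcli requires a
    # reason string for the reboot; the text itself is arbitrary.
    run esxcli software vib remove -n intel-nvme-vmd &&
        run esxcli system shutdown reboot --reason "Removed intel-nvme-vmd VIB"
}

remove_vmd_and_reboot
```

Run as-is, it only prints what it would do; setting DRY_RUN=0 in the environment makes it execute the two commands for real.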

After I was done, I re-checked Skyline Health for the cluster, and was greeted with the expected green tickmarks:

Image showing green tickmarks for all tested items.