Introducing: vGPU on vSphere, installation and troubleshooting tips

Kees Baggerman | March 13, 2015

At Nutanix, I’ve spent the last couple of days involved in testing the new vGPU features of vSphere 6 in combination with the new NVIDIA GRID drivers, so that vGPU will also be available for desktops delivered via Horizon 6 on vSphere 6.

During this initial phase I worked with Martijn Bosschaart to get the installation covered, and after an evening of configuring and troubleshooting I thought it would be a good idea to write a blog post about this upcoming feature.

What is vGPU and why do I need it?

vGPU profiles deliver dedicated graphics memory by leveraging the vGPU Manager to assign the configured memory to each desktop. Each VDI instance gets a predetermined amount of resources based on its needs or, better yet, the needs of its applications.

By using the vGPU profiles from the vGPU Manager you can share each physical GPU. For example, an NVIDIA GRID K1 card has 4 physical GPUs, each of which can host up to 8 users, resulting in up to 32 users with a vGPU-enabled desktop per K1 card.
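As a quick sanity check on those numbers, here is a minimal shell sketch of the arithmetic. The function name users_per_board is hypothetical and only exists for this example; the 4 and the 8 are the K1 figures from the paragraph above.

```shell
#!/bin/sh
# users_per_board is a hypothetical helper, not part of any NVIDIA or VMware tool.
# Max vGPU users per board = physical GPUs on the board x users per physical GPU.
users_per_board() {
    gpus=$1
    users_per_gpu=$2
    echo $((gpus * users_per_gpu))
}

# GRID K1: 4 physical GPUs, up to 8 users each. Prints 32.
users_per_board 4 8
```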

Next to the NVIDIA GRID K1 card there’s the K2 card, which has 2 high-end Kepler GPUs instead of 4 entry-level Kepler GPUs and delivers up to 3072 CUDA cores compared to the 768 CUDA cores of the K1.

vGPU can also deliver adjusted performance based on profiles. A vGPU profile is configured per VM, so you can balance performance against scalability. Looking at the available profiles, the less powerful profiles can be delivered to more desktops per board than the high-powered ones:

NVIDIA GRID Graphics Board	Virtual GPU Profile	Max Displays Per User	Max Resolution Per Display	Max Users Per Graphics Board	Use Case
GRID K2	K260Q	4	2560×1600	4	Designer/ Power User
	K240Q	2	2560×1600	8	Designer/ Power User
	K220Q	2	2560×1600	16	Designer/ Power User
	K200	2	1920×1200	16	Knowledge Worker
GRID K1	K140Q	2	2560×1600	16	Power User
	K120Q	2	2560×1600	32	Power User
	K100	2	1920×1200	32	Knowledge Worker

Source:<a href="http://www.nvidia.co.uk/object/grid-virtual-gpus-uk.html" target="_blank">VIRTUAL GPU TECHNOLOGY</a>
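To turn the table into a rough sizing estimate, the sketch below looks up the maximum users per board for a profile and rounds up the number of boards needed for a given desktop count. The function names (max_users_per_board, boards_needed) are made up for this example. The estimate assumes every desktop uses the same profile and ignores host limits such as PCIe slots, CPU and memory.

```shell
#!/bin/sh
# Hypothetical helpers; the numbers are the "Max Users Per Graphics Board"
# column from the table above.
max_users_per_board() {
    case "$1" in
        K260Q) echo 4 ;;
        K240Q) echo 8 ;;
        K220Q|K200|K140Q) echo 16 ;;
        K120Q|K100) echo 32 ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# boards_needed <desktops> <profile>: boards required if all desktops share one profile.
boards_needed() {
    desktops=$1
    per_board=$(max_users_per_board "$2") || return 1
    # Integer ceiling division: round up to whole boards.
    echo $(( (desktops + per_board - 1) / per_board ))
}

# 100 desktops on the K120Q profile: 100 / 32 rounded up = 4 boards.
boards_needed 100 K120Q
```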

All vGPU profiles with a Q in the name are submitted to the same certification process as the workstation processors, meaning that these profiles should perform at least as well as the current NVIDIA workstation processors.

Both the K100 and K200 profiles are designed for knowledge workers; they deliver less graphical performance but enhance scalability. Typical use cases for these profiles are much more common than you would expect, and with the growing graphical richness of applications the use of vGPU will become more of a commodity as well. Just take a look at Office 2013, Flash/HTML, or Windows 7/8.1 (or even 10) with Aero and all the other eye candy that can be enabled; these are all good use cases for the K100/K200 vGPU profiles.

The installation

Our systems rely on the CVM (Controller VM). The Nutanix CVM runs the Nutanix software and serves all of the I/O operations for the hypervisor and all VMs running on that host. For Nutanix units running VMware vSphere, the SCSI controller, which manages the SSD and HDD devices, is passed directly to the CVM using VMDirectPath (Intel VT-d). In the case of Hyper-V, the storage devices are passed through to the CVM. Below is an example of what a typical node logically looks like:

Source: <a href="http://stevenpoitras.com/the-nutanix-bible/" target="_blank">The Nutanix Bible</a>

It turned out that our CVM played very nicely with the upgrade from vSphere 5.5 to vSphere 6, as it worked exactly as planned (don’t you just love a Software Defined Datacenter?), and I saw the following configuration in our test cluster:

[Screenshot: test cluster configuration]

The installation went without any problems, so we could follow the (very detailed) guide to set up the rest of the environment. Setting up vCenter 6 and Horizon 6.0.1 was fairly easy and well described. When we got down to assigning the vGPU profiles to the VM, however, I was able to see the vGPU profiles, but starting the VM produced an error message.

Useful commands for troubleshooting

Confirming GPU installation: lspci | grep -i display

Confirming GPU configuration: esxcli hardware pci list -c 0x0300 -m 0xff

Verify the VIB installation: esxcli software vib list | grep NVIDIA

Confirming VIB loading: esxcfg-module -l | grep nvidia

Manually load the VIB: esxcli system module load -m nvidia

Verify if the module is loading: cat /var/log/vmkernel.log | grep NVRM

Check if Xorg is running: /etc/init.d/xorg status

Manually start Xorg: /etc/init.d/xorg start

Check Xorg logging: cat /var/log/Xorg.log | grep -E "GPU|nv"

Check the vGPU management: nvidia-smi
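If you find yourself running these checks repeatedly, they can be wrapped in a small script. The following is only a sketch: run_check and vgpu_checks are hypothetical helper names, the commands themselves are the read-only ones listed above, and it assumes each command exits non-zero when the check fails, which I have not verified for every one of them.

```shell
#!/bin/sh
# run_check <label> <command...>: run one check and report OK/FAILED
# without aborting the remaining checks. Hypothetical helper.
run_check() {
    label=$1
    shift
    echo "== $label"
    if "$@"; then
        echo "-- OK: $label"
    else
        echo "-- FAILED: $label"
        return 1
    fi
}

# vgpu_checks: the read-only checks from the list above, in order.
# Intended to be run on the ESXi host itself.
vgpu_checks() {
    run_check "GPU installation" sh -c 'lspci | grep -i display'
    run_check "VIB installation" sh -c 'esxcli software vib list | grep NVIDIA'
    run_check "VIB loading"      sh -c 'esxcfg-module -l | grep nvidia'
    run_check "Xorg running"     /etc/init.d/xorg status
    run_check "vGPU management"  nvidia-smi
}
```

Call vgpu_checks on the host and look for the first FAILED line; that is usually the place to start troubleshooting. The commands that change state (loading the module, starting Xorg) are left out on purpose.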

In my case the block I was testing on had been used for other testing purposes, so when I tried running Xorg it would not start. I checked the vGPU configuration and noticed that the cards were configured for pciPassthru, which was why Xorg wasn’t running. To disable pciPassthru:

In the vSphere client, select the host and navigate to Configuration tab > Advanced Settings.

Click Configure Passthrough in the top left.

Deselect at least one core to remove it from being passed through.

After that I had a different error in the logs of my VM:

vmiop_log: error: Initialization: VGX not supported with ECC Enabled.

With some help from Google I found the following explanation: Virtual GPU is not currently supported with ECC active. GRID K2 cards ship with ECC disabled by default, but ECC may subsequently be enabled using nvidia-smi.

Use nvidia-smi to list status on all GPUs, and check for ECC noted as enabled on GRID K2 GPUs. Change the ECC status to off on a specific GPU by executing nvidia-smi -i <id> -e 0, where <id> is the index of the GPU as reported by nvidia-smi.
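On a host with several GPUs this can be scripted. The sketch below is an assumption-laden example rather than a tested procedure: ecc_enabled_gpus and disable_ecc_all are hypothetical names, and it assumes the installed nvidia-smi supports the --query-gpu=index,ecc.mode.current and --format=csv,noheader options, which may not be the case for every driver release. Only the nvidia-smi -i <id> -e 0 call comes from the explanation above.

```shell
#!/bin/sh
# ecc_enabled_gpus: read "index, ecc-mode" lines on stdin and print the
# indexes of the GPUs whose ECC mode is "Enabled". Hypothetical helper.
ecc_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# disable_ecc_all: switch ECC off on every GPU that reports it as enabled.
# Assumes nvidia-smi supports the --query-gpu/--format options used here.
disable_ecc_all() {
    nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader |
        ecc_enabled_gpus |
        while read -r id; do
            echo "Disabling ECC on GPU $id"
            nvidia-smi -i "$id" -e 0
        done
}
```

As far as I know, a changed ECC setting only takes effect after the GPU is reset or the host is rebooted, so plan for that before starting the VM again.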

After this change I was able to boot my VM, create a Master Image and deploy the Horizon desktops with a vGPU Profile via Horizon 6:

Note 1: I was testing remotely with limited bandwidth, and the desktop still reached up to 66 FPS.

Note 2: Please be aware that although this testing was done on a Nutanix-powered platform, vSphere 6 is not supported by Nutanix at this moment. Support will follow as soon as possible.

Kees Baggerman

Kees Baggerman is a Staff Solutions Architect for End User Computing at Nutanix. Kees has driven numerous functional and technical design, migration, and implementation engagements for Microsoft, Citrix, and RES infrastructures over the years.

4 comments

diomac says:

March 27, 2015 at 22:47

When you tested this, did you already have the NVIDIA VIB loaded on the previous 5.5 installation? Or did you only install it after you did the upgrade from 5.5 to 6?

- k.baggerman says:
  
  March 30, 2015 at 09:37
  
  The old NVIDIA VIB was already installed from the vSphere 5.5 installation on our NX7000; I had to remove it before I could install the newer VIB.
  
- Ray says:
  
  February 16, 2016 at 11:28
  
  Hi Kees,
  
  I realize it has been some time since you did your GRID test. We have come across an issue where, after installing the VIB file and checking that it was OK with the nvidia-smi command, I don’t get the option to add a Shared PCI Device in the VDI machine’s settings.
  
  The ESX servers are running 5.5 and vSphere is showing as 6.0; is this the issue? Are we running an older version of the OS and software and therefore not seeing what I need to add the correct device, or have I missed something fundamental in the install process, like actually stating that these devices need to be set up as pass-through devices in the ESX server’s settings?
  
  I have placed a call with our support team who installed the servers, but it’s not looking like they are getting back to me any time soon. A little help would be great, as I’m stuck as to what to do other than upgrade the servers.
  