paul_venezia
Senior Contributing Editor

Do the virtualization math: When four CPUs aren’t four CPUs

analysis
Oct 8, 2012 | 5 mins

Four virtual cores or four virtual sockets: what's the difference? It could be a lot

One of the major advantages of virtualization is the ability to dynamically add CPU and RAM to running virtual machines. Have a box that gets a sudden spike? Add more RAM on the fly and let it go. It’s a fantastic way to deal with certain compute issues, and it can make a tough decision disappear because no downtime or reboots are required.

However, allocating CPU and RAM with the click of a mouse — dynamically or otherwise — can have deleterious effects on your servers in some circumstances. You really need to understand your workload and your OS.


It all comes down to the type of workload you’re running, the OS scheduler, and the virtual CPU layout for the virtual machine. Virtual CPU allocations used to be simple. You specified how many virtual CPUs you wanted to assign and off you went. However, as the number of physical CPU cores increased and NUMA became the norm, that choice became trickier. Now, just about every major hypervisor presents a choice of virtual CPU types.

For instance, if you want to assign four virtual CPUs to your virtual machine, you can choose between four single-core CPUs, two dual-core CPUs, and one quad-core CPU. While all of these selections wind up presenting four virtual CPUs to the virtual machine, they do so in different ways, and the differences can impact the decisions made by the OS scheduler running on that virtual server.
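In VMware vSphere, for instance, this choice boils down to two virtual machine settings: the total vCPU count and the number of cores per virtual socket. The fragment below is a sketch of the relevant .vmx entries for four vCPUs presented as two dual-core sockets; the key names match vSphere 5.x-era releases, so verify them against your hypervisor's own documentation before relying on them.

```ini
numvcpus = "4"
cpuid.coresPerSocket = "2"
```

With `cpuid.coresPerSocket = "1"` the same four vCPUs appear as four single-core sockets; with `"4"`, as one quad-core socket.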

Virtual machine alchemy

There’s no hard-and-fast rule about these selections. The right choice depends heavily on the workload profile, the scheduler in use, and the OS or kernel version. Older kernels that are less adept at handling multicore CPUs may fare better with single-core CPU assignments; newer kernels and OS versions might prefer multicore CPU presentations.

Beyond that, the nature of the workload itself can have a big impact. Single- and multithreaded workloads will handle each of these instances differently. There may be only slight differences in some workloads, but massive differences in others.

Picture a modern OS that’s well versed in NUMA. Taking advantage of NUMA permits faster memory access and can significantly speed up processor and RAM-intensive processes. If a CPU core interacts only with memory controlled by that CPU, it will perform faster, as it does not need to cross to another CPU to allocate and use memory.
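The local-versus-remote arithmetic can be sketched with a toy model. Every number here is invented purely for illustration (real latencies and penalties vary widely by platform), but it shows why keeping memory local to the working CPU pays off:

```python
# Toy model of NUMA memory access latency.
# The latency figures are assumptions for illustration, not measurements.

LOCAL_NS = 100.0    # assumed latency for an access to local-node memory
REMOTE_NS = 160.0   # assumed latency when crossing to another node's memory

def effective_latency_ns(local_fraction: float) -> float:
    """Average access latency when `local_fraction` of memory accesses
    hit memory attached to the CPU doing the work."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS

# A NUMA-aware placement keeping 95% of accesses local, vs. a naive
# placement that splits accesses 50/50 across nodes:
aware = effective_latency_ns(0.95)   # roughly 103 ns on average
naive = effective_latency_ns(0.50)   # 130 ns on average
```

Even a modest remote-access penalty adds up quickly for memory-intensive processes, which is exactly the "store across the street" effect described above.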

This is fairly basic, sort of like how it’s quicker to go to the store across the street rather than one across town. However, when you insert a hypervisor underneath an OS, the relationships between CPU cores and RAM allocations can get a bit murky.

Depending on how the hypervisor presents CPUs to the virtual server, the OS may believe, for example, that each CPU has its own memory controller, or that a single memory controller is shared across four cores. Underneath that, the hypervisor is constantly polling the virtual server’s memory allocation status and assessing whether to move active memory closer to the CPU currently handling the load for that VM. With all of these factors in play at once, there are cases where performance dips as a result.
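The trade-off the hypervisor weighs can be sketched in miniature. To be clear, the penalty factor, threshold, and cost model below are made-up illustrations, not any vendor's actual algorithm: the question is simply whether enough memory is remote, for long enough, to justify the one-time cost of copying pages closer to the CPU.

```python
# Toy sketch of a NUMA rebalancing decision: migrate a VM's memory
# closer to its current CPU only when the projected savings outweigh
# the one-time migration cost. All parameters are invented.

def should_migrate(remote_fraction: float,
                   migration_cost_ms: float,
                   expected_runtime_ms: float,
                   remote_penalty: float = 0.25,
                   threshold: float = 0.5) -> bool:
    """Migrate if most accesses are remote AND the projected savings over
    the remaining runtime exceed the cost of copying the pages."""
    projected_savings_ms = remote_fraction * remote_penalty * expected_runtime_ms
    return remote_fraction > threshold and projected_savings_ms > migration_cost_ms

# Long-running VM with mostly remote memory: worth moving.
# Short-lived or mostly-local VM: leave it where it is.
long_job = should_migrate(0.8, migration_cost_ms=50.0, expected_runtime_ms=1000.0)
short_job = should_migrate(0.8, migration_cost_ms=50.0, expected_runtime_ms=100.0)
```

A real scheduler also has to worry about migrations fighting each other and about the polling overhead itself, which is how the dips described above can occur.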

Straw into gold

Fortunately, there’s a way to tell which method suits your workload best: test the hell out of it. Build several virtual servers, each with a different virtual CPU layout, and run a sample workload on each. Then look into the deeper tweaks you can make to NUMA allocations at the hypervisor level, and test various scenarios by adjusting those parameters.
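Once you've collected results from each layout, the comparison itself is simple. Here's a minimal sketch, with hypothetical throughput numbers standing in for real benchmark runs; in practice each list would come from repeated runs of your actual workload on a VM built with that topology:

```python
# Compare sample-workload results across vCPU layouts.
# The throughput numbers (requests/sec per run) are hypothetical
# placeholders, not real benchmark data.
from statistics import mean, stdev

results = {
    "4 sockets x 1 core":  [4120, 4075, 4190],
    "2 sockets x 2 cores": [4480, 4510, 4455],
    "1 socket x 4 cores":  [4310, 4290, 4350],
}

def best_layout(runs):
    """Return the layout name with the highest mean throughput."""
    return max(runs, key=lambda layout: mean(runs[layout]))

for layout, samples in results.items():
    print(f"{layout}: mean={mean(samples):.0f} stdev={stdev(samples):.1f}")
```

Run enough iterations for the spread (stdev) to be small relative to the differences between layouts before trusting the winner.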

For instance, VMware vSphere has deep tuning parameters, such as numa.vcpu.maxPerMachineNode and numa.vcpu.maxPerClient, which allow you to adjust the maximum number of virtual CPUs that can reside on a single NUMA node and the maximum number of virtual CPUs that are rebalanced as a single unit by the hypervisor. There are several other parameters as well that may have a greater impact in your specific case, but the fact is that you may be able to boost your workload performance with a few tiny tweaks and some testing.
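As a sketch, these parameters go into the VM's advanced configuration (its .vmx file). The values of "4" below are purely illustrative; the right settings depend entirely on your host's physical NUMA topology and your workload, so treat this as a starting point for testing, not a recommendation:

```ini
numa.vcpu.maxPerMachineNode = "4"
numa.vcpu.maxPerClient = "4"
```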

This isn’t a new concept. I noted this type of performance tweak 18 months ago in the last InfoWorld virtualization shoot-out, specifically with respect to Red Hat Enterprise Virtualization, but it’s a bit of knowledge that I find goes largely overlooked. So when you’re building and tweaking your virtual machines, remember that four doesn’t always equal four when you’re talking about virtual CPUs. Spending a little time testing can save a lot of time processing.

This story, “Do the virtualization math: When four CPUs aren’t four CPUs,” was originally published at InfoWorld.com. Read more of Paul Venezia’s The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.