VMware, Nvidia ‘Reinvent Enterprise Computing’ With vSphere 8

‘We can offload processing of software-defined infrastructure tasks, like network processing, like storage processing, to the DPU. And now you get accelerated I/O. And you can have agility for developers because all of that storage and network processing is now running in the DPU. Now you have the entire cluster of CPUs to run this distributed workload that you talked about, whether it’s containerized or in a VM. That’s the power of the DPU,’ VMware CEO Raghu Raghuram says.

The goal of Project Monterey, introduced yesterday as vSphere 8, wasn’t to increase the available compute cycles in data centers or to reduce their power consumption.

Rather, the goal is to prepare the data center for the huge workloads of the future, according to VMware CEO Raghu Raghuram.

“It’s becoming mission critical to every enterprise. Nobody’s going to talk about five years from now ‘that AI enabled application,’” Raghuram said. “It will just be assumed that machine learning is part of every application.”

“That’s right,” said Nvidia CEO Jensen Huang, who was sitting beside him at VMware Explore, where they took the lid off the highly anticipated project in a fireside chat that was two years in the making.

Compute cycles in the modern data center are eaten up by infrastructure tasks that keep the systems running. As cloud adoption rises, and with it the use of cloud services, those infrastructure tasks have become more demanding, which leaves fewer compute cycles for the workloads that generate revenue.

To overcome this, Raghuram and Huang said their companies worked together to “rearchitect” the data center using a data processing unit, or DPU. The DPU is not a new idea. It’s a board that runs hardware accelerators that perform some functions very efficiently. In the case of Project Monterey, the DPU is taking on all of those infrastructure tasks so the processors in the data center can be turned over to the work they were designed to do.

Due to the compute power needed to run machine-learning programs, AI applications reside in dedicated locations, so data must travel to them to be processed. No longer, Raghuram said.

“Better still, you can run AI along your mainstream workloads, so you get better efficiency, better efficacy, cost management, overall management,” Raghuram said. “It gives customers exactly what they want: the best in AI, managed on the platforms they trust.”

Read the rest of their conversation below:

AI in the Enterprise

Raghuram: Just as an example, Carilion Clinic, one of the largest health care organizations in Virginia: they’re using our software, on your platform, to future-proof hospitals. They care for over a million patients, and they’re using AI to accelerate their image processing, using your software.

Huang: AI is surely transformative. AI will revolutionize every single business. There are no businesses in the future that will not be using machine learning, that will not be transforming the data of their company and industry and harvesting intelligence from it and deploying it at scale. Those companies simply won’t exist. So every single company in the future will be an AI business.

Our work together is really about transforming how every company could develop and apply AI. We’re working on all sorts of AI, from computer vision to speech AI, conversational AI, natural language processing and recommender systems. Companies are harvesting tons and tons of data, refining it, and discovering predictive models from it. Essentially becoming AI factories.

So on the one hand, a modern enterprise is going to be a distributed AI supercomputer; on the other hand, it is going to be a disaggregated data center running applications that are essentially hyperscale. And this behavior, the ability for a data center to adapt in this way, on the one hand to do AI processing to learn the models, on the other hand to deploy and scale these AI applications across the entire data center, this particular characteristic is very unique to AI, and this was the grand challenge for our computer scientists. And this is what Monterey has really enabled: the ability to simultaneously be, on the one hand, a high-performance computer and, on the other hand, a hyperscale cloud computer. This capability is really quite unique and it’s really groundbreaking. So I’m super excited about that.

The DPU

Raghuram: So if you’re thinking about the DPUs, it’s not just in the data center, right. In fact, a lot of machine learning is going to happen at the edge, with autonomous driving, cashless stores and modern manufacturing, and so on and so forth. All of this is going to be at the edge and, like you pointed out, in the data center as well. So DPUs play a pretty significant role in this new infrastructure architecture: to accelerate performance, to free up the CPU cycles, and, certainly, most importantly, to provide better security, which we should talk about some more.

We have re-architected vSphere to fit on the NVIDIA DPU. So we can offload processing of software-defined infrastructure tasks, like network processing, like storage processing, to the DPU, right. And now you get accelerated I/O. And you can have agility for developers because all of that storage and network processing is now running in the DPU. Now you have the entire cluster of CPUs to run this distributed workload that you talked about, whether it’s containerized or in a VM. That’s the power of the DPU.

Huang: If you think about the work that we’re doing, this is high-performance computing in a virtualized environment. High-performance computing and virtualized environments are like oil and water. This is the big breakthrough of vSphere 8. You’ve given us the ability to do virtualization at the data center scale for distributed, high-performance computing applications like AI model development, as well as containerized deployment and disaggregated computing. What made it possible was the new type of processor that we’re going to introduce.

NVIDIA BlueField (the internal name NVIDIA used for Project Monterey) is the DPU that, Raghu, you were mentioning just a second ago. It is essentially an accelerated computing platform that is designed for the infrastructure software of data centers. Today’s data centers are software-defined: networking, virtualization, storage and cybersecurity. All of that is sitting in software in the data center, for all wonderful reasons.

However, that software stack, that software layer, takes away from the computing resources that we need for high-performance computing applications like AI. So with BlueField, we’ve been able to offload, accelerate and isolate, essentially, the operating system of the data center, on a new type of processor called a DPU, and as a result, freed up all of the CPU resources and all the GPU resources available for computation.

The benefits, and I think this is where all the enterprises, all the companies, discover the return on investment: the benefits of vSphere 8 with Nvidia BlueField will come so fast, because it frees up so many resources for computing, that the payback is going to be instantaneous. It’s going to be really a fantastic return.

Raghuram: The other beauty of this architecture is you can reconfigure your data center at will. You can take a regular server, all of a sudden through the magic of connecting it to the DPU, now you get all these advanced storage and networking capabilities on demand as you need it. This will hopefully make the data center very fungible, but it is also uniquely enabled by the DPU.

Security

Raghuram: One of the critical ideas that VMware puts forward to our customers is the idea of intrinsic security, having security baked into the platform. And with the DPU, now we’ve not only got the vSphere code base running there, but also the NSX code base running there. So now, for the first time, you can handle east-west traffic, the traffic going between applications on the CPU cores, at line speed and impose security without breaking up anything. So that, again, is something that’s possible because of the DPU architecture.

Huang: The security aspect of it is so important. The reason for that is, as we modernize the applications in our data centers, they run in containers across an entire data center so that we can scale them out. It’s now disaggregated. It is running in a distributed way. And with it all being orchestrated by Tanzu and vSphere 8, the amount of east-west traffic, as you were mentioning earlier, is going to grow exponentially. East-west traffic, from microservice to microservice, container to container, is going to open up the attack surface of companies fairly significantly as a result. This is where NSX running on BlueField is so vital.

The reason for that is, of course, perimeter firewalls were a great invention. But with today’s multi-tenant, cloud-native, disaggregated, scale-out applications, the amount of east-west traffic is so significant, the attack surface is so big, that we essentially need to put NSX no longer at a data center scale or cluster scale, but literally within every single compute. And this is where BlueField and vSphere 8, with NSX running on BlueField, is such an incredible revolution. We’re essentially going to have a firewall in every single computer.

vSphere Architecture

Raghuram: The beauty of what the vSphere engineers have done is they have not changed the management model. You can still manage the DPU just like you manage the server, with the same vCenter, the same DRS and a lot of those constructs. It can fit seamlessly into the datacenter today, while enabling the future.

Huang: We now have a stack for AI computing. We have a brand new way of deploying applications, from the classic enterprise monolithic, virtualized, to distributed computing, disaggregated AI deployment, with security at every single node and every transaction, and a way to scale this out to a data center. You know, Raghu, we have reinvented enterprise computing in a big, big way.