Nvidia Bought Slurm’s Creator: It Makes Sense To One Partner, Another Is Concerned

The SchedMD acquisition makes sense to one top Nvidia channel partner because of the increasingly complex work required to run AI data centers, but it’s sparking concern for another because of the AI infrastructure giant’s history with a previous software acquisition.

Nvidia said on Monday that it has acquired the maker of Slurm, an open-source workload management system that is rooted in traditional high-performance computing but has been increasingly used for large-scale AI clusters.

The acquisition made sense to one top Nvidia channel partner because of the increasingly complex work required to run AI data centers, but it sparked concern for another because of the AI infrastructure giant’s history with a previous software acquisition.


In acquiring Slurm creator SchedMD, Nvidia said the move will “help strengthen the open-source software ecosystem” and vowed to “continue to develop and distribute Slurm as an open-source, vendor-neutral software, making it widely available to and supported by the broader HPC and AI community across diverse hardware and software environments.”

Financial terms of the deal were not disclosed.

The Santa Clara, Calif.-based company said it will also increase SchedMD’s access to new systems, “allowing users of Nvidia’s accelerated computing platform to optimize workloads across their entire infrastructure.” This, combined with continued support for a “diverse hardware and software ecosystem,” will let customers “run heterogeneous clusters with the latest Slurm innovations,” according to Nvidia.

Nvidia, which has made several software acquisitions, said Slurm plays an important role in helping operators of HPC and AI clusters optimize the utilization of compute resources for complex workloads, noting that it is used in more than half of the top 10 and top 100 systems in the Top500 list of the world’s fastest supercomputers.

The AI infrastructure giant also said that Slurm is considered “critical infrastructure” for generative AI development, “used by foundation model developers and AI builders to manage model training and inference needs.”

Why One Nvidia Partner Thinks The Acquisition Makes Sense

Andy Lin, CTO at Houston-based Nvidia systems integration partner Mark III Systems, called the SchedMD acquisition a “great move” that is “directly in line” with Nvidia’s “open-source-centric” software strategy for things like libraries, frameworks and tools.

“Slurm is really the default, go-to open-source workload manager [and] scheduler for the industry, especially for folks that came from high-performance computing and that are really focused on not only HPC-style, large-scale jobs but also the training of large foundation models. It does incredibly well on that,” he told CRN in an interview.

With Slurm serving as an alternative to the Kubernetes-based Run:ai AI infrastructure management platform that Nvidia acquired last year, Lin said the AI infrastructure giant now owns two “dominant” workload management solutions for customers building AI data centers meant to serve “tens, hundreds or even thousands of users.”

But the executive said he isn’t concerned about Nvidia, the dominant player in the AI infrastructure market, consolidating ownership of two such platforms.

“It still will have an open approach, right? You’ll still be able to train and leverage the same open-source models. Nvidia is probably one of the top, if not the top contributor of open-source in the space, so you’ll still have the advantages from the user community perspective,” said Lin, whose company has won multiple Nvidia Partner Network awards.

Viewed another way, Lin said, the acquisition could be seen as “an acknowledgement of how challenging it is to operate a consolidated AI factory”—the term Nvidia uses to describe a centralized AI data center that serves a broad constituency of users.

“Although it seems straightforward from a marketing perspective, it’s actually very difficult to deploy and run at scale for a long period of time,” he said. “And I think this is probably an acknowledgement of a way to bring more [people with that] skill set into the fold to enable more of these organizations to be successful.”

As a result, the executive expects Nvidia to take advantage of SchedMD’s enterprise support capabilities to provide customers with a more holistic offering for setting up AI data centers.

“Nvidia will be able to leverage their enterprise support capability to be more in line with how [the company] is building out AI factories, specifically for those that want to use Slurm versus something like Run:ai,” Lin said.

Previous Nvidia Software Acquisition Has Another Partner Concerned

This line of thinking about Nvidia using Slurm and SchedMD’s enterprise support to boost AI factories makes sense to Dominic Daninger, vice president of engineering at HPC-focused Nvidia systems integration partner Nor-Tech in Burnsville, Minn.

But unlike Lin, Daninger said he is concerned about how Nvidia’s acquisition could impact Slurm usage among his HPC customers.

The executive based this concern on his experience with Nvidia’s 2022 acquisition of cluster management software vendor Bright Computing.

After Bright Computing was acquired, Daninger said, the vendor’s Bright Cluster Manager software got “very expensive” because of rising licensing and support costs.

“And we just didn’t see the same level of support either, so it caused us to, for the most part, discontinue the use of Bright,” Daninger said.

Daninger said the costs increased as Nvidia merged the product into its Base Command Manager software in 2023 and changed licensing from Bright’s traditional per-node pricing to a per-GPU basis.

Then late last year, Nvidia made Base Command Manager available only through the Nvidia AI Enterprise software suite, which costs $4,500 per GPU for a year-long subscription and includes enterprise support, Nvidia partner Boston Limited said in a notice at the time.

By contrast, Bright Cluster Manager’s per-node licenses typically cost hundreds of dollars per year before the acquisition, according to Daninger.

“The orientation changed to what Nvidia needed out of it. They can do that when they own it. And I would expect to see some of the same things here with Slurm,” Daninger said.

The executive said he was not aware that Nvidia made Base Command Manager available for free in May through a license that supports up to eight GPUs per system and doesn’t include support. But he added that the change came too late, since Nor-Tech has largely moved on to other cluster management solutions, including ClusterVision, for its customers.

Nvidia declined to comment on Daninger’s concerns, but the company said in its Monday announcement that it will “continue to offer open-source software support, training and development for Slurm to SchedMD’s hundreds of customers.”