Xilinx’s New Alveo HPC Accelerator Is Its ‘Most Powerful’ Yet

The FPGA maker is positioning its new Alveo U55C accelerator card as a more efficient and flexible alternative to CPUs and GPUs for a ‘broad spectrum of customers’ running high-performance computing and big data workloads, packing 16GB of HBM2 memory in a single-slot form factor.


Xilinx said its new Alveo U55C is its “most powerful” accelerator card yet, bringing more compute density and high-bandwidth memory to high-performance computing and database workloads in a “slimmer, more efficient” design that enables “superior performance-per-watt.”

The San Jose, Calif.-based FPGA maker — which AMD is in the process of acquiring for $35 billion — launched the new accelerator card during the SC21 supercomputing conference on Monday, positioning the Alveo U55C as a more efficient and flexible alternative to CPUs and GPUs for a “broad spectrum of customers” running HPC and big data workloads.

Nathan Chang, HPC product manager at Xilinx, told CRN that the Alveo U55C can tackle such workloads better in part because of the “hyper-parallelism capabilities” of the field-programmable gate arrays that power the company’s Alveo accelerator cards. The card also comes with customized memory hierarchies, which he said serve HPC and big data workloads better than the structured memory hierarchies present in general-purpose GPUs.

The Alveo U55C is more efficient when it comes to data movement too, according to Chang, which cuts down on the “wasted clock cycles” and “wasted power” that CPUs and GPUs experience.

“We can optimize the data movement with our ability to create micro circuitry within our pipelines, within our circuits that can optimize how not only data is moved but how it’s served up from function to function,” he said. “And all of this means less movement between data and buffers and memory and caches and, ultimately, better performance-per-watt.”

While Xilinx did not share comparisons with GPUs, Chang said a node with eight Alveo U55C cards can perform work in Ansys’ LS-DYNA crash simulation software five times faster than a CPU-based node with Intel’s Xeon Platinum 8260L CPUs. The company did not provide configuration details for the comparison, but Chang said there are significant financial implications with such a speed boost.

“There’s a lot of money to be saved in the running of this at five times the normal speed,” he said.

In another example, Xilinx said TigerGraph, a provider of graph analytics software, used Alveo U55C-based clusters to achieve 45 times faster performance for fraud detection versus CPU-based clusters. The use of Alveo-based clusters also allowed TigerGraph to reduce DDR4 utilization by 66 percent and improve the quality of transaction scores by 35 percent.

Compared to Xilinx’s existing Alveo U280 accelerator card, the Alveo U55C doubles the capacity of the high-bandwidth memory, known as HBM2, to 16GB while reducing the form factor’s size from a dual-slot to a single-slot PCIe card. While the typical power draw is slightly higher at 125 watts, the U55C reduces maximum power to 150 watts from the U280’s 225-watt power envelope.

While the Alveo U55C’s other aspects, like the logic and internal SRAM, remain the same as the Alveo U280’s, Chang said that the smaller form factor means “you have twice the density of compute in a single installation.” The higher HBM2 capacity also means “you quadruple the amount when you cluster,” going from 8GB to 32GB in only two PCIe slots, Chang added.
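The quadrupling Chang describes follows from the figures quoted above: a dual-slot U280 puts 8GB of HBM2 in two PCIe slots, while two single-slot U55C cards put 32GB in the same space. A minimal sketch of that arithmetic is below. The `Card` class and `hbm2_in_slots` helper are hypothetical names used only for illustration, and the calculation assumes slots are the only constraint, ignoring power and cooling limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Illustrative record of the per-card figures quoted in the article."""
    name: str
    hbm2_gb: int      # HBM2 capacity per card
    pcie_slots: int   # slot width of one card


# Figures as reported: the U55C doubles the U280's HBM2 to 16GB
# and shrinks the card from dual-slot to single-slot.
U280 = Card("Alveo U280", hbm2_gb=8, pcie_slots=2)
U55C = Card("Alveo U55C", hbm2_gb=16, pcie_slots=1)


def hbm2_in_slots(card: Card, slots: int) -> int:
    """Total HBM2 (GB) when `slots` PCIe slots are filled with `card`.

    Assumption: slot count is the only limit on how many cards fit.
    """
    cards_that_fit = slots // card.pcie_slots
    return cards_that_fit * card.hbm2_gb


u280_gb = hbm2_in_slots(U280, 2)   # one dual-slot card
u55c_gb = hbm2_in_slots(U55C, 2)   # two single-slot cards
ratio = u55c_gb // u280_gb

print(f"{U280.name}: {u280_gb}GB in two slots")
print(f"{U55C.name}: {u55c_gb}GB in two slots ({ratio}x)")
```

Half of the gain comes from the doubled per-card memory and the other half from fitting two cards where one used to go.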

Chang said a network of Alveo cards in a cluster can compete with InfiniBand-based clusters on performance and latency when used with the second version of RDMA over Converged Ethernet, also known as RoCE v2, and data center bridging.

“You could do this on a lossy network that’s already there. But if you’re able to build a separate network, Alveo-based, that uses RoCE v2 and Data Center Bridging on converged Ethernet, you could build a lossless network [that] is very competitive,” he said.

Developers can take advantage of the Alveo U55C’s acceleration using Xilinx’s Vitis unified software platform, which includes domain-specific development environments, accelerated libraries and APIs as well as core development kits.

The Alveo U55C is available now through Xilinx.com and authorized distributors. It can also be evaluated through FPGA-as-a-Service providers in public clouds. Clustering solutions are available for private previews while general availability won’t happen until the second quarter of next year.

Alexey Stolyar, CTO of International Computer Concepts, a Northbrook, Ill.-based server integrator, told CRN that one of the hurdles companies like Xilinx face in introducing new kinds of accelerators is that customers need to switch to new toolsets to use them.

“There have to be very large reasons to do that,” he said.

That may not be a big deal for larger companies, but it could impede smaller firms from adopting new accelerator types faster, according to Stolyar.

“If the reason is you’re a large bank and fraud detection is going to be much quicker and all these benefits [that come with Xilinx’s Alveo cards], yeah, there’s a reason for them to do it, and they’re going to find the resources to do it,” he said. “But smaller firms might not have those resources. They might not want to do it just because it’s an effort.”