California Digital Pushes Open-Architecture Supercomputing

The Fremont, Calif.-based company, which has deployed an Intel-based supercomputer at Lawrence Livermore National Laboratory and a PowerPC-based one at Virginia Tech, will seek to leverage open architecture to grow adoption of its Apple hardware-based design, executives said.

CALIFORNIA DIGITAL'S BUSY SCHEDULE
>> Built a 1,000-plus node supercomputer for Lawrence Livermore National Laboratory using Intel processors
>> Built an 1,100-plus node supercomputer for Virginia Tech using PowerPC processors
>> Developing software that manages node failures to prevent downtime in supercomputers

The system builder sees a potential growth market among customers that want highly scalable PowerPC systems built on open technology.

"We are working on that," said Ravi Chhabria, California Digital's executive vice president of marketing and strategy. "It takes [a customer] with a need and a desire to do that. There are quite a few opportunities in place."

The Virginia Tech supercomputer was built with 1,100 Apple Xserve G5s, each running at 2GHz or faster, with between 512 Mbytes and 8 Gbytes of RAM per node. The interconnects were based on open architecture: InfiniBand or Gigabit Ethernet communication fabrics, including network switches, adapters and cables.
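
Those per-node figures imply considerable aggregate capacity. A back-of-the-envelope calculation, sketched in Python for concreteness: the processor count per node is an assumption (the Xserve G5 shipped in dual-processor configurations), not a figure from the article.

```python
# Rough aggregate capacity of the Virginia Tech cluster, from the
# per-node figures quoted above. CPUS_PER_NODE is an assumption
# (the Xserve G5 was sold in dual-processor configurations); it is
# not a number given in the article.
NODES = 1100
CPUS_PER_NODE = 2                  # assumed dual-processor nodes
MIN_RAM_MB, MAX_RAM_GB = 512, 8    # per-node RAM range from the article

print(NODES * CPUS_PER_NODE, "processors total")                # 2200
print(NODES * MIN_RAM_MB // 1024, "GB aggregate RAM, minimum")  # 550
print(NODES * MAX_RAM_GB, "GB aggregate RAM, maximum")          # 8800
```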

When the system was originally installed at Virginia Tech last year, it immediately shot to No. 3 on the Top500 list of the world's fastest supercomputers.

Chhabria said the open nature of much of the software involved in the implementation, plus the InfiniBand interconnects, can give the system an edge over proprietary offerings. In fact, with about 40 employees, California Digital is one of the smallest system builders to have two systems on the Top500 list.

California Digital also is developing, under an exclusive license from Virginia Tech, a software solution to address fault tolerance.

The software application, which California Digital calls Deja Vu, works to mask some node or component failures within a supercomputer deployment. Because these systems routinely comprise 1,000-plus nodes, the aim is to reduce the maintenance burden that such large deployments require.

"To keep these clusters up and running—that turns out to be a fairly difficult problem in computing," Chhabria said.