Where's The Software To Catch Up To Multicore Computing?

In terms of available floating-point operations per second on processors and systems, Moore's Law hasn't yet reached its limits. But in terms of usable performance by most software--even advanced technical computing software--perhaps it already has.

A look at the Top500 Supercomputer Sites List (www.top500.org) shows that a large portion of the technical computing workload has moved to commodity Linux clusters: commodity servers, commodity networks and commodity storage. At the same time, novel multicore processor architectures, such as the Cell Broadband Engine (Cell BE), show the potential for substantial computing power (hundreds of gigaflops) to reside in entry-level servers, with, say, two to four processors.

With so much computing power so readily accessible--whether in systems-on-chip or commodity clusters--companies and industries of all sizes anywhere in the world, and perhaps even individuals, may be able to tap this power to solve more problems than ever before. There's only one problem: Where's the software to take advantage of all these processors, cores and threads? For the most part, it's not there yet--even in areas historically focused on leading-edge technology enablement, such as technical computing. In fact, IDC's Earl Joseph concluded in a study on technical computing software that "many ISV codes today scale only to 32 processors, and some of the most important ones for industry don't scale beyond four processors" (www.hpcwire.com/hpc/43036§0.html).

His study also found that even when a vendor has a strategy to parallelize or scale its code, the cost of rearchitecting and recoding is too high relative to the perceived market benefits.

Enter Roadrunner.

Roadrunner will be the world's first supercomputer based on the Cell BE. When it is up and running at Los Alamos National Laboratory in 2008, it will be capable of peak performance of more than 1.6 petaflops, or 1.6 thousand trillion calculations/s.

Roadrunner is the first rendering of a hybrid computing architecture: multiple heterogeneous cores with a multitier memory hierarchy. It's also built entirely out of commodity parts: AMD Opteron-based servers, Cell BE-based accelerators and InfiniBand interconnect. Standard processing (e.g., file system I/O) will be handled by Opteron processors, while more mathematically and CPU-intensive elements will be directed to the Cell BE processors.
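One rough way to picture this split is a dispatcher that routes each task by kind. The Python sketch below is only an illustration: the `Task` type, the `Unit` labels and the `route` function are hypothetical names invented for this example, not part of any Roadrunner software.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    """The two kinds of processor in the hybrid node (labels are illustrative)."""
    HOST = "general-purpose host (e.g., Opteron)"
    ACCELERATOR = "accelerator (e.g., Cell BE)"


@dataclass(frozen=True)
class Task:
    name: str
    compute_intensive: bool  # hypothetical flag; assumed to be set by the library author


def route(task: Task) -> Unit:
    """Send math-heavy work to the accelerator; everything else stays on the host."""
    return Unit.ACCELERATOR if task.compute_intensive else Unit.HOST


if __name__ == "__main__":
    for task in [Task("file system I/O", False), Task("matrix multiply", True)]:
        print(f"{task.name} -> {route(task).value}")
```

In a real hybrid system the routing decision would sit inside the framework, so the application developer would never write it by hand.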

To make this complex architecture useful to even the most advanced scientific simulation application developers, much of the system development work goes into enabling the programming methodology and building the corresponding application framework and tooling.

The application-enablement APIs are simple but extensible, designed to take advantage of various types of memory and I/O subsystems while keeping changes in the underlying implementation hidden from the developer. The focus is also on enabling a core set of efficient scatter/gather memory operations across different topologies, and on hiding such things as computation and communication overlay from the developer.
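To make the idea concrete, here is a minimal Python sketch of a scatter/gather helper that moves the next chunk of data while the current one is being computed, without the caller seeing any of it. Everything in it is an assumption for illustration: `scatter`, `fetch` and `map_chunks` are hypothetical names, `fetch` merely copies a list where a real framework would move data into local memory, and Python threads only model the structure of the technique rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def scatter(data: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Split the input into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def fetch(chunk: Sequence[float]) -> List[float]:
    """Stand-in for a transfer into fast local memory; here it is just a copy."""
    return list(chunk)


def map_chunks(
    data: Sequence[float],
    chunk_size: int,
    kernel: Callable[[List[float]], List[float]],
) -> List[float]:
    """Apply `kernel` chunk by chunk, fetching the next chunk during compute."""
    chunks = scatter(data, chunk_size)
    if not chunks:
        return []
    results: List[List[float]] = []
    with ThreadPoolExecutor(max_workers=1) as mover:
        pending = mover.submit(fetch, chunks[0])
        for next_chunk in chunks[1:]:
            current = pending.result()                 # wait for the chunk in flight
            pending = mover.submit(fetch, next_chunk)  # start the next transfer
            results.append(kernel(current))            # compute while it runs
        results.append(kernel(pending.result()))       # last chunk
    # Gather: stitch the per-chunk results back together in order.
    return [x for part in results for x in part]


if __name__ == "__main__":
    doubled = map_chunks([1.0, 2.0, 3.0, 4.0, 5.0], 2, lambda c: [2 * x for x in c])
    print(doubled)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

The caller supplies only the data and the kernel; chunking, transfer scheduling and reassembly stay inside the helper, which is the kind of hiding the paragraph above describes.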

The philosophy is a "division of labor" approach. There will continue to be a set of computational kernel developers maximizing performance out of the microprocessor ISA; in fact, many such kernels already exist (matrix multiply is a good example). Library developers will use frameworks such as the one developed for Roadrunner to synthesize the kernels into multicore, memory hierarchy libraries. Application developers will then link in those libraries using standard compiler and linker technology. Consistent APIs and methodology across a number of multicore architectures without the introduction of new languages will limit the cost of code maintenance. Thus, library developers get improved ease of use not just for accelerator systems but also for general-purpose multicore approaches and clusters.
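The three roles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Roadrunner framework: `matmul_kernel` and `parallel_matmul` are hypothetical names, the kernel is a plain unoptimized loop standing in for hand-tuned code, and a thread pool stands in for the cores the library would target.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


# --- Kernel developer: one building block (a real one would be hand-tuned) ---
def matmul_kernel(a_rows: Matrix, b: Matrix) -> Matrix:
    """Multiply a block of rows of A by the whole of B."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_rows]


# --- Library developer: spreads the kernel across workers --------------------
def parallel_matmul(a: Matrix, b: Matrix, workers: int = 4) -> Matrix:
    """Split A into row blocks, run the kernel on each, reassemble in order."""
    if not a:
        return []
    block = max(1, -(-len(a) // workers))  # ceiling division
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda rows: matmul_kernel(rows, b), blocks)
        return [row for part in parts for row in part]


# --- Application developer: just calls the library ---------------------------
if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(parallel_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

The application code at the bottom does not change if the library later swaps in a faster kernel or a different way of distributing the blocks, which is the maintenance benefit the division of labor is meant to deliver.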

Roadrunner is not just a single custom project for a national lab supercomputer; it represents a new architecture. We are inviting industry partners to define the components (APIs, tools, etc.) of the programming methodology so that the multicore systems are accessible to those partners as well. In this way, major scientific developments need no longer be limited to big universities or major research labs. The benefits of such focused industry enablement can "trickle down" to almost every aspect of our daily lives. Potential uses include:

• Financial services. By calculating cause and effect in capital markets in real time, supercomputers can instantly predict the ripple effect of a stock market change throughout the markets.

• Digital animation. Massive supercomputing power will let movie makers create characters and scenarios so realistic that the line will be blurred between animated and live-action movies.

• Information-based medicine. Complex 3-D renderings of tissues and bone structures will happen in real time, with in-line analytics used for tumor detection as well as comparison with historical data and real-time patient data. Synthesis of real-time patient data can be used to generate predictive alerts.

• Oil and gas production. Supercomputers are used to map out underground geographies, simulate reservoirs and analyze the data acquired visually by scientists in the field.

• Nanotechnology. Supercomputing is expected to advance the science of building devices, such as electronic circuits, from single atoms and molecules.

• Protein folding. Supercomputers can be used to provide an understanding of how diseases come about, how to test for them and how therapies and cures might be developed.

Architectures are becoming more complex, from the multicore microprocessor to hybrid systems like Roadrunner; supercomputing power is becoming a commodity; and developers still want more performance out of their software without relying on the rate of "frequency bumps" that prevailed in the past. Against that backdrop, we are focused on keeping application development simple--forcing the art of the engineering into the framework enablement, not the application development. And by returning to a simpler way of doing things, we allow the software to catch up with advances in silicon, making teraflops on the desktop not just a feasible technical accomplishment, but a useful one as well.

Catherine Crawford is chief architect for next-generation systems software at IBM Systems Group's Quasar Design Center.