These Servers Get Your Blood Flowing
Midwest System Builder's Linux Cluster Clocks IBM To Win University Deal

By Joseph F. Kovar, ChannelWeb


11:00 AM EST Wed. Nov. 22, 2006


Pig tissue or carbon? That's the question faced by researchers, doctors and patients when looking at replacing the valves in a diseased heart.

And researchers are now searching for the answer to that question with the help of a custom cluster of Advanced Micro Devices Opteron servers running Linux and tied together with proprietary high-speed networking gear, all of which was put together by Reason, a high-end system builder based in Burnsville, Minn.

That cluster resulted from a request for quote from the St. Anthony Falls Laboratory, a research arm of the University of Minnesota in Minneapolis, which wants to understand how the force generated by blood flow impacts the life of individual blood cells. It's a big question for such little cells, said Dr. Liang Ge, research associate at the laboratory, and important for learning how to build better heart-valve replacements.

While valves made from pig tissue need replacing about every 10 years, mechanical valves create more force and can damage blood cells. "We want to know, if you use a mechanical valve, what kind of force is applied to the blood cells? What is the effect from different valve designs? And how can we improve the design?" Ge said.

The scientists needed a system that could deliver a very high-resolution image, high enough to see how blood flow was affecting cells only 8 microns in diameter. Eight companies responded to the lab's RFQ, including such vendors as IBM and Dell.

In the end, Reason won the bid on the strength of its proposed system and its local presence, said Tom Morton, Reason's director of market development. "We were a little less costly than IBM, but the customer knew we were local and could come if needed at the drop of a hat," he said.

Reason certainly had the experience. In addition to building desktop PCs and servers, the 20-year-old company has been building PC clusters using a variety of open source software for the past five years.

It also helped that Reason had worked with the laboratory on custom storage and with the university on a Geowall project, building a workstation capable of displaying 3-D satellite images of the Earth's surface on a wall of 15 flat-panel displays. "We got wind of the project before it went to bid because we were already University of Minnesota contractors," Morton said.

For the blood-flow project, Reason developed a cluster of 54 compute nodes, each with two dual-core Opteron 275 processors running at 2.2GHz, giving the cluster a total of 216 processor cores. There is also a super node with two dual-core Opteron processors.

The Opteron was specified in the RFQ because at the time it required significantly less power than Intel processors, said Dominic Daninger, vice president of engineering at Reason. While that edge has narrowed lately, the Opterons still hold a significant advantage in their on-chip memory controllers, Daninger said. "Opterons in such RFQs are favored over Intel 10 to one," he said.

The nodes also include motherboards from Supermicro featuring the industry-standard Intelligent Platform Management Interface to allow remote monitoring of node temperature, fan operations, system voltage and other environmental conditions.

About 5 Tbytes of SATA hard drives, a combination of 400-Gbyte and 500-Gbyte drives from Western Digital, are attached directly to the super node, Daninger said. They are controlled by a high-performance Areca RAID controller from Brea, Calif.-based Tekram Systems. One feature of the Areca that Daninger likes is its error lights that indicate which drive has failed.

The cluster nodes run CentOS Linux, an open source derivative of Red Hat Enterprise Linux. Daninger said this helped keep costs low by avoiding per-node license fees. "It works for a fairly savvy end-user like the University of Minnesota," he said.

To manage the cluster nodes, Reason used Warewulf, a Linux cluster-management toolkit from Lawrence Berkeley National Laboratory. "It works great with diskless nodes and works hand-in-glove with CentOS," Daninger said.

Also included was Sun Grid Engine, which allocates processors to jobs according to priority. Reason also brought in Ganglia, an open source application that provides a graphical view of how busy the nodes are.

Reason initially built an eight-node test cluster but found that eight nodes was the practical maximum when the nodes were linked by Ethernet. So Reason brought in Myrinet, a proprietary networking technology from Myricom of Arcadia, Calif.

Latency for each transaction using Myrinet is one-fifth that of Ethernet, Daninger said. "In these parallel environments, code is written so that one node can do its part of an operation and hand it off to another node," he said. "This all takes time."

Before installing the laboratory's application, Reason downloaded a computational fluid dynamics application from NASA to benchmark the cluster for the laboratory. "Its benchmarks are well known in the fluid dynamic market," Daninger said. "It let us prove the cluster before the university added their software."

Fortunately, configuration and deployment went well, because the contract called for the cluster to be up and running 30 days from the day the purchase order was signed. "We delivered it on day 30," Daninger said. "My understanding is there was a time limit on the grant. If it was not done on time, they could lose the money. So there was some stress."

There were minor problems, though, including a few bad memory modules and occasional driver trouble, notably with the Myrinet cards, and Daninger recalls a lot of late-night lunches. Ge, at least, had some fun with the memory modules, some of which caused compute nodes to crash at random. It took some time to realize the crashes were caused by faulty memory.

"During that time, our IT guys and users would play a game to predict on which day the next crash would occur and on which node," he said. "I won both times."

The pressure was doubled for Reason, which was moving to a larger facility while it was building the cluster. The move required installing enough power in the new facility to test clusters the size of the one being built, Daninger said. "I told the electrician we needed power connects of 14 kilowatts for a computer system in the new office. He said, 'What the heck kind of a computer you putting in?' "

Dr. Fotis Sotiropoulos, director of the St. Anthony Falls Laboratory, said his laboratory works on a wide range of fluid dynamics projects. The cluster built by Reason is also used to study water flow in rivers and streams to aid river restoration projects, and to see how fish movement is affected by hydroelectric facilities, Sotiropoulos said. "Biology is just one part, but a big part," he said.

For scientific applications, there are never enough computing resources, and more grants could mean adding more. That is likely to keep Reason and its competitors busy for a very long time.


Copyright 2009 Everything Channel