Microsoft Research Project Could Displace Storage Nets
"At that time, we created an interface that turned out to be too low level because it forced developers to create their own storage allocation maps," said Michael Schroeder, assistant director of Microsoft's Mountain View research center that is focused on distributed computing.Schroeder worked on the same issue for ten years, including time in the 1990's while at the former Compaq Computer research center in Palo Alto. Boxwood takes a higher-level approach to the interface, using data abstractions rather than the kinds of logical or virtual disk abstractions used in previous projects. The new approach lets developers define the amount of storage they need at a level above block sizes and provide their own identifier for that space independent of actual storage address space. The method also helps the system do better load-balancing, data pre-fetching, and informed caching, according to a Web site on the project.
Boxwood uses B-Link trees, a variation of the B-Tree data structure. According to Microsoft, no distributed implementation of B-Link trees has previously been published. The software also provides global storage- and state-locking mechanisms.
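The article does not describe Boxwood's B-Link implementation, but the data structure itself (due to Lehman and Yao, not Microsoft) can be illustrated. Each node carries a "high key" and a link to its right sibling, so a reader that arrives at a node after a concurrent split can recover by following the link instead of restarting from the root. A minimal single-process search sketch, with a hand-built two-level tree:

```python
# Minimal sketch of the B-Link idea: every node has a high key and a
# right-sibling link, letting searches chase splits sideways.
class Node:
    def __init__(self, keys, values=None, children=None,
                 high_key=None, right=None):
        self.keys = keys          # sorted keys in this node
        self.values = values      # leaf payloads (leaves only)
        self.children = children  # child nodes (interior only)
        self.high_key = high_key  # upper bound of keys stored here
        self.right = right        # right-sibling link

def search(node, key):
    while True:
        # If the key exceeds this node's high key, a split has moved
        # it to the right: follow the sibling link.
        while node.high_key is not None and key > node.high_key:
            node = node.right
        if node.children is None:                # leaf node
            if key in node.keys:
                return node.values[node.keys.index(key)]
            return None
        # Interior node: descend into the child covering the key.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]

# Hand-built example tree: root separates keys <= 2 from keys > 2.
leaf2 = Node(keys=[5, 7], values=["c", "d"])
leaf1 = Node(keys=[1, 2], values=["a", "b"], high_key=2, right=leaf2)
root = Node(keys=[2], children=[leaf1, leaf2])
```

Starting a search at `leaf1` for key 5 demonstrates the recovery path: 5 exceeds `leaf1`'s high key, so the search follows the right link to `leaf2` rather than failing.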
Microsoft has an early version of the Boxwood code running on a four-node PC cluster using SCSI disks and Gigabit Ethernet interconnects. The code has not been optimized for load balancing, reconfiguration or fault tolerance.
Researchers are now writing their first paper on the project and expect to prototype the software across a cluster of tens of systems within about a year. If successful, the project could be transferred into a future generation of Microsoft's storage virtualization products.
"We think this set of features will be very attractive to database, file system and other developers," said Schroeder.
With such storage virtualization capabilities, users could lash together inexpensive systems into useful commercial data centers, a development with significant repercussions for high-end server design. For example, with such software a relatively high-end cluster could be built from $5,000 nodes, each configured with two fast CPUs, eight Gbytes of RAM, four 250-Gbyte serial ATA hard disks, and two Gigabit Ethernet adapters. By combining 1,500 such systems, users could deliver a data center capable of 5 million SpecInt2000 performance with 1.5 Pbytes of disk storage for less than $8 million.
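The arithmetic behind that example configuration can be checked directly (the SpecInt2000 figure is taken from the article, not computed here):

```python
# Back-of-the-envelope check of the example cluster.
nodes = 1500
node_cost = 5000        # dollars per node
disks_per_node = 4
disk_gb = 250           # Gbytes per serial ATA disk

total_cost = nodes * node_cost                    # $7,500,000
total_storage_gb = nodes * disks_per_node * disk_gb
total_storage_pb = total_storage_gb / 1_000_000   # 1.5 Pbytes

print(f"${total_cost:,} for {total_storage_pb} Pbytes of raw disk")
```

The totals come to $7.5 million in node cost, under the $8 million cited, with 1.5 Pbytes of raw disk.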
Lower-end configurations using server nodes costing as little as a few hundred dollars each are also possible, Schroeder added. "This is a lot cheaper than the current highly integrated systems ... the extra dollars spent by engineers now developing server blades could be overkill."
More significantly, by virtualizing the storage, such clusters eliminate the need for separate storage area networks that are relatively expensive because they require separate fabric switches, disk arrays and management software.
Storage virtualization is currently one of the holy grails of next-generation data center software.
The Boxwood project aims to make it easier for software developers to handle storage tasks distributed across clusters of thousands of low-cost, off-the-shelf servers, each with its own hard disk drives.
"This has been a tough nut to crack," said Schroeder.
This story courtesy of TechWeb.