AWS Introduces VMs That 'Chew Through' Big Data

Amazon Web Services introduced on Tuesday a beefed-up virtual machine instance for running data-intensive workloads.

The new D2 instances offer dense, high-performance storage and fast networking for users churning through large amounts of data in the cloud, said Jeff Barr, chief evangelist for Amazon's cloud, in a blog post.

D2 makes sense for customers processing multi-terabyte data sets of the kind encountered when working with MapReduce and Hadoop clusters.

To access and process warehoused data more rapidly, D2 adds memory and computing speed to HS1, the first generation of Amazon's high-storage-density, high-performance instances.

D2 comes in four tiers -- the lowest offering four virtual CPUs and 30.5 GB of RAM, and the highest 36 virtual CPUs and 244 GB of RAM. Moving up in tier, the instances gain network speed and disk throughput, with the top tier, d2.8xlarge, running on a 10-Gbps network with read speeds of up to 3,500 MB per second and write speeds of up to 3,100 MB per second when launched with a Linux Amazon Machine Image.
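To put the top-tier read figure in perspective, the short Python sketch below estimates how long a single sequential pass over a data set would take at a sustained 3,500 MB per second. The function name and the sample data set sizes are illustrative assumptions, and the result is a best-case lower bound that ignores file system, CPU and application overhead.

```python
# Best-case estimate: time for one sequential read of a data set
# at the d2.8xlarge peak read throughput quoted in the article.

PEAK_READ_MB_PER_S = 3500.0  # figure from the article (Linux AMI)


def scan_seconds(dataset_tb: float, read_mb_per_s: float = PEAK_READ_MB_PER_S) -> float:
    """Return the seconds needed to read dataset_tb terabytes once.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the peak
    throughput is sustained for the whole pass, which real workloads
    rarely achieve.
    """
    dataset_mb = dataset_tb * 1_000_000
    return dataset_mb / read_mb_per_s


if __name__ == "__main__":
    for tb in (1, 10):  # hypothetical data set sizes
        minutes = scan_seconds(tb) / 60
        print(f"{tb} TB: about {minutes:.1f} minutes per full scan")
```

Under those assumptions, a 10 TB data set takes a little under 48 minutes to read once on a single instance, which is why multi-terabyte jobs are typically spread across a cluster.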

"The storage on this instance family is local, and has a lifetime equal to that of the instance. Therefore, you should think of these instances as building blocks that you can use to build a complete storage system," Barr said.

That means users should build redundancy into their storage architectures, use fault-tolerant file systems and back up the data to other Amazon storage services, such as S3 or Elastic Block Store (EBS).

"You can also launch multiple D2 instances in a placement group for high-bandwidth, low-latency networking between the instances," Barr wrote.

Each D2 instance type is EBS-optimized by default, with dedicated throughput to EBS ranging from 750 Mbps to 4,000 Mbps.
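Note that the EBS figures are quoted in megabits per second, while the local disk figures are in megabytes per second. The Python sketch below converts the top-tier 4,000-Mbps EBS link to megabytes per second and gives a rough estimate of how long copying a data set over that link would take. The helper names and the 10 TB example are assumptions for illustration, and the estimate ignores protocol overhead and EBS volume limits.

```python
# Rough conversion between the network-style unit (Mbps) used for EBS
# throughput and the MB-per-second unit used for local disk throughput.

BITS_PER_BYTE = 8


def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE


def copy_hours(dataset_tb: float, link_mbps: float) -> float:
    """Best-case hours to copy dataset_tb terabytes over a dedicated link.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the link is
    fully saturated for the whole transfer.
    """
    dataset_mb = dataset_tb * 1_000_000
    return dataset_mb / mbps_to_mb_per_s(link_mbps) / 3600


if __name__ == "__main__":
    top_tier_mbps = 4000.0  # figure from the article
    print(f"{top_tier_mbps:.0f} Mbps is {mbps_to_mb_per_s(top_tier_mbps):.0f} MB per second")
    print(f"10 TB over that link: about {copy_hours(10, top_tier_mbps):.1f} hours")
```

At 500 MB per second, the dedicated EBS link is roughly one-seventh of the peak local read speed, so backups to EBS are best planned as a background task rather than part of the hot data path.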

When building large-scale, distributed applications, the biggest pain point is often I/O throughput, said Jamie Begin, CEO of RightBrain Networks in Ann Arbor, Mich., an inaugural AWS Managed Service Provider.

"Big Data has big storage requirements, and you can only sift through that data as fast as your application can ingest it," Begin told CRN.

"I see these new D2 instances being most interesting to our customers in life sciences or those doing a lot of marketing analytics," Begin said. "But one of the cool things about AWS is the ability to seamlessly swap out instance types without affecting application uptime. We'll be investigating their performance across a wide variety of I/O-intensive use cases."