13 Big Data Products To Check Out At Hadoop Summit 2013
9:00 AM EST Wed. Jun. 26, 2013
Big data is one of the most active areas in information technology right now. And there's no better place to catch up on what's happening in the big data universe than the Hadoop Summit that starts Wednesday in San Jose.
More than 60 of the big data market players, from established vendors like Intel and Salesforce.com to just-out-of-the-gate startups like Sqrrl and Platfora, will be demonstrating what they have to offer.
Here's a look at a baker's dozen of new and enhanced products being shown at the conference.
Continuuity is launching the Continuuity Developer Suite 1.7 that supports batch processing, integrating MapReduce into the platform to provide developers with a broader set of workload capabilities.
Continuuity helps Java developers build applications that run with Hadoop and its HBase database. Those applications support real-time uses such as operational analytics. But CEO Jon Gray said some applications still require the batch-processing architecture of MapReduce.
Continuuity Developer Suite 1.7 also offers a number of built-out application templates for streaming real-time analytics, targeting and personalization, and anomaly detection.
Datameer will be demonstrating Datameer 3.0, a new release of the company's data integration and analytics software for business users. The release adds "smart analytic" functions that automatically identify patterns and relationships within huge volumes of complex data stored in Hadoop.
Datameer 3.0 uses four machine-learning techniques: clustering, decision trees, column dependencies and recommendations. While these have traditionally been the domain of skilled data scientists, they are incorporated in the Datameer software such that business users can use them on a self-service basis, according to the company.
Datameer 3.0 will be in beta testing for several months before becoming generally available.
Hortonworks will be unveiling the community preview of the next release of its Hortonworks Data Platform that supports Yarn, the next-generation Hadoop data-processing framework from the Apache Software Foundation (ASF) for running distributed applications.
Yarn, part of the ASF's Hadoop project, is designed to enable multiple use cases against a single data set. Including Yarn in the HDP 2.0 community preview will allow Hortonworks' partners and customers to begin working with the new technology and participate in developing its final specifications, said Hortonworks marketing vice president Dave McJannet.
Kognitio is debuting a new release of the Kognitio Analytic Platform with greater connectivity across programming languages and improved performance. Version 8 of the software offers NoSQL processing combined with massively parallel processing execution of any script or binary code such as R, Python or Java.
Performance benchmark tests conducted using Version 8 showed it returned answers to complex queries at twice the speed of the previous release.
Version 8 also offers high-speed data export for fast data backup and in-memory compression added as an optional feature.
MapR and Fusion-io will demonstrate significant performance gains by combining the Hadoop-based MapR M7 big data platform with the Fusion ioMemory system when running read-intensive HBase applications.
HBase application performance is often limited by disk storage bottlenecks, according to MapR. Using Fusion ioMemory in conjunction with the MapR system improved performance by a factor of 25, according to the company.
Adoption of the HBase open-source database for high-performance computing tasks has been slowed by I/O performance limitations.
Business analytics application developer Pentaho is unveiling what it calls the "adaptive big data layer" in its software that offers integration capabilities with big data stores.
The new technology links Pentaho to Hadoop distributions from Cloudera, Hortonworks, MapR Technologies and Intel, as well as NoSQL databases Cassandra and MongoDB. It also supports the Splunk engine for machine data.
RainStor will unveil a major update to its database software with new security features the company said should boost the adoption rate of Hadoop among security-sensitive customers in government, banks and telecommunications companies.
New security capabilities in the RainStor database, which runs natively on Hadoop, include data encryption, data masking and views, audit trail and tamper-proofing, configurable data disposition, and support for Kerberos, LDAP, Active Directory and PAM (Linux's Pluggable Authentication Modules).
New search capabilities in the update boost the database's query performance by a factor of 10 to 100, the company said, and enable faster text search. The database can now search across billions of records on a multi-petabyte scale, RainStor said.
Splunk, best known for its real-time operational intelligence software, is unveiling a beta version of its new Hunk: Splunk Analytics for Hadoop.
Hunk integrates tools for exploration, analysis and visualization of data stored in Hadoop. It uses the company's virtual index technology for data analysis and provides tools for building charts, graphs, custom dashboards and reports.
The software works with leading Hadoop distributions from Cloudera, Hortonworks and MapR Technologies.
Startup Sqrrl is now shipping Sqrrl Enterprise 1.1, a secure, scalable platform for developing real-time analytical applications. With the 1.1 release, Sqrrl is moving from the limited release phase the software has been in to general availability.
Sqrrl Enterprise is built on Apache Accumulo. The Accumulo technology was originally developed by the National Security Agency and was spun out as an open-source project in 2011.
Teradata is announcing the Teradata Portfolio for Hadoop, a collection of hardware platforms, software, consulting services, training and customer support for deploying and managing Apache Hadoop.
The offerings include a choice of two "premium platforms": Teradata Appliance for Hadoop and Teradata Aster Big Analytics Appliance. The former comes loaded with Hortonworks' distribution of Hadoop, Mellanox InfiniBand hardware and Teradata's BYNET V5 software. The latter includes the Aster database, SQL-MapReduce and Apache Hadoop.
The company also is offering the Teradata Commodity Configuration for Hadoop for businesses that want to deploy Hadoop on standard servers from Dell. And the Teradata Software Only for Hadoop platforms is a software bundle for customers who want to source and configure their own hardware.
VMware is introducing a public beta release of VMware vSphere Big Data Extensions, a new feature that extends the company's virtualization platform to support Apache Hadoop and big data processes.
With the new software, businesses can deploy, run and manage Apache Hadoop clusters alongside other applications on a common virtual infrastructure. That brings the benefits of virtualization, including scalability, performance and elasticity, to Hadoop systems, said Fausto Ibarra, VMware senior director of product management.
VMware vSphere Big Data Extensions is built on VMware's Project Serengeti, the open-source project the company launched last year to enable the development and deployment of Apache Hadoop clusters on virtual infrastructure. VMware vSphere Big Data Extensions is expected to be generally available by the end of the year.
WANdisco (for Wide Area Networking distributed computing) will be unveiling WANdisco Non-Stop NameNode – WAN Edition, a new replication technology that the company said brings 100-percent uptime for globally distributed big data systems based on the Hadoop platform. The company already provides a LAN Edition of the software.
WANdisco also will show a new release of WANdisco Distro (WDD 3.6), based on Apache Hadoop 2.0, that the company said will support migration from Amazon Web Services to private clouds. The company will also open-source S3HDFS, its implementation of the S3 API on Hadoop, allowing businesses to run custom applications written for S3 against their own in-house Hadoop systems. The company is also expected to announce support for the Shark real-time analytics and Spark in-memory data processing technologies as add-ons for WANdisco Distro 3.6.
Zettaset's Orchestrator Hadoop cluster management software now supports Hadoop distributions from Cloudera and Hortonworks. Owners of Cloudera CDH and Hortonworks HDP can use Orchestrator to automate the installation and administration of their Hadoop infrastructure.
The complexity of installing and managing Hadoop clusters has hindered the adoption of Hadoop, said Zettaset Co-founder and CTO Brian Christian. Orchestrator helps eliminate manual configuration processes, reduces Hadoop complexity and brings enterprise-class manageability, security and availability to Hadoop.