13 Hot Big Data Products From This Week's Hadoop World

What's Hot In Big Data?

The 2013 O'Reilly Strata + Hadoop World conference is under way in New York, where just about every major player in the big data arena is showing off its latest technology for collecting, managing and analyzing big data.

Here's a look at some of the new products that are either being announced at the show or were recently unveiled and are being demonstrated there.

0xdata Debuts New Release Of Predictive Analytics Software

0xdata is demonstrating the second generation of its H2O machine learning and predictive analytics engine for Hadoop, R and Excel. 0xdata said the new software, known as the "Fluid Vector" release, performs parallel and distributed advanced algorithms on big data at speeds up to 100 times faster than competing predictive analytics tools.

H2O runs in-memory and uses the Hadoop Distributed File System as its storage platform. Users can pull in data from Microsoft Excel, the RStudio developer environment, SQL, NoSQL, S3 or HDFS using a REST API.

0xdata's goal is to bring advanced predictive analysis capabilities to a broader audience of users, said CEO SriSatish Ambati.

ClearStory Data Unveils Data Intelligence Software

Startup ClearStory Data is launching its new platform and application software for big data analysis and exploration. The company's Data Intelligence software is designed to make it easier to access internal and external data sources, including corporate databases, Hadoop and the Internet, and use that data to uncover trends and patterns.

The software, which has visual discovery capabilities, is designed for people with a range of skill levels across all kinds of functions throughout business organizations. The overall goal is to bring big data analysis to a broader range of information workers.

Cloudera Launches Fifth Generation Of Its Big Data Platform

Cloudera is offering public beta releases of Cloudera Enterprise 5, a fifth-generation release of the company's big data platform, and CDH 5, a new release of the company's Hadoop distribution. Both incorporate Apache Hadoop 2, the latest release of the open-source Hadoop software.

Key enhancements to the new Cloudera Enterprise 5 include unified management of third-party applications, the ability to cache data sets from the Hadoop Distributed File System (HDFS) in-memory, and improved resource management for running multiple frameworks for data processing and analysis on a single cluster.

The software also offers several new capabilities for managing and exploring big data, and it improves data protection through snapshot support in HDFS and HBase, which helps prevent data loss.

Dataguise Showcases DG For Hadoop 4.4

Dataguise is showing a new release of its big data security and privacy protection software at the Strata + Hadoop World show. The software helps businesses leverage big data while remaining compliant with data privacy regulations.

DG for Hadoop 4.4 includes new capabilities to help businesses evaluate data exposure, and it enforces what the company calls "the most appropriate remediation" to protect companies from financial and brand damage. The software is certified to work with major Hadoop technologies such as Cloudera, Hortonworks and MapR Technologies.

Hortonworks Launches Data Platform 2.0

Hortonworks is showing off its Hortonworks Data Platform (HDP) 2.0, the new release of the company's commercial distribution of the Hadoop big data platform. HDP 2.0 is built on the recent Hadoop 2 release from the Apache Software Foundation (ASF).

A key enhancement in the new Hortonworks release is the inclusion of YARN (Yet Another Resource Negotiator), a new Hadoop technology that allows developers to use programming frameworks other than MapReduce. Also in the new Hortonworks release is technology from the ASF's Stinger initiative that improves the speed and scale of Apache Hive's support for SQL semantics.

Informatica Debuts Streaming Data Collection Technology

Informatica is demonstrating Vibe Data Stream for Machine Data, the vendor's new software for simplifying the collection of streaming high-velocity, high-volume machine data from multiple sources and delivering it to big data platforms like Hadoop and Cassandra.

Informatica Vibe Data Stream for Machine Data is a new component of Informatica's data integration platform. The software uses embeddable Vibe agents to collect data and stream millions of records per second for realtime event processing and analytical applications.

Microsoft Ships Windows Azure HDInsight

Windows Azure HDInsight, Microsoft's cloud-based distribution of Hadoop, is now generally available on the company's Windows Azure cloud platform, Microsoft said at the show. In a blog post, Quentin Clark, Microsoft corporate vice president of the Data Platform Group, called the availability of the software a milestone in Microsoft's strategy to "bring big data to a billion people."

Clark said that by offering HDInsight as an Azure cloud service, Microsoft is providing the benefits of open-source Hadoop with the security and management capabilities most businesses require. The service is integrated with Excel and Power BI (the business intelligence component of Office 365) and supports .NET, Java and other programming languages.

MicroStrategy Shows Off New Desktop Analytics Software

MicroStrategy will be demonstrating the recently unveiled MicroStrategy Analytics Desktop, a self-service data visualization tool that the vendor is offering as a free download.

With the desktop software, users can access data and develop visualizations such as an earthquake density map. They can also build dashboards, and then export and send them as images, PDF files or a full MicroStrategy Analytics Desktop file.

The company also will be demonstrating the MicroStrategy Analytics Platform, an upgrade of the vendor's core analytics software with new data connectors for MongoDB, Hortonworks Data Platform 1.3 and Intel's Hadoop distribution. It also offers new cloud deployment options and performance enhancements for analyzing data in-memory.

RainStor Database Wins EMC Isilon Validation

RainStor has completed validation testing of its database software on EMC's Isilon scale-out, network-attached storage system running the Hadoop Distributed File System. Running Hadoop with RainStor on Isilon creates a flexible architecture for running Hadoop on DAS (direct-attached storage) and NAS (network-attached storage) systems, speeds up query performance and improves data security.

With more businesses running Hadoop on Isilon, RainStor said the validation offers customers more flexible deployment options for working with big data. As data volumes grow, balancing CPU and storage capacity becomes increasingly critical.

Revolution Analytics Debuts R Enterprise 7

Revolution Analytics unveiled the next generation of its data analytics platform powered by the R programming language. Revolution R Enterprise 7 (RRE 7) features a new "write once, deploy anywhere" function that allows businesses to utilize a variety of data management platforms such as Hadoop and second-generation data warehouses. That marks the first time RRE can operate directly within a Hadoop environment.

The new release also provides a library of prebuilt, parallelized versions of common statistical and predictive analytical algorithms that run within the Hortonworks and Cloudera Hadoop platforms and the Teradata database. That allows data scientists to build predictive models, such as the regression tree (shown), inside Hadoop or Teradata databases without the need to move big data files to a separate server. RRE 7 is also now integrated with Alteryx strategic analytics software.

Skytree Extends Machine Learning Environment To Any Hadoop Environment

Skytree will be demonstrating how its Skytree Server for machine learning and predictive analysis is now integrated with Apache Hadoop. That move combines predictive analytics with unified management on any Hadoop environment. The company has partnerships with Hadoop technology providers Cloudera, Hortonworks and MapR Technologies.

Skytree Co-Founder and CEO Martin Hack said the company's goal is to disrupt the advanced analytics software market with its software to discover deep insights, predict trends, make recommendations, and uncover new markets and customers.

Splice Machine Takes Next Step With Its SQL-On-Hadoop Database

Splice Machine, which has been developing what it calls the industry's only realtime, SQL-on-Hadoop database, has launched a limited release program for the new software. The company is seeking 50 evaluators to try out the technology, including validating specific use cases and testing SQL coverage and benchmark performance, before releasing the product for general availability.

Splice Machine is developing the database as an alternative to traditional relational databases, such as Oracle and IBM DB2, for realtime, transactional big data applications. "We are looking to power applications that not only analyze data, but act out that analysis," founder and CEO Monte Zweben said in an interview.

Splunk Shows Off Enterprise 6 Machine Data Platform

Splunk is demonstrating the Splunk Enterprise 6 realtime operational intelligence platform for machine data, which became generally available earlier this month.

The release includes enhancements that speed up the software's analytics. The new Pivot technology (see screenshot at left) and its drag-and-drop interface bring data analysis and visualization capabilities to nontechnical business users and analysts.

Splunk Enterprise 6 also sports new data models to represent underlying machine data and the relationships between the data, and a new high-performance analytics store that the company said delivers analytics performance up to 1,000 times faster than earlier releases.

The company also is demonstrating its new Hunk: Splunk Analytics for Hadoop software.