12 Big Data Developments You Should Know About

Big Doings in Big Data

It's been a busy couple of weeks in the big data space with both startups and established companies debuting new products, delivering updates and enhancements to existing product lines, and forming strategic relationships.

Industry observers say many enterprises are taking their pilot-stage big data systems, particularly those incorporating the Hadoop platform, and putting them into broader production. Much of the news covered here is aimed at helping businesses make that transition.

So here's a look at a dozen big data-related announcements that caught our attention. Many – but not all – came out of this week's Hadoop Summit 2016 conference in San Jose, Calif.

Actian Releases VectorH 5.0 Database

Actian this week debuted Actian Vector in Hadoop (VectorH) 5.0, a new release of the company's SQL-in-Hadoop database that is now integrated with Apache Spark.

The Spark integration makes it possible to ingest data from different sources and in different forms, and allows developers to build high-performance streaming, ETL and machine-learning applications against VectorH. Such capabilities are critical as organizations move Hadoop analytics systems into production.

Actian is also touting VectorH's improved performance, citing TPC-H query benchmark tests in which it says the database outperforms competing technologies such as Apache Hive, Cloudera Impala, Apache Spark SQL and Apache HAWQ.

(Actian CMO Tony Kavanaugh, on the left in photo, talks with Dave Sugarman, Actian partner sales vice president, at this week's Hadoop Summit 2016.)

AtScale's BI-on-Hadoop Software To Be Resold by Hortonworks

AtScale develops the AtScale Intelligence Platform, which allows commonly used business analytics tools to access data stored in Hadoop clusters.

At the Hadoop Summit, Hadoop software vendor Hortonworks said that starting in the third quarter it will resell AtScale's software as part of its packaged software offerings around its Hortonworks Data Platform.

AtScale CEO Dave Mariani said that given Hortonworks' market presence, the relationship would provide AtScale with a significant sales channel for its software. "We think there's a lot of upside here to drive new business," he told CRN.

Attunity Visibility For Hadoop

Attunity, a developer of big data management software, launched the latest edition of its Visibility for Hadoop system, which helps administrators answer such questions as who is accessing Hadoop data, how they are using it and what kinds of IT resources are being consumed in the process. Those answers can be critical for managing data growth and workload performance within Hadoop-based data lake production environments.

The new release provides more comprehensive analytics that help organizations measure Hadoop data and storage usage for more accurate capacity planning, optimizing cost performance and meeting data governance and compliance requirements. The new capabilities extend across storage levels in the Hadoop Distributed File System, as well as Hadoop data processing engines including MapReduce, Tez, Hive and Cloudera Impala.

Dataguise DgSecure 6.0

Dataguise launched a new release of its DgSecure software, a data-centric security platform that data managers and chief information security officers use to manage sensitive data in traditional relational databases and big data platforms.

DgSecure 6.0 is compatible with a broad range of data platforms and data sources and supports structured, semi-structured and unstructured data in both on-premises and cloud systems. It provides data detection, protection and monitoring capabilities for data governance. And it works with a wider range of IT and data management frameworks.

The new release also simplifies the creation of data security governance policies using built-in and custom templates.

Hortonworks Previews HDP 2.5

Hortonworks will ship the next release of its flagship Hortonworks Data Platform software in the third quarter, in keeping with the "rapid release" schedule the company committed to earlier this year for the Apache Hadoop-based software.

HDP 2.5 adds to the system's security and governance capabilities through support for Apache Ranger and Apache Atlas, respectively. Also supported is Apache Zeppelin, a Web-based notebook for creating interactive analysis documents with SQL, Scala, Python and other development languages. And HDP 2.5 is integrated with the latest release of Apache Ambari for planning, installing and configuring Hadoop systems.

Hortonworks also unveiled an expansion of its Partnerworks partner program, adding initiatives aimed at recruiting managed service providers and serving independent software vendor and independent hardware vendor partners.

Koverse 2.0 Speeds Data Lake Implementations

Startup Koverse provides a "data-lake-in-a-box" platform that the company says makes it possible to collect big data and put it into production much more quickly and at lower cost than with current technologies and practices.

The company, founded in 2012, developed an early version of its technology more than two years ago. The Koverse Platform 2.0, unveiled June 21, incorporates the Apache Accumulo "distributed key/value store" technology and the company's Universal Indexing Engine.

Co-founders Paul Brown (chief product officer) and Aaron Cordova (chief technology officer) worked as data scientists at the National Security Agency, where they helped develop the original Accumulo project and re-architected that organization's data infrastructure to better handle unanticipated data analysis needs. They're bringing that expertise to Koverse's customers.

Looker Updates Embedded Analytics Toolset

The Looker business intelligence platform provides access to data that resides either in a database or in the cloud. The company's Powered by Looker tools make it possible for developers to embed those capabilities within applications or build custom applications with data exploration and analytics capabilities.

This week the company updated Powered by Looker with pre-built application modules, libraries of pre-built web page and application widgets, enhanced APIs, and new capabilities such as support for JavaScript dialogue for easier embedding of Looker within applications.

Pepperdata Hadoop Health Check

Pepperdata develops software for managing and improving the performance of Hadoop clusters. The vendor is rolling out its Hadoop Health Check program, under which the company uses its software to perform a complimentary assessment of Hadoop clusters of 100 nodes or more. The software collects and analyzes Hadoop performance data, and the company develops a diagnostic report that pinpoints problem users or jobs that consume too many IT resources, identifies underutilized cluster capacity, and flags processing bottlenecks.

Talend Releases Updated Big Data Integration Platform

Talend debuted a new edition of its Talend Data Fabric platform for integrating data and applications that reside on-premises or in the cloud.

A new release of the Talend Data Preparation software, part of the Data Fabric system, provides expanded self-service data preparation capabilities to a broad range of business users, rather than limiting them to a small number of data-savvy users as many data preparation tools do. The company said the toolset offers more intuitive data preparation capabilities combined with role-based access to shared data stores.

New features in Talend Data Mapper help businesses better operationalize corporate data lakes through easier manipulation of massive data sets to identify patterns in the data and uncover new business opportunities. Also new in Talend Data Fabric is Secure Sockets Layer communication between data integration jobs in Talend Integration Cloud and Amazon Redshift.

Teradata Wins Support for Presto

Enterprise data warehouse vendor Teradata announced that a number of big data analytics software vendors now support its distribution of the Presto SQL-on-Hadoop software.

Presto, originally developed by Facebook, is an open-source distributed query engine that can run interactive queries against varied data sources including Apache Hive, Apache Cassandra, the Hadoop Distributed File System, relational databases and even proprietary data stores. Teradata's Presto distribution is part of the vendor's Teradata Unified Data Architecture.

Software developers that support Teradata's Presto distribution include Tableau, Looker, Information Builders, Qlik and Zoomdata, with MicroStrategy and Microsoft working to certify their business intelligence tools to work with the software.

Waterline Data Software Supports Apache Atlas

Waterline Data develops its namesake Smart Data Catalog software that inventories data lake assets, improving data discovery and making it easier for businesses to derive value from those assets.

The Smart Data Catalog is now integrated with Apache Atlas, the open-source data governance technology, within the Hortonworks Data Platform. With the Waterline Smart Data Catalog, Apache Atlas users can replace manual metadata tagging with automated processes to classify data lake assets and improve data governance.

Zoomdata's Visual Analytics Software Supports MapR, Apache Drill

Zoomdata provides a big data visual analytics platform that's capable of handling large, complex queries in real time in both on-premises and cloud environments.

Zoomdata's software is now certified to work with the Hadoop-based MapR Converged Data Platform, and the two companies are collaborating on enhanced product integration and support.

As part of that collaboration, Zoomdata said it has developed a "smart connector" that natively links its software to Apache Drill, the open-source SQL query engine for accessing data in a wide variety of NoSQL databases and file systems.