15 Big Data Technology Developments You Should Know About

Big Developments In Big Data

The big data industry is off to a fast start this year with many companies – both established vendors and startups – debuting new or upgraded software for business analytics, data management and other big data jobs.

While these new and updated products cover a broad range of big data technologies, some common themes run through them. Chief among them are how to simplify the chore of combining and preparing data for analytical tasks, and how to make data from widely diverse sources easily accessible to analysts, data scientists and other users.

Here's a look at 15 product rollouts that solution providers following the big data space should be aware of. Some, but not all, of these were unveiled at this week's Strata + Hadoop World conference in San Jose, Calif., or were being demonstrated there.

Arcadia Enterprise 4.0

Arcadia Data provides visual analytics software that, according to the company, can handle the most complex big data analysis problems. The software makes Apache Hadoop, cloud-based data lakes and other big data sources more accessible to a wide range of business users without the need to extract and move data.

Arcadia Enterprise 4.0 offers a new user interface based on Material Design, the design language Google introduced in 2014. New proactive alerting and scheduling features support real-time data, and new secure extranet capabilities make it possible to publish data applications externally to thousands of users.

On the development front, new point-and-click rapid application features in the Arcadia Visual Designer help users define workflows and customize applications to meet company standards.

AtScale 5.0

AtScale's software provides a way for users of mainstream business analytics and visualization software, such as Tableau and QlikView, to tap into the huge volumes of data stored in big data systems like Hadoop.

AtScale 5.0 provides a new multidimensional calculation engine that supports the MDX query language for modeling complex business processes, a performance optimization engine that uses machine learning to optimize query performance, and a data abstraction layer that provides access to relational and other on-premise and cloud data sources. The 5.0 release also offers enterprise-grade security, data governance and metadata management capabilities.

AtScale was recently granted a patent on its platform's calculation engine that provides the links between data sources and data visualization tools.

Attunity Compose 3.0

Compose 3.0 is the latest release of Attunity's agile data warehouse automation software that helps businesses speed up analytics projects, optimize development and ETL (extract, transform and load) processes, and reduce risk.

The highlight of the 3.0 edition is a series of significant enhancements to the software's ETL capabilities, resulting in a 10-fold increase in ETL processing speeds. New advanced DevOps processes (in development, testing, acceptance and production) streamline data warehouse design, development and rollout.

Also new is advanced version control, integrated with enterprise source control systems, for rolling projects back to earlier versions. And Compose 3.0 improves team collaboration and enables multi-user development projects by supporting concurrent development of models, mappings and data marts.

Cazena Data Science Sandbox As A Service

Cazena develops a cloud-based analytics platform for data science and data warehousing tasks. The service runs on Microsoft Azure and Amazon Web Services.

Cazena notes that data scientists are in high demand, but sometimes their work is hindered because their company lacks the DevOps resources or expertise needed for advanced analytics projects. The new Data Science Sandbox as a Service allows data scientists themselves to perform a wide range of analytics in a flexible cloud environment without having to build, manage or maintain the underlying technology.

The sandbox includes data storage, processing, security, tools and support for R, Python, SQL and other analytics languages. Data scientists can interact with the service through a web interface, other preferred applications or scripts, or built-in tools such as RStudio Server Pro or Hue Notebooks.

Couchbase Data Platform

NoSQL database developer Couchbase debuted new releases of the components of its data platform, including Couchbase Server 4.6, Couchbase Sync Gateway 1.4, Couchbase Kafka Connector 3.0 and Couchbase Spark Connector 2.0.

Enhancements to the product suite enable the development of web, mobile and Internet of Things applications that can be deployed at mass scale.

New capabilities in Couchbase Server 4.6 provide easier global deployments, advanced security, integrated .NET application development, and built-in support for rich data structures including maps, lists and sets. Couchbase Sync Gateway 1.4, according to the company, offers limitless scalability for mobile and IoT applications.

Dataguise DgSecure 6.0.5

Dataguise's new release of its DgSecure data governance software provides data monitoring and masking capabilities for sensitive data stored in Apache Hive data warehouse systems. Hive systems generally hold large datasets stored in the Hadoop Distributed File System (HDFS).

The new edition also adds data monitoring capabilities for MapR, Teradata and Oracle database systems, along with structured encryption and decryption for European languages. An enhanced REST API enables multicloud service interoperability in addition to on-premise functionality.

LucidWorks Fusion 3

LucidWorks Fusion is an application development platform for building search applications that data analysts and consumers use to index and search an organization's data. The system can scale to millions of users and billions of documents, according to the company.

The latest release, Fusion 3, provides a range of new capabilities that accelerate application development projects and processes. The Index Workbench helps teams manage and organize ETL processes, for example, while a new preview feature shows how different configurations will affect data collections before indexing.

The new release streamlines setup, and guided configuration tools provide faceting, field mapping and other transformations. The new Query Workbench makes it easier to adjust relevancy scoring, and a new graphical administrative user interface simplifies development and deployment through a UI framework for rapid prototyping. Fusion 3 also includes full SQL compatibility.

Paxata Spring '17

Paxata markets the Paxata Adaptive Information Platform, self-service data preparation software that helps business analysts combine and prepare multistructured data from diverse sources for analysis.

The Paxata Spring '17 edition supports the Microsoft Azure cloud and a number of its services, including HDInsight, Microsoft's cloud-based Apache Hadoop service; Azure Blob Storage, for storing unstructured data; and Azure Data Lake Store.

The release also includes the new InterCloud Connect feature for data access and interchange between Azure and other cloud and/or on-premise systems.

Pentaho 7.0

Pentaho, a Hitachi Group company, has added machine-learning orchestration to its Pentaho Data Integration software that blends and prepares data for analytical tasks.

The new capabilities are part of the Pentaho 7.0 release.

The new orchestration capabilities streamline machine-learning workflows, according to the company, making it possible for data scientists, programmers and analysts to tune, test and deploy predictive modeling software. That, Pentaho said, helps remove bottlenecks in the predictive model development and deployment process.

Qubole Data Service

Big data-as-a-service provider Qubole now makes its Qubole Data Service (QDS) available on the Oracle Cloud. The service already runs on Amazon Web Services and Microsoft Azure.

QDS is an enterprise-grade platform that leverages open-source processing engines such as Spark, Hadoop and Hive for a range of big data workloads.

QDS is natively integrated with Oracle Cloud, leveraging its bare-metal architecture for high performance. Subscribers can also use Oracle's NVMe SSD storage systems.

Reltio Cloud 2017.1

Reltio Cloud is a Platform-as-a-Service for operational data management, data-driven applications and large-scale analytical workloads.

The 2017.1 release adds new integration, collaboration and globalization capabilities. It embeds SnapLogic Enterprise Integration Cloud for loading and synchronizing data to and from the Reltio Cloud. It also adds a connector to access Dun & Bradstreet data.

On the globalization front, Reltio Cloud 2017.1 adds multidependent configurable lookups for easier configuration of dependent lookups based on multiple attributes such as country codes and organization types. Workflow and task management enhancements through a new personalized portal improve team collaboration.

SAP HANA Vora/Cloud Platform Big Data Services

SAP has released a new version of its HANA Vora in-memory computing engine that makes data in Hadoop systems more accessible. The new release allows time-series data to be stored and analyzed in distributed environments, supports graph processing, provides a distributed in-memory JSON document store, and supports Kerberos.

SAP also said it is expanding the availability of its SAP Cloud Platform Big Data Services to Europe and plans to launch SAP Vora on that service in both the U.S. and Europe by mid-2017. SAP Cloud Platform Big Data Services was formerly Altiscale, which SAP acquired in September 2016.

Splice Machine Cloud RDBMS

Splice Machine, developer of an open-source SQL relational database that works with Hadoop and Spark, will launch a relational Database-as-a-Service this spring on Amazon Web Services.

As with its current open-source RDBMS software for hybrid workloads, the new cloud RDBMS service will support both operational and analytical workloads without the need for separate ETL tools. The cloud offering is designed to get customers up and running faster while removing the need for businesses to manage the database themselves.

The service is currently being evaluated by early adopter customers and Splice Machine is targeting April for general availability.

Tableau 10.2

Tableau 10.2, the latest release of the vendor's popular data visualization software, became generally available at the beginning of March. Tableau is on a rapid product development and release schedule, having just launched Tableau 10.1 in November.

The new edition provides enhanced mapping capabilities, including a new spatial file connector, which makes it easier for businesses to use spatial data directly in Tableau. The software now connects to ESRI Shapefiles, KML, GeoJSON and MapInfo file types.

Tableau 10.2 offers new data preparation features, such as letting users join tables from a database while leveraging the database's structure and schemas more efficiently. The new release also improves accessibility for users with disabilities by conforming to Web Content Accessibility Guidelines (WCAG) 2.0 Level AA.

The release improves data governance through fine-grained controls over guest access. Connectivity with SAP BW is enhanced with single sign-on support. And the product now has 60 instant data connectors with new connectors for Apache Drill and Microsoft SharePoint lists.

Talend Data Fabric Winter '17

Talend, the developer of cloud and big data integration software, recently made available the Winter '17 version of its Talend Data Fabric with new data preparation and cleansing functionality that makes it easier to access data in any source – including Hadoop, the cloud or traditional databases.

A new self-service data stewardship application helps users curate and manage data through its life cycle while adhering to data governance policies and requirements.

The Winter '17 release also incorporates machine learning to improve data quality and a pre-configured data dictionary to auto-recognize the meaning of raw data stored in a data lake. The new version also supports Spark 2.0.