2018 Big Data 100: 45 Coolest Data Management And Integration Vendors
Business analysis and reporting tools may be the most visible component of big data systems – after all, it's how users interface with the data and analytical results.
But almost anyone connected with big data projects will tell you that collecting, integrating, cleaning, transforming and managing data are the biggest challenges in almost any big data initiative, making the software tools for these tasks a critical piece of the big data technology universe.
As part of the 2018 Big Data 100, we've put together a list of 45 companies that provide software for integrating, transforming and managing data.
Top Executive: President and CEO Greg Munves
1010 Data touts its Insights Platform as a fully integrated system with data acquisition and management, application development, analysis and modeling, reporting and visualization, and data sharing and monetization capabilities.
The New York-based company also provides a number of subject-specific analytical applications for such areas as market share, consumer behavior and investment insights.
Top Executive: President and CEO Rohit De Souza
Actian touts itself as the hybrid data management, analytics and integration company.
The company offers data management software including the Actian X hybrid database for operational analytics, Zen embedded database and the Actian NoSQL object database. On the data integration side are DataConnect for data integration, the DataCloud hybrid integration platform and PointConnect for application-to-application integration.
On April 12 Actian, based in Palo Alto, Calif., agreed to be acquired by HCL Technologies and Sumeru Equity Partners in a deal valued at $330 million. Actian will continue to operate as a separate entity within the HCL ecosystem.
Top Executive: CEO Satyen Sangani
Alation develops a data catalog system that automates the process of creating an inventory of an organization's data assets, helping chief data officers manage data as the valuable corporate asset it is.
The Alation Data Catalog also provides a way for analysts to search and query corporate data to develop insights for making faster and better decisions.
Alation is based in Redwood City, Calif.
Top Executive: CEO Dave Mariani
The AtScale Intelligence Platform makes it possible for business users to access data in a wide range of data sources, such as Hadoop and Spark, using popular business analytics software such as Tableau, Qlik and Microsoft Power BI. The AtScale software does this by creating a universal semantic layer that sits between the stored data and the business intelligence tools.
In March the San Mateo, Calif.-based company launched the AtScale Intelligence Platform 6.5 with expanded abilities to work with huge data lakes and the ability to migrate any analytical workload to public cloud systems.
Top Executive: CEO Shimon Alon
Attunity, based in Burlington, Mass., offers data integration and big data management software that accelerates data delivery and availability, automates data readiness for analysis, and optimizes data management.
Attunity's three core products, Replicate, Compose and Visibility, provide universal data ingest and replication, agile data warehouse design and deployment, and data usage and visibility, respectively.
Top Executive: CEO Taylor Barstow
In March Bedrock Data launched its Fusion software, which unifies customer data from multiple Software-as-a-Service applications into a SQL data warehouse for analysis and provides a "360-degree view" of a customer.
Boston-based Bedrock Data's earlier product, Sync, synchronizes data between multiple applications, standardizing and de-duplicating customer data across multiple CRM, marketing and ecommerce systems.
Top Executive: CEO Jay Kreps
Confluent is helping businesses tackle the problem of working with streaming data by developing commercial software and services around Apache Kafka, the open-source platform for processing and analyzing streams of data in real time.
Confluent's software helps businesses maximize the value of live data in such industries as retail, financial services, manufacturing, media and logistics management.
Last year the Palo Alto, Calif., company debuted KSQL, an open-source streaming SQL engine that enables continuous, interactive queries on Kafka, allowing developers familiar with SQL to build applications that work with Kafka.
Top Executive: President and CEO Matt Cain
Couchbase markets one of the industry's leading next-generation NoSQL database systems, the Couchbase Data Platform, that's positioned as an alternative to relational databases such as the Oracle Database and Microsoft's SQL Server.
NoSQL proponents tout the databases as providing superior scalability, performance and agility over relational systems and as better able to handle semi-structured and unstructured data such as documents and social media posts.
The Mountain View, Calif.-based company has particularly targeted its software for use in digital transformation and customer engagement projects.
Couchbase launched its first channel program in January, seeking new reseller and consulting partners to work with its technology.
Top Executive: CEO Ali Ghodsi
Databricks markets its Spark-based Unified Analytics Platform that’s used by data scientists to tackle large-scale data analytics and artificial intelligence problems. The company was founded by the University of California at Berkeley team that created Apache Spark, the open-source processing engine that's become a core component of many big data initiatives.
In March Microsoft went live with Azure Databricks, a cloud service built in collaboration with Databricks targeting use for large-scale business analytics and artificial intelligence projects.
San Francisco-based Databricks raised an impressive $140 million in Series D financing in August.
Top Executive: CEO Billy Bosworth
The DataStax Enterprise distributed cloud database software, based on the open-source Apache Cassandra database, is positioned for use in hybrid cloud computing environments — just as more businesses shift more workloads to public and private cloud systems.
In April the Santa Clara, Calif.-based company launched DataStax Enterprise 6 with double the software's read and write performance and support for twice the throughput and number of users.
Top Executive: CEO Guy Churchward
DataTorrent is a player in the streaming data arena, marketing its DataTorrent Real-Time Streaming (RTS) big data platform, powered by the Apache Apex engine, for real-time analysis of data in motion.
In February the San Jose-based company debuted RTS 3.10, a new release that supports machine learning and offers capabilities that make it easier to analyze trends in real time.
Top Executive: President and CEO Chris Cook
The Delphix Dynamic Data Platform software is used by businesses and organizations to connect, virtualize, secure and manage data on premise and in the cloud for a broad range of "DataOps" tasks including application development projects, cloud migration/Data-as-a-Service projects and digital transformation initiatives. The system is also used for data governance, security and compliance tasks such as meeting General Data Protection Regulation requirements.
Earlier this year Delphix launched a significant expansion of its channel program, seeking reseller and systems integrator partners to work with its product.
Top Executive: CEO Ed Boyajian
EnterpriseDB sells a distribution of the open-source PostgreSQL database, the EDB Postgres Platform, for on-premise and cloud database applications. The company complements the database with its own software toolkits for system management, integration and migration projects, along with service and support.
EnterpriseDB, headquartered in Bedford, Mass., is a major contributor to the ongoing development of PostgreSQL through the PostgreSQL Global Development Group, which released PostgreSQL 10 in October.
Top Executive: CEO Brian Platz
Blockchain technology, originally developed for managing digital currencies like Bitcoin, is essentially a digital ledger that allows information to be distributed but not copied. It has been getting a lot of attention this year as businesses discover blockchain's potential use for a range of applications that require ensuring the integrity and security of transactional data.
FlureeDB, launched in November as a public beta, is a scalable blockchain cloud database that makes it easier for companies and developers who want to integrate blockchain technology into their existing IT infrastructure and business applications. It's seen as a key enabler as businesses increasingly run on decentralized applications.
Fluree PBC (public benefit corp.), the developer of FlureeDB, is the brainchild of Platinum Software founder and CEO Flip Filipowski and SilkRoad Technology founder and CEO Brian Platz.
Top Executive: CEO Asaf Somekh
Big data startup Iguazio, founded in 2014 and based in Herzliya, Israel, officially launched its Continuous Data Platform last September. The unified data turnkey system ingests, enriches and analyzes data from a wide range of sources, simplifying the development and deployment of data-driven applications.
Earlier this year the company launched its inaugural channel program in a bid to globally recruit VAR, systems integrator and OEM partners to work with its unified data system.
Top Executive: CEO John Mracek
Imanis Data offers a hybrid cloud data management platform, which incorporates machine learning technology and is focused on data backup, archiving and recovery tasks. The San Jose company's technology works with data across on-premise and cloud systems including NoSQL databases, Hive and HBase.
In March the company named John Mracek, previously CEO at NetSeer, its new CEO and announced that it had raised $13.5 million in new financing.
Top Executive: CEO Matthew Carroll
Under the slogan of "accelerating the algorithm-driven enterprise," startup Immuta develops a data management platform that makes data "discoverable" without the need to physically move it or copy it. That allows data scientists to quickly access data for machine learning tasks and for developing analytical models, and allows data governance professionals to construct and enforce complex data policies.
In March Immuta, based in College Park, Md., launched its inaugural channel program, seeking reseller, professional service, IT infrastructure and technology partners to help the company expand beyond its early adopter customers.
Top Executive: CEO Evan Kaplan
InfluxData has developed an extensive stack of open-source technologies that together address the challenging problem of managing the continuous flow of time-series data from Internet of Things networks and other systems.
The InfluxData platform offers a range of tools and services, including the InfluxDB time-series database, for real-time processing of time-series data in such areas as IoT, DevOps monitoring and real-time analytics.
San Francisco-based InfluxData has raised nearly $25 million in three rounds of venture funding. In August the company was named an advanced tier technology partner in the Amazon Web Services Partner Network.
Top Executive: CEO Anil Chakravarthy
Informatica is a long-established company in the market for data ETL (extract, transform, load), integration and migration software and today is a major player in the Integration Platform-as-a-Service arena with its recently launched Informatica Intelligent Cloud Services software.
The company's product portfolio also includes its Enterprise Data Catalog, Informatica Data Quality, Enterprise Data Lake, Big Data Management, PowerCenter, Big Data Streaming and Multidomain MDM.
In April the company enhanced its data privacy and protection software and automated data governance software offerings with new artificial intelligence capabilities.
Top Executive: CEO Mark Kremer
JethroData develops a business intelligence SQL-on-Hadoop engine that accelerates interactive query performance for BI tools like Tableau and Qlik on big data.
In recent months JethroData, based in San Francisco, has partnered with BI tool developers Information Builders and MicroStrategy to enable their software to work with JethroData.
Top Executive: CEO Michael Howard
MariaDB is the company behind the open-source relational database of the same name that's an increasingly popular alternative to other database software such as MySQL.
The MariaDB database, a "fork" of the open-source MySQL product line, offers a range of advanced functionality including data replication and NoSQL capabilities. In May the company debuted MariaDB TX 2.0, a new release of the transactional edition of the database, and now offers MariaDB AX for data analytics and data warehouse tasks.
The company has headquarters in Espoo, Finland, and Menlo Park, Calif. Alibaba Group led a $27 million Series C funding round in November, bringing the company's total financing to $54 million.
Top Executive: President and CEO Gary Bloom
MarkLogic develops an enterprise-class NoSQL database that the company markets for integrating, storing, managing and searching data from multiple silos. The database offers advanced security, built-in search capabilities, a flexible data model and the vendor's Data Hub Framework that developers and architects use to create data flows from multiple source systems.
MarkLogic is based in San Carlos, Calif.
Top Executive: Managing Director Matthew Scullion
Matillion develops data ETL (extract, transform and load) and cloud data integration software specifically for Amazon Redshift, Google BigQuery and Snowflake Computing cloud data warehouse systems. The cloud-native data integration technology works with structured and semi-structured data, transforming big data into business insights.
Working in conjunction with cloud data warehouse services, Matillion's software provides an alternative to complex, expensive on-premises data warehouse systems.
Matillion is headquartered in Knutsford, Cheshire, U.K.
Top Executive: CEO Nikita Shamgunov
MemSQL develops an in-memory SQL database system specifically for real-time cloud and on-premises applications. The company says its high-performance software can handle both analytical and transaction processing workloads.
The San Francisco-based company also offers MemSQL Cloud, a real-time data warehouse managed service.
In March MemSQL said its fourth-quarter bookings increased 200 percent year over year, leading to record fiscal year financial results.
Top Executive: President and CEO Dev Ittycheria
MongoDB develops an open-source NoSQL, document-oriented distributed database. Touting its performance and scalability, MongoDB competes with other NoSQL database software as an alternative to relational database systems.
The company also offers the MongoDB Atlas Database-as-a-Service.
In March New York-based MongoDB, which went public in October, reported that sales in its fiscal 2018 reached $154.5 million, up more than 52 percent from $101.4 million one year before.
Top Executive: CEO Derek Smith
Businesses are heavily investing in big data initiatives for operational and analytical purposes. But those projects may be doomed to failure if they are working with poor-quality data.
Naveego's cloud-based software provides data quality and master data management tools that help organizations monitor and manage the quality of their business data – whether on-premise or in the cloud – and leverage it for competitive advantage.
Naveego, founded in 2013 and based in Traverse City, Mich., launched its first channel program in October and is recruiting data management consultants, systems integrators and managed service providers.
Top Executive: CEO Emil Eifrem
Neo4j, formerly Neo Technology, is another player in the market for next-generation databases. It develops the Neo4j graph database, which the company touts for "connected data" tasks by recognizing persistent relationships and connections throughout massive datasets.
The company just released Neo4j 3.4. Related software also offered by the San Mateo, Calif.-based company includes the Cypher query language, Neo4j Graph analytics software, data discovery and visualization software, and ETL and data integration tools.
Top Executive: CEO Prakash Nanduri
Paxata develops software that empowers business users to transform raw data into insightful information, instantly and automatically.
The vendor's Adaptive Information Platform is an enterprise-grade, self-service data preparation application and machine-learning system that, according to the company, weaves data into an information fabric from any source and any cloud to create trusted insights.
In the fall of 2017 the Redwood City, Calif.-based company launched its Intelligent Ingest software, an addition to the Adaptive Information Platform that simplifies and automates the process of collecting data from any cloud and in any format for business analysis.
In November Paxata got a boost from systems integrator giant Accenture, which designated Paxata a strategic partner and acquired a minority stake in the company. Accenture is also adding Paxata's software to its own Accenture Insights Platform.
Top Executive: CEO Ash Munshi
Pepperdata markets DevOps technology for the big data arena, helping developers optimize code for big data Hadoop and Spark applications and clusters for maximum performance.
In March Pepperdata, based in Cupertino, Calif., launched Application Spotlight, a self-service portal that big data application developers use to generate application-specific recommendations to improve application performance, identify applications that need attention, flag performance bottlenecks, and issue alerts on failure conditions and resource usage.
In September the company launched a strategic partner program to provide support, training and resources to systems integration and service provider partners.
Top Executive: CEO Paul Barth
Businesses have been assembling data lakes, huge stores of generally raw, unorganized data – often built on Hadoop. The challenge is finding a way to tap into all that data's potential value.
Podium develops the Podium Data Marketplace, an enterprise data management platform for building centralized repositories of clean, well-documented data that's accessible to a broad range of users.
In 2017 Podium expanded its product lineup with Data Conductor, a toolset that helps data managers, compliance professionals and business users manage, discover and access all data on any platform within an enterprise. The Lowell, Mass.-based company also added "Intelligent Data Identification" to its platform, a tool that combines a smart data catalog with a pattern recognition engine to identify duplicate data, improve data governance and reveal potential data corruption problems.
Top Executive: CEO Ashish Thusoo
Qubole develops the Qubole Data Service, an autonomous "big data activation" Hadoop-based service that creates a single interface across data stored in multiple public cloud systems including AWS, Microsoft Azure and Oracle Cloud (a connection to Google Cloud Compute Platform is under development).
In February Qubole, headquartered in Santa Clara, Calif., struck an alliance with cloud data warehouse provider Snowflake Computing, enabling the use of Apache Spark in Qubole with data stored in Snowflake.
Top Executive: CEO Ofer Bengal
Redis Labs markets Redis Enterprise, an in-memory NoSQL database for fast transaction processing and real-time analytics that boasts high performance, high availability and high scalability. The software is the commercial version of the open-source Redis database.
The company, headquartered in Mountain View, Calif., provides the database as packaged software and as a managed Database-as-a-Service in a private or public cloud environment. In March Redis Labs said its enterprise customer base had grown to 8,200.
Top Executive: CEO Manish Sood
Reltio initially focused its Reltio Cloud self-learning data Platform-as-a-Service on master data management and data quality tasks. But more recently the Redwood Shores, Calif.-based vendor has been expanding into new areas like data-driven applications, predictive analytics, artificial intelligence and machine learning.
Reltio Cloud 2018.1, launched in February, melded machine learning with advanced analytics – a move that comes as the dividing line between data used for operational applications and for analytical purposes becomes increasingly blurred.
Top Executive: CEO Bill McDermott
SAP provides a number of software products designed to manage and make use of big data, starting with its HANA in-memory database system that is also a platform for much of the company's application portfolio. While serving up data for SAP applications, HANA also performs underlying advanced and predictive analysis tasks, text analytics and search, ETL functions and more.
SAP, based in Walldorf, Germany, also sells multiple business intelligence and analytics products including SAP Vora, SAP Business Warehouse/4HANA, SAP Leonardo Analytics, SAP Predictive Analytics, SAP Cloud Analytics and the older BusinessObjects and Lumira products.
Top Executive: CEO Gaurav Dhillon
SnapLogic markets the SnapLogic Enterprise Integration Cloud, a Platform-as-a-Service that businesses and organizations use to connect on-premise and cloud data and applications. The company also provides "Snaps," pre-configured connectors for ERP, CRM and other applications and for big data systems including relational and NoSQL databases and Internet of Things systems.
This month the San Mateo, Calif.-based company debuted SnapLogic eXtreme, a system that supports complex data processes on cloud big data services such as Amazon Elastic MapReduce, Microsoft Azure HDInsight and Google Cloud Dataproc.
Top Executive: CEO Monte Zweben
Splice Machine's core product is an open-source SQL relational Database-as-a-Service, powered by Apache Hadoop and Apache Spark, with data warehouse and machine learning capabilities.
In December Splice Machine, headquartered in San Francisco, unveiled its new Online Predictive Processing Platform for running predictive analytics for real-time operational applications.
At the same time the company raised an additional $9 million in financing, bringing its funding total to $40 million.
Top Executive: CEO Girish Pancha
StreamSets offers a data operations platform and related products for managing the life-cycle of "data in motion" or, as the company puts it, "air traffic control for your data."
The StreamSets Data Operations Platform is the core of the company's product line for building, executing, operating and protecting dataflows. Additional software includes Data Collector and Data Collector Edge, Control Hub for managing dataflow architectures, and Dataflow Performance Manager for operating dataflow pipelines.
Top Executive: President and CEO Ali Kutay
The Striim platform, an end-to-end streaming data integration and intelligence system, makes it possible to integrate, analyze and visualize streaming data from big data networks, cloud systems and Internet of Things devices.
The latest release of the Striim software, out in April, bolstered the product's streaming data integration and hybrid cloud capabilities, including real-time data ingestion and stream processing for Apache Kudu, a column-oriented data store within Hadoop.
Striim is based in Palo Alto, Calif.
Top Executive: CEO Josh Rogers
Syncsort offers a broad lineup of data management and data quality software tools that the company says spans "big iron to big data."
The Pearl River, N.Y.-based company's product lineup, for example, includes Hadoop ETL and Data Warehouse Offload to Hadoop, ETL tools for Amazon cloud systems, data migration tools for IBM mainframes and Power Systems, and ETL tools for Windows, Unix and Linux systems.
Top Executive: CEO Mike Tuchen
Under the umbrella of the Talend Data Fabric, Talend markets a line of data management, preparation and integration tools. At the core is the Spark-based Big Data Platform for connecting cloud and on-premise data.
Other products cover data preparation, metadata management, real-time data management and real-time application integration.
In March Talend expanded the support services in its global partner program, including stepping up investments in partner training, services and market development assistance – including pre-sales and sales enablement support.
Top Executive: CEO Andy Palmer
Tamr's Enterprise Data Unification data source connectivity software uses machine learning technology to automate the process of curating, unifying and enriching data across multiple data sources for business analytics tasks.
Based in Boston, Tamr was founded in 2013 by Andy Palmer, Vertica's founding CEO, and database technology notable Michael Stonebraker.
Top Executive: CEO Adam Wilson
Trifacta develops "data wrangling" software used to discover and prepare raw data for business analytics tasks. Trifacta Wrangler Enterprise gives data analyst teams the self-service capability to explore and transform data while centralizing data security and governance.
In January Trifacta, headquartered in San Francisco, raised $48 million in additional financing, bringing its total funding to $124 million.
Top Executive: President and CEO David Flower
VoltDB has developed a "translytical" operational database for applications where data volumes are big and data accuracy is critical. The relational in-memory database is designed to process millions of data transactions per second and perform real-time analytics where business-critical transactions require immediate decisions.
In February VoltDB, based in Bedford, Mass., released the v8 edition of the company's database with more predictable, long-tail latency responses based on real-time data and historical intelligence, improving real-time processing and self-service analysis.
Top Executive: CEO Alex Gorelik
Waterline's Smart Data Catalog uses machine learning to discover, manage and govern enterprise data at scale. The software is used by chief data officers, analysts and data stewards for self-service data analytics, data governance and data rationalization/optimization tasks.
In March the company unveiled the Waterline Metadata Discovery Platform, which uses data virtualization technology to accelerate big data discovery and governance, and the use of big data in such applications as compliance and data cataloging for analytics.
Top Executive: CEO Ben Sharma
The Zaloni Data Platform provides data management, cataloging, governance and self-service capabilities that help businesses and organizations operationalize the huge volumes of data stored in data lakes.
Zaloni recently extended its data platform with a new machine-learning data matching engine that creates what the company calls "golden" records that enable enriched data views for multiple use cases, including customer-facing applications.