2018 Big Data 100: Coolest Emerging Big Data Vendors

The Ones To Watch

Big Data technology is one of the most dynamic segments of the IT industry, with a steady stream of startups entering the market with leading-edge technology that tackles the challenges of collecting, curating, cleaning, transforming, managing and analyzing ever-growing volumes of data.

As part of the sixth annual Big Data 100, we've put together a list of innovative companies started in 2012 or later that have demonstrated an ability to bring to market products and services that help businesses work with big data.


Alation

Top Executive: CEO Satyen Sangani

Alation develops a data catalog system that automates the process of creating an inventory of an organization's data assets, helping chief data officers manage data as the valuable corporate asset it is.

The Alation Data Catalog also provides a way for analysts to search and query corporate data to develop insights for making faster and better decisions.

Based in Redwood City, Calif., Alation was founded in 2012.


Anodot

Top Executive: David Drai

Anodot's real-time analytics and automated anomaly detection system detects outliers in time-series data and turns them into business insights. The software's machine learning technology analyzes business data to identify business incidents, helping managers remedy problems and take advantage of opportunities.

In December Anodot raised $23 million in Series B financing and announced that the company tripled its revenue in the past year.

Based in Ra'anana, Israel, Anodot was founded in 2014.

Arcadia Data

Top Executive: CEO Sushil Thomas

Arcadia Data provides unified visual analytics and business intelligence software that can access and analyze large volumes of data in such systems as Hadoop, data lakes and in the cloud without the need to move it to a data warehouse.

A new release of Arcadia Enterprise in March offered greater scalability in data volumes and concurrent users, the ability to handle a wider variety of data formats, and new self-service and data governance capabilities.

Based in San Mateo, Calif., Arcadia Data was founded in 2012.


AtScale

Top Executive: CEO Dave Mariani

The AtScale Intelligence Platform makes it possible for business users to access data in a wide range of data sources, such as Hadoop and Spark, using popular business analytics software such as Tableau, Qlik and Microsoft Power BI. The AtScale software does this by creating a universal semantic layer that sits between the stored data and the business intelligence tools.

In March the company launched the AtScale Intelligence Platform 6.5 with expanded abilities to work with huge data lakes and the ability to migrate any analytical workload to public cloud systems.

Based in San Mateo, Calif., AtScale was founded in 2013.

Bedrock Data

Top Executive: CEO Taylor Barstow

In March Bedrock Data launched its Fusion software, which unifies customer data from multiple Software-as-a-Service applications into a SQL data warehouse for analysis and provides a "360-degree view" of a customer.

Bedrock Data's earlier product, Sync, synchronizes data between multiple applications, standardizing and de-duplicating customer data across multiple CRM, marketing and ecommerce systems.

Based in Boston, Bedrock Data was founded in 2012.

BlueData Software

Top Executive: CEO Kumar Sreekanti

BlueData's software, which incorporates Docker's container technology, is used to deploy big data workloads on-premise, in a public cloud or in a hybrid model. BlueData EPIC (Elastic Private Instant Clusters) is a Big-Data-as-a-Service platform for on-demand provisioning of Hadoop, Spark, Kafka, and other big data analytics and data science tools.

The company's latest release of BlueData EPIC added support for GPU-based systems and machine learning applications such as TensorFlow.

BlueData said sales grew by 228 percent in 2017 with the addition of new customers such as Citigroup, GM Financial, GlaxoSmithKline, Seattle Children’s Hospital and Tesco Bank.

Based in Santa Clara, Calif., BlueData was founded in 2012.


Cazena

Top Executive: CEO Prat Moghe

Cazena offers a fully-managed Big Data-as-a-Service platform for storing, sharing and analyzing data without the need for DevOps and other skills. With such partners as Cloudera, Microsoft Azure and AWS, Cazena provides big data solutions around business intelligence and analytics, customer insights, data science and data engineering.

In November Cazena launched an AppCloud service where customers can deploy and run machine learning and analytical solutions including DataRobot, Cloudera Data Science Workbench, StreamSets and Arcadia Data.

Based in Waltham, Mass., Cazena was founded in 2014.


Confluent

Top Executive: CEO Jay Kreps

Confluent is helping businesses tackle the problem of working with streaming data by developing commercial software and services around Apache Kafka, the open-source platform for processing and analyzing streams of data in real time.

Confluent's software helps businesses maximize the value of live data in such industries as retail, financial services, manufacturing, media and logistics management.

Last year the company debuted KSQL, an open-source streaming SQL engine that enables continuous, interactive queries on Kafka, allowing developers familiar with SQL to build applications that work with Kafka.

Based in Palo Alto, Calif., Confluent was founded in 2014.


Cooladata

Top Executive: CEO Dan Schoenbaum

Cooladata's behavioral analytics software collects, unifies and analyzes data from web sites, CRM systems, mobile applications, marketing campaigns and in-house databases to help businesses understand their customers, improve their products and services, and increase sales.

Based in Tel Aviv, Israel, Cooladata was founded in 2012.


Databricks

Top Executive: CEO Ali Ghodsi

Databricks markets its Spark-based Unified Analytics Platform that’s used by data scientists to tackle large-scale data analytics and artificial intelligence problems. The company was founded by the University of California at Berkeley team that created Apache Spark, the open-source processing engine that's become a core component of many big data initiatives.

In March Microsoft went live with Azure Databricks, a cloud service built in collaboration with Databricks targeting use for large-scale business analytics and artificial intelligence projects.

Based in San Francisco, Databricks was founded in 2013.


Dataiku

Top Executive: CEO Florian Douetteau

Dataiku offers a collaborative platform for advanced analytics, allowing organizations to apply data science and machine learning techniques to build and deploy their own big data software.

Dataiku raised $28 million in Series B funding in January. In February the company said it tripled its revenue in 2017 and doubled the size of its staff.

Based in New York, Dataiku was founded in 2013.


DataRobot

Top Executive: CEO Jeremy Achin

DataRobot develops an automated machine learning platform that captures the knowledge, experience and best practices of data scientists and uses that information to build and deploy predictive models much more quickly than previously possible. With those models, analysts can uncover hidden opportunities and predict outcomes from huge volumes of data.

DataRobot snagged $54 million in Series C financing in March 2017, bringing its total funding to more than $124 million, and the company has aggressively invested in its global partner ecosystem.

Based in Boston, DataRobot was founded in 2012.


DataScience.com

Top Executive: CEO Ian Swanson

The DataScience.com Platform provides capabilities that cater to data scientists, business users and IT teams. The system makes it possible for data science teams to collaborate on data-driven projects by helping them explore and visualize data, share analyses, deploy models into production and track their performance.

In March the company partnered with GitHub in a bid to promote version control best practices for enterprise data science teams looking to scale up their data science projects.

Based in Culver City, Calif., DataScience.com was founded in 2014.

Domino Data Lab

Top Executive: CEO Nick Elprin

Domino develops a data science platform for use by data scientists and data science teams, business executives and IT managers. The company's system for developing and deploying predictive models includes a workbench, a collaboration hub, and tools for publishing and deployment.

In January Domino Data Lab reported triple-digit revenue growth in 2017.

Based in San Francisco, Domino Data Lab was founded in 2013.


Fluree

Top Executive: CEO Brian Platz

Blockchain technology, originally developed for managing digital currencies like Bitcoin, is essentially a digital ledger that allows information to be distributed but not copied. It has been getting a lot of attention this year as businesses discover blockchain's potential use for a range of applications that require ensuring the integrity and security of transactional data.

FlureeDB is a scalable blockchain cloud database that makes it easier for companies and developers who want to integrate blockchain technology into their existing IT infrastructure and business applications. It's seen as a key enabler as businesses increasingly run on decentralized applications.

Fluree PBC (public benefit corp.), the developer of FlureeDB, is the brainchild of Platinum Software founder and CEO Flip Filipowski and SilkRoad Technology founder and CEO Brian Platz.

Based in Winston-Salem, N.C., Fluree launched in November 2017 as a public beta company.


Iguazio

Top Executive: CEO Asaf Somekh

Big data startup Iguazio officially launched its Continuous Data Platform last September. The unified data turnkey system ingests, enriches and analyzes data from a wide range of sources, simplifying the development and deployment of data-driven applications.

Earlier this year the company launched its inaugural channel program in a bid to globally recruit VAR, systems integrator and OEM partners to work with its unified data system.

Based in Herzliya, Israel, Iguazio was founded in 2014.

Imanis Data

Top Executive: CEO John Mracek

Imanis Data offers a hybrid cloud data management platform, which incorporates machine learning technology and is focused on data backup, archiving and recovery tasks. The San Jose company's technology works with data across on-premise and cloud systems including NoSQL databases, Hive and HBase.

In March the company named John Mracek, previously CEO at NetSeer, its new CEO and announced that it had raised $13.5 million in new financing.

Based in San Jose, Imanis Data was founded in 2013.


Immuta

Top Executive: CEO Matthew Carroll

Under the slogan "accelerating the algorithm-driven enterprise," startup Immuta develops a data management platform that makes data "discoverable" without the need to physically move or copy it. That allows data scientists to quickly access data for machine learning tasks and for developing analytical models, and lets data governance professionals construct and enforce complex data policies.

In March Immuta launched its inaugural channel program, seeking reseller, professional service, IT infrastructure and technology partners to help the company expand beyond its early adopter customers.

Based in College Park, Md., Immuta was founded in 2014.


InfluxData

Top Executive: CEO Evan Kaplan

InfluxData has developed an extensive stack of open-source technologies that together address the challenging problem of managing the continuous flow of time-series data from Internet of Things networks and other systems.

The InfluxData platform offers a range of tools and services, including the InfluxDB time-series database, for real-time processing of time-series data in such areas as IoT, DevOps monitoring and real-time analytics.

Based in San Francisco, InfluxData was founded in 2012.


Infoworks

Top Executive: CEO Amar Arsikere

Managing the flow of data from operational systems and databases into big data management and analysis systems in a useful form is a challenge that often sinks big data projects.

Infoworks develops software that automates the data engineering process for creating and managing ongoing big data workflows from data source to consumption. Specific tools automate data ingestion, data migration, data transformation and preparation, and data/metadata synchronization, among others.

In September Infoworks launched an end-to-end automated big data warehouse platform in the cloud.

Based in Palo Alto, Calif., Infoworks was founded in 2014.


JethroData

Top Executive: CEO Mark Kremer

JethroData develops a business intelligence SQL-on-Hadoop engine that accelerates interactive query performance for BI tools like Tableau and Qlik on big data.

In recent months JethroData has partnered with BI tool developers Information Builders and MicroStrategy to enable their software to work with JethroData.

Based in San Francisco, JethroData was founded in 2012.


Incorta

Top Executive: CEO Osama Elkady

Incorta's mission is to replace traditional data warehouse systems and ETL (extract, transform and load) tools with its data platform for real-time analytics and operational reporting.

Incorta's software uses what the company calls a "Direct Data Mapping" engine that executes complex data joins with real-time aggregations of huge volumes of data.

Based in San Mateo, Calif., Incorta was founded in 2013.

Kyvos Insights

Top Executive: CEO Praveen Kankariya

Kyvos Insights' big data OLAP (online analytical processing) platform is used by businesses and organizations to analyze massive volumes of data stored in big data systems such as Hadoop, whether they are on-premise or in the cloud.

Based in Los Gatos, Calif., Kyvos Insights was founded in 2012.


Maana

Top Executive: CEO Babur Ozden

Maana develops "knowledge-centric" data search and discovery software. The Maana Knowledge Platform, based on the company's patented Knowledge Graph technology and algorithms, collects data from multiple disparate systems and turns it into operational insights that can be used by line-of-business applications.

In February the company debuted Maana Q, which adds enhanced self-service capabilities to the Maana platform by allowing analysts and subject matter experts to develop a "knowledge layer" over operational and industrial data.

Based in Palo Alto, Calif., Maana was founded in 2012.

MapD Technologies

Top Executive: CEO Todd Mostak

MapD Technologies offers what it calls "the extreme analytics platform," a GPU database platform for interactive SQL and real-time visual analytics tasks involving massive sets of structured data.

In April MapD debuted MapD Cloud, a Software-as-a-Service edition of its GPU-accelerated analytics software.

Based in San Francisco, MapD Technologies was founded in 2013.


Naveego

Top Executive: CEO Derek Smith

Businesses are heavily investing in big data initiatives for operational and analytical purposes. But those projects may be doomed to failure if they are working with poor-quality data.

Naveego's cloud-based software provides data quality and master data management tools that help organizations monitor and manage the quality of their business data – whether on-premise or in the cloud – and leverage it for competitive advantage.

Naveego launched its first channel program in October and is recruiting data management consultants, systems integrators and managed service providers.

Based in Traverse City, Mich., Naveego was founded in 2013.


Paxata

Top Executive: CEO Prakash Nanduri

Paxata develops software that empowers business users to transform raw data into insightful information, instantly and automatically.

The vendor's Adaptive Information Platform is an enterprise-grade, self-service data preparation application and machine-learning system that, according to the company, weaves data into an information fabric from any source and any cloud to create trusted insights.

In the fall of 2017 the Redwood City, Calif.-based company launched its Intelligent Ingest software, an addition to the Adaptive Information Platform that simplifies and automates the process of collecting data from any cloud and in any format for business analysis.

In November Paxata got a boost from systems integrator giant Accenture, which designated Paxata a strategic partner and acquired a minority stake in the company. Accenture is also adding Paxata's software to its own Accenture Insights Platform.

Based in Redwood City, Calif., Paxata was founded in 2012.


Pepperdata

Top Executive: CEO Ash Munshi

Pepperdata markets DevOps technology for the big data arena, helping developers optimize code for big data Hadoop and Spark applications and clusters for maximum performance.

In March Pepperdata, based in Cupertino, Calif., launched Application Spotlight, a self-service portal that big data application developers use to generate application-specific recommendations to improve application performance, identify applications that need attention, flag performance bottlenecks, and issue alerts on failure conditions and resource usage.

In September the company launched a strategic partner program to provide support, training and resources to systems integration and service provider partners.

Based in Cupertino, Calif., Pepperdata was founded in 2012.

Podium Data

Top Executive: CEO Paul Barth

Businesses have been assembling data lakes, huge stores of generally raw, unorganized data – often built on Hadoop. The challenge is finding a way to tap into all that data's potential value.

Podium develops the Podium Data Marketplace, an enterprise data management platform for building centralized repositories of clean, well-documented data that's accessible to a broad range of users.

In 2017 Podium expanded its product lineup with Data Conductor, a toolset that helps data managers, compliance professionals and business users manage, discover and access all data on any platform within an enterprise. The company also added "Intelligent Data Identification" to its platform, a tool that combines a smart data catalog with a pattern recognition engine to identify duplicate data, improve data governance and reveal potential data corruption problems.

Based in Lowell, Mass., Podium was founded in 2014.


Plotly

Top Executive: Founder Alex Johnson

Plotly has developed a number of open-source tools for composing, editing and sharing interactive data analysis and visualization charts, graphs and dashboards via the Internet.

The company, also known by its URL "Plot.ly," is catching on among data scientists and developers who develop analytical applications using the Python and R programming languages.

Based in Montreal, Plotly was founded in 2012.

Snowflake Computing

Top Executive: CEO Bob Muglia

Snowflake Computing developed a cloud-based enterprise SQL data warehouse system, based on a patented architecture, that the company said eliminates the complex administration and management tasks associated with traditional data warehouse systems.

Running on AWS, Snowflake data warehouses can serve up structured and semi-structured data for analytical and reporting services.

In January the company closed on $263 million in Series E growth funding from venture capital investors, bringing its total funding to $473 million and putting the company's valuation at $1.5 billion.

Based in San Mateo, Calif., Snowflake Computing was founded in 2012.

Splice Machine

Top Executive: CEO Monte Zweben

Splice Machine's core product is an open-source SQL relational Database-as-a-Service, powered by Apache Hadoop and Apache Spark, with data warehouse and machine learning capabilities.

In December Splice Machine unveiled its new Online Predictive Processing Platform for running predictive analytics for real-time operational applications. At the same time the company raised an additional $9 million in financing, bringing its funding total to $40 million.

Based in San Francisco, Splice Machine was founded in 2012.


StreamSets

Top Executive: CEO Girish Pancha

StreamSets offers a data operations platform and related products for managing the life-cycle of "data in motion" or, as the company puts it, "air traffic control for your data."

The StreamSets Data Operations Platform is the core of the company's product line for building, executing, operating and protecting dataflows. Additional software includes Data Collector and Data Collector Edge, Control Hub for managing dataflow architectures, and Dataflow Performance Manager for operating dataflow pipelines.

Based in San Francisco, StreamSets was founded in 2014.


Striim

Top Executive: President and CEO Ali Kutay

The Striim platform, an end-to-end streaming data integration and intelligence system, makes it possible to integrate, analyze and visualize streaming data from big data networks, cloud systems and Internet of Things devices.

The latest release of the Striim software, out in April, bolstered the product's streaming data integration and hybrid cloud capabilities, including real-time data ingestion and stream processing for Apache Kudu, a column-oriented data store within Hadoop.

Based in Palo Alto, Calif., Striim was founded in 2012.


Tamr

Top Executive: CEO Andy Palmer

Tamr's Enterprise Data Unification data source connectivity software uses machine learning technology to automate the process of curating, unifying and enriching data across multiple data sources for business analytics tasks.

Based in Boston, Tamr was founded in 2013 by Andy Palmer, Vertica's founding CEO, and database technology notable Michael Stonebraker.


ThoughtSpot

Top Executive: CEO Ajeet Singh

ThoughtSpot markets the ThoughtSpot business analytics platform that uses search and artificial intelligence technology to generate analysis and business insights from huge volumes of data. In October the company debuted SpotIQ, a new AI-driven analytics engine offered as part of the ThoughtSpot 4.4 system.

The industry took notice in August when the company raised $60 million in Series C venture funding, bringing its total financing to more than $160 million. In March the company said revenue in its fiscal 2018 fourth quarter (ended Jan. 31) grew 180 percent year-over-year.

Based in Palo Alto, Calif., ThoughtSpot was founded in 2012.


Trifacta

Top Executive: CEO Adam Wilson

Trifacta develops "data wrangling" software used to discover and prepare raw data for business analytics tasks. Trifacta Wrangler Enterprise gives data analyst teams the self-service capability to explore and transform data while centralizing data security and governance.

In January Trifacta raised $48 million in additional financing, bringing its total funding to $124 million.

Based in San Francisco, Trifacta was founded in 2012.

Waterline Data

Top Executive: CEO Alex Gorelik

Waterline's Smart Data Catalog uses machine learning to discover, manage and govern enterprise data at scale. The software is used by chief data officers, analysts and data stewards for self-service data analytics, data governance and data rationalization/optimization tasks.

In March the company unveiled the Waterline Metadata Discovery Platform, which uses data virtualization technology to accelerate big data discovery and governance, as well as the use of big data in such applications as compliance and data cataloging for analytics.

Based in Mountain View, Calif., Waterline Data was founded in 2013.


Zoomdata

Top Executive: Nick Halsey

Zoomdata pitches its visual analytics and data visualization software, based on its patented "Data Sharpening" technology, as the fastest in the industry. The analysis software works with huge volumes of structured, unstructured and even streaming data.

In January Zoomdata added "Smart Streaming" capabilities to its platform, which enables easier connections to streaming data sources and the blending of streaming data with historical data.

Based in San Mateo, Calif., Zoomdata was founded in 2012.