The Coolest Emerging Vendors Of The 2020 Big Data 100

Part 6 of CRN’s Big Data 100 looks at the emerging vendors solution providers need to know.

Up And Comers

Many of the IT vendors on this year’s Big Data 100 list are established companies, providing the data analytics tools, database software and IT infrastructure that make up the core of the big data systems within many businesses and organizations. But nearly one in four of this year’s Big Data 100 companies is a startup offering leading-edge technology to help customers tackle their big data challenges.

As part of the 2020 Big Data 100, we’ve put together a list of emerging vendors – companies founded in 2014 or later – that solution providers should be aware of.

This week CRN is running the Big Data 100 list in slideshows covering vendors of business analytics software, big data systems and platforms, database systems, data management and integration tools, and data science and machine learning tools. (Some vendors offer big data products that span multiple technology categories: They appear in the slideshow for the technology segment in which they are most prominent.)


Alluxio

Top Executive: CEO Haoyuan Li

Founded: 2015

Alluxio develops a virtual distributed file system that began as the Tachyon research project at the University of California, Berkeley’s AMPLab. The technology provides hybrid cloud data orchestration – bringing data closer to compute systems – to improve data accessibility and speed up analytical and machine learning applications.

In March the San Mateo, Calif.-based company unveiled the Alluxio Structured Data Service, which adds data catalog and data transformation capabilities and provides just-in-time data transformation for compute-intensive applications.


Anodot

Top Executive: CEO David Drai

Founded: 2014

Anodot provides analytics software that uses artificial intelligence and machine learning to autonomously track and monitor business data to discover changes and anomalies that can impact business performance.

Just this month Anodot, with its U.S. headquarters in Redwood City, Calif., raised $35 million in a Series C round of financing led by Intel Capital, bringing the company’s total financing to $62 million.


Cazena

Top Executive: CEO Prat Moghe

Founded: 2014

Data lake systems are huge repositories of raw structured and unstructured data used for a broad range of applications. But they can be complex to build and manage.

Cazena, based in Waltham, Mass., offers a managed Data Lake-as-a-Service for building cloud-based data lakes on AWS or Microsoft Azure. The service includes cloud storage and infrastructure, data and analytics workload engines, workload SLA management and optimization, data ingestion and integration, security and encryption, governance and compliance, production operations, and support for analytics, data science and machine learning.


Confluent

Top Executive: CEO Jay Kreps

Founded: 2014

Confluent’s flagship product, the Confluent Platform, provides the capability to organize and manage the massive volumes of streaming data being generated by businesses today and make the data available to business applications and information workers. The company’s event stream processing software is based on Apache Kafka, an open-source stream processing system that was developed by Confluent’s founders when they worked at LinkedIn.

Mountain View, Calif.-based Confluent raised a stunning $250 million in Series E funding earlier this month, pushing the company’s valuation to $4.5 billion.


DotData

Top Executive: CEO Ryohei Fujimaki

Founded: 2018

DotData touts its DotData Enterprise machine learning and data science platform as capable of reducing AI and business intelligence projects from months to days. The platform is based on the company’s AutoML 2.0 engine that provides full-cycle automation of data science and machine learning tasks.

Startup DotData, launched in 2018 and based in San Mateo, Calif., raised $23 million in Series A funding in October 2019.


Dremio

Top Executive: CEO Billy Bosworth

Founded: 2015

Dremio’s data lake engine provides a self-service semantic layer that analysts and data scientists use to explore data and create virtual datasets out of the huge volumes of data often stored in data lake systems. It also provides a way to directly query data in data lakes running on Hadoop, AWS S3 and Azure Data Lake Store.

Dremio, headquartered in Santa Clara, Calif., raised $70 million in Series C funding in March.


Erwin

Top Executive: CEO Adam Famularo

Founded: 2016

Erwin’s flagship product, Erwin Data Modeler, is used to find, visualize, design, deploy and standardize an organization’s data assets. That makes it possible to discover and document data for large-scale data integration, data governance, master data management, business analytics and other big data initiatives.

Erwin, which has been an independent company since it was spun out of CA Technologies in 2016, also provides data catalog and data literacy software. The company is based in Melville, N.Y.


Fluree

Top Executives: Co-CEOs Brian Platz and Flip Filipowski

Founded: 2016

Fluree’s platform organizes blockchain-secured data in a highly scalable graph database. The company’s software is targeted toward “Web3” applications in supply chain management, MRO (maintenance, repair and operations), insurance, and credentials and identity.

Fluree just this month launched the Fluree Partner Network partner program for VARs, systems integrators and ISVs that partner with the Winston-Salem, N.C.-based startup.


Iguazio

Top Executive: CEO Asaf Somekh

Founded: 2014

The Iguazio Data Science Platform automates and accelerates machine learning workflow pipelines, allowing businesses to develop, deploy and manage, at scale, AI applications that improve business outcomes – what the company calls “MLOps.”

In January New York-based Iguazio raised $24 million in financing.


Immuta

Top Executive: CEO Matthew Carroll

Founded: 2014

Immuta is focused on the legal and ethical use of data. The company’s Automated Data Governance software provides businesses and organizations with a way to automate their data governance, audit and compliance efforts, providing self-service data access with automated privacy controls.

In late 2019 Boston-based Immuta added new sensitive data detection and additional privacy-enhancement features to the Automated Data Governance platform.


Infoworks

Top Executive: CEO Buno Pati

Founded: 2014

Infoworks touts its DataFoundry enterprise data operations and orchestration system as a critical technology for enterprise digital transformation efforts. DataFoundry includes tools for data ingestion and preparation, data operations management and governance, data warehouse migration and data lake management, and data modeling and OLAP cube creation.

In February Infoworks, based in Palo Alto, Calif., made DataFoundry 3.0 generally available with native support for the Databricks Unified Analytics Platform, as well as new data onboarding, preparation and operations capabilities.


Kinetica

Top Executive: CEO Paul Appleby

Founded: 2016

Kinetica has developed high-performance analytics software, the Kinetica Active Analytics Platform, that combines streaming and historical data with location intelligence and machine learning-based analytics to tackle complex problems. At the system’s core is the startup’s distributed, in-memory GPU database that can analyze massive datasets with millisecond response times.

In February the San Francisco-based company launched Kinetica Cloud, a cloud-based version of the Kinetica platform, running on Microsoft Azure and Oracle Cloud. Kinetica Cloud will be available on Amazon Web Services and Google Cloud later this year.

Magnitude Software

Top Executive: CEO Chris Ney

Founded: 2014

Magnitude Software’s product portfolio provides unified application data management capabilities including analytics and reporting, master data management, product information management and data connectivity. Many of the Austin-based company’s software products are focused on managing and analyzing data generated by SAP and Oracle applications.


MariaDB

Top Executive: CEO Michael Howard

Founded: 2014

MariaDB is a community-developed, commercially supported fork of the popular MySQL relational database. The MariaDB project was started in 2009 by one of MySQL’s original developers out of concern about MySQL’s future as an open-source database after Oracle moved to acquire it.

On March 31 MariaDB Corporation, based in Redwood City, Calif., and Helsinki, Finland, debuted MariaDB SkySQL, a cloud-native Database-as-a-Service for both transaction processing and business analytics applications. The company also announced that SkySQL was available on the Google Cloud Platform.


Naveego

Top Executive: CEO Katie Horvath

Founded: 2014

Naveego offers the Complete Data Accuracy Platform for managing data quality. The software helps data managers and analysts discover what data an organization has, collate and synchronize data across multiple sources, maintain data accuracy and create a single record of data that can be enforced across an organization.

Naveego is based in Traverse City, Mich.


Octopai

Top Executive: CEO Amnon Drori

Founded: 2015

Octopai develops an automated, centralized, cross-platform metadata search engine that business intelligence groups use to discover, govern and track shared metadata. The software is used to maintain company-wide data consistency and help business analysts find and understand available data.

Octopai is based in Rosh Ha’ayin, Israel.


Panoply

Top Executive: CEO Yaniv Leven

Founded: 2015

Panoply’s data management platform makes it possible to synchronize and store data from more than 100 sources for data analysis tasks. The system combines cloud data warehouse infrastructure, ETL capabilities, automated data integration and AI-driven automation.

Panoply is based in San Francisco.


Promethium

Top Executive: CEO Kaycee Lai

Founded: 2018

Promethium is addressing the challenges of self-service data discovery and analytics with its Data Navigation System, augmented data management software that lets information workers search an organization’s entire “data estate” for answers to questions posed in plain language, eliminating the need to write SQL queries by hand.

Based in Menlo Park, Calif., Promethium raised $6 million in venture funding in January.


Rockset

Top Executive: CEO Venkat Venkataramani

Founded: 2016

Rockset offers a serverless, real-time search and analytics database system for developing and running applications that make decisions using real-time data – an alternative to traditional batch-oriented analytical database technologies. The database stores and indexes real-time data from transactional systems and event streams using schema-free JSON documents and declarative SQL over REST interfaces.

In March the San Mateo, Calif.-based company released its new Query Lambdas capability, which runs developer queries in response to events and enables developers to build data applications faster.


StreamSets

Top Executive: CEO Girish Pancha

Founded: 2014

StreamSets develops its DataOps Platform for data ingestion, integration and ETL tasks. The San Francisco-based company brings DevOps practices and technology to data integration to avoid what it calls “data drift” – the constant and unexpected changes within data that disrupt dataflows.

The StreamSets DataOps Platform includes Control Hub, Data Collector and Transformer. StreamSets on Cloud builds data pipelines from any cloud system into any other cloud system.


Tellius

Top Executive: CEO Ajay Khanna

Founded: 2015

Tellius offers a search-driven analytics platform, the Tellius Genius AI Engine, that the company says makes it easy for users to ask questions of, and get deep insights from, their business data. The system’s voice, search and natural language capabilities augment self-service BI and analytics initiatives.

Tellius is based in Reston, Va.


Timescale

Top Executive: CEO Ajay Kulkarni

Founded: 2015

Timescale develops the TimescaleDB open-source SQL database, built on the PostgreSQL database, for storing and analyzing time-series and IoT data. In June 2019 the New York-based company debuted Timescale Cloud, a fully managed cloud database service running on AWS, Microsoft Azure and the Google Cloud Platform.

Timescale just released TimescaleDB 1.7 with PostgreSQL 12 support and real-time aggregates functionality.

Yellowbrick Data

Top Executive: CEO Neil Carson

Founded: 2014

Yellowbrick Data provides massively parallel data warehouse technology for hybrid cloud environments. The company was founded in 2014 by experts in flash memory and next-generation database technologies with the goal of simplifying data warehousing projects.

The Yellowbrick Data Warehouse is available as an on-premises appliance or a cloud service.

In February the company struck an alliance with MicroStrategy to integrate that company’s data analytics software with the Yellowbrick Data Warehouse.