The 10 Coolest Big Data Startups Of 2016

Big Data, Big Changes

The big data technology market is in the final stages of what research firm Gartner identifies as a multi-year shift from IT-led, system-of-record reporting systems to business-led, self-service analytics. The result is that new business intelligence and analytics platforms have emerged to meet the new organizational requirements for accessibility, agility and deeper analytical insight.

Providing those next-generation systems is a steady stream of startups developing new technology for collecting, managing and analyzing big data, whether it's structured or unstructured, in motion or at rest, on-premises or in the cloud.

Here are 10 cool startups in the big data arena that caught our attention in 2016.

(For more of our 2016 retrospective, check out 'CRN's 2016 Tech Year In Review.')


Anodot

CEO: David Drai

Anodot exited stealth in November 2015 by unveiling its real-time anomaly detection and operational intelligence technology – software with patented machine-learning algorithms that automate business analytics and pinpoint performance issues and business opportunities. The goal is to find outliers within huge amounts of data and turn them into valuable business insight.

Anodot's technology is targeted toward use with e-commerce websites, digital advertising systems and Internet of Things networks for improving operational efficiency and maximizing revenue generation.

The company, based in Ra'anana, Israel, and Sunnyvale, Calif., raised $8 million in Series B venture funding in September.


Confluent

CEO: Jay Kreps

Working with live, streaming data is one of the big challenges in big data management and analytics. One technology that's addressing the problem is the open-source Apache Kafka message broker project that provides high-throughput, low-latency software for handling real-time data feeds.

Confluent, founded by Apache Kafka's original developers, has created a complete real-time data platform around Kafka that acts as a fault-tolerant, highly scalable messaging system. The software can be used for collecting data from user activity logs, stock ticker systems, device instrumentation and a broad range of other use cases.

In May the Palo Alto, Calif.-based company announced the general availability of Confluent Platform 3.0, incorporating Kafka Streams for adding stream processing capabilities to applications, and the Confluent Control Center for operationalizing Kafka across an organization.


Koverse

CEO: Jon Matsuo

Koverse has developed a "data lake-in-a-box" platform that the company says makes it possible to collect big data and put it into production much more quickly and at lower cost than with current technologies and practices.

The Seattle-based company was started in 2012, and an early 1.0 version of its technology debuted more than two years ago. The Koverse Platform 2.0, which launched in June, incorporates the Apache Accumulo "distributed key/value store" technology and the company's own Universal Indexing Engine.

Co-founders Paul Brown (chief product officer) and Aaron Cordova (chief technology officer) worked as data scientists at the National Security Agency, where they helped develop the original Accumulo project and re-architected that organization's data infrastructure to better handle unanticipated analytical requirements.


Maana

CEO: Babur Ozden

Maana develops the Maana Knowledge Platform, data search and discovery software whose forte is collecting data from numerous systems or "silos" and turning it into operational insight that can be used by line-of-business applications. The system is built on the Apache Spark processing engine.

The Palo Alto, Calif.-based company, founded in 2012 and officially launched in May 2015, is well-positioned for collecting and analyzing large volumes of data generated by Internet of Things networks. In September the company debuted the Winter '17 edition of its product with "Knowledge Applications" for optimizing business processes like supply chain and call center management, and "Knowledge Assistants" for creating new analytical models.

In May 2016 the company raised $26 million in Series B financing, much of it from the company's oil and gas and industrial customers, including Shell, Chevron, Saudi Aramco, Intel and General Electric.


Pachyderm

CEO: Joe Doliner

Eschewing much of the current generation of big data technology, startup Pachyderm has developed an open-source analytics engine that uses Docker containers for performing distributed computations.

The point is to offer a data analytics infrastructure that is containerized, modularized and scalable, using such tools as Docker and Kubernetes as building blocks. The company's Pachyderm File System and Pachyderm Pipeline System software helps data managers and analysts build machine-learning pipelines and data ETL (extract, transform and load) workflows.

Founded in 2014, San Francisco-based Pachyderm raised $2 million in seed funding in June.


StreamSets

CEO: Girish Pancha

StreamSets is another company that is tackling the challenges of managing data in motion. More specifically, the startup develops software to guard against the insidious problem of "data drift" – the unpredictable mutations that data can undergo at the source and that create problems when applications use that data.

The company's StreamSets Data Collector software is used to build complex data flows between any data source and any application where it's needed. In September the company debuted StreamSets Dataflow Performance Manager for managing dataflow operations.

Based in San Francisco, StreamSets was founded in 2014 by CEO Girish Pancha, former chief product officer at Informatica, and chief technology officer Arvind Prabhakar, an early employee and engineering leader at Cloudera.


Striim

President and CEO: Ali Kutay

Striim, pronounced "stream" with the "i's" standing for integration and intelligence, was founded in 2012 by former executives from GoldenGate Software, Oracle, Informatica, WebLogic and other big-name data management companies.

The Palo Alto, Calif.-based company's software combines streaming data integration and streaming operational intelligence in one system, enabling continuous query/processing and streaming analytics. In November the company released a new edition of its software that works with Google BigQuery, Kafka and MapR Technologies.

Striim raised $10 million in additional financing in March, bringing its total Series B funding to $30 million.


Stytch

CEO: Mark Cunningham

Stytch debuted its end-to-end data analytics platform in April, offering in one system tools for self-service data preparation, data modeling, data discovery, reporting and dashboards. Stytch is backed by Dun & Bradstreet, and a key selling point of the Stytch system is its links to Dun & Bradstreet's vast database of business data.

The Vancouver-based company launched in August 2015. Founder and CEO Mark Cunningham has been involved in the business intelligence industry since 1992 when his family company began developing Crystal Reports, an early and very successful Windows-based reporting tool.


Talena

CEO: Nitin Donde

Talena provides "always-on" big data management software to help companies protect valuable data assets and iterate rapidly on their business-critical applications. The company's technology provides backup and recovery, test and development management, and archiving capabilities across Hadoop, NoSQL data stores like Cassandra and Couchbase, and modern data warehouses like Hewlett Packard Enterprise Vertica.

In March Talena launched the ActiveRx predictive analytics infrastructure for big data management tasks. The software addresses the question of how to incorporate machine learning into predicting data availability and how to turn backed-up data into active data assets.

Waterline Data

CEO: Alex Gorelik

As organizations assemble Hadoop-based data lakes for storing huge amounts of data, finding the best way to make use of all that information becomes a significant challenge – not to mention the data governance headaches it creates.

Waterline Data addresses that problem with its Smart Data Catalog software that builds a complete inventory of data lake assets, improving data discovery and data governance and making it easier for businesses to derive value from those assets.

Founded in 2013, Mountain View, Calif.-based Waterline Data raised $16 million in Series B funding in January.