2014 Big Data 100: The Emerging Big Data Vendors

Big Data Emerging Vendors

Businesses continue to struggle with exploding volumes of data, not just to manage it and keep IT systems from being overwhelmed, but to find ways of deriving value from all that data and use it for competitive advantage.

The second annual Big Data 100 list identifies vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data. Here are 53 emerging vendors, launched since 2008, that are pushing the technology envelope and challenging the established vendors.


Actifio

Top Executive: CEO Ash Ashutosh

Fast-growing Actifio has developed a copy data management platform that eliminates the problem of "data sprawl" across a company by creating a single copy of an organization's production data and making it virtually available for backup, disaster recovery, software development and testing, business analytics and archiving purposes.

In March Actifio, founded in 2009, received $100 million in financing, giving the Waltham, Mass.-based company a valuation of about $1 billion. About 85 percent of the company's sales are through channel partners.


Aerospike

Top Executive: CEO Joe Gottlieb

Aerospike develops a real-time, flash-optimized NoSQL database for running high-performance applications. The in-memory database meets ACID (atomicity, consistency, isolation and durability) requirements for reliable transaction processing.

The Mountain View, Calif.-based company, founded in 2009, said earlier this month that its database achieved performance of 1 million transactions per second on a single server with 50 million records, based on the Yahoo Cloud Serving Benchmark.

Alpine Data Labs

Top Executive: President, CEO Joe Otto

Alpine Data Labs offers an advanced, Hadoop-based data analytics platform. The company's forte is making it possible for people without coding skills or deep analytical expertise to reap insights from large data sets using a drag-and-drop approach to creating analytical queries.

Founded in 2010, San Francisco-based Alpine Data Labs raised $16 million in Series B venture funding in November, bringing its total financing to $23.5 million.


Alteryx

Top Executive: CEO Dean Stoecker

Alteryx's software is used to blend structured and unstructured data from a range of sources into one database, conduct predictive, spatial and statistical analysis tasks, and then share the results. The Irvine, Calif.-based company was founded in 2010.

Alteryx 9.0, released in April, can tap into social media data feeds from DataSift, sales and marketing data from Google Analytics and Marketo, and customer data-centric data warehouses such as Amazon Redshift, Pivotal Greenplum and HP Vertica. Alteryx also has a strategic alliance with business intelligence software vendor QlikTech.


Appuri

Top Executive: CEO Damon Danieli

Appuri operates a cloud-based customer data system that captures customer touchpoint data from internal and external sources and creates a petabyte-scale data warehouse within a dedicated virtual private cloud.

Founded in 2012, Appuri is based in Redmond, Wash.


Ayasdi

Top Executive: Co-Founder, CEO Gurjeet Singh

Ayasdi's Insight Discovery Platform, which utilizes "topological data analysis" technology combined with machine learning techniques, provides insights derived from data that help organizations solve complex problems without writing code or queries.

Ayasdi, based in Palo Alto, Calif., was founded in 2008 to leverage research from Stanford University, DARPA and the National Science Foundation. In addition to its core platform, Ayasdi offers the Ayasdi Cure system for analyzing and visualizing complex clinical and genomic data for drug development.


Chartio

Top Executive: Founder, CEO Dave Fowler

Chartio develops cloud-based data visualization software that businesses use to combine data sets and create charts and dashboards for analysis -- all without the need to develop an on-premise data warehouse.

In January the San Francisco-based company raised $2.2 million in financing, in addition to the $4.4 million it raised in 2011. The company, founded in 2010, also enhanced its software with the ability to execute custom formulas for analysis and the capability to combine data from multiple sources without the need for data ETL tools.


Cirro

Top Executive: CEO Mark Theissen

Cirro develops a next-generation data federation platform that makes it possible for nontechnical users to query and explore structured and unstructured data from multiple sources and perform complex analytical tasks. The Aliso Viejo, Calif.-based company was founded in 2010.

The company's products include the Cirro Data Hub, which determines and orchestrates where a query will be processed and issues appropriate data requests to relevant sources; Cirro Analyst for Excel, which allows Excel to be used as a front-end analysis tool; and Cirro Multi Store for data staging or creating workspaces and datamarts.

Citus Data

Top Executive: CEO Umur Cubukcu

Citus Data developed CitusDB, a distributed analytics database that can run SQL queries and, according to the company, process petabytes of data in seconds. CitusDB is modeled on Google Dremel, a real-time analytics database developed by the giant search company.

Citus Data released CitusDB 3.0 in February with a number of new capabilities including dynamic repartitioning of tables to handle joins between any number of tables, independent of their size and partitioning method. The San Francisco-based company was founded in 2010.

ClearStory Data

Top Executive: Founder, CEO Sharmila Mulligan

ClearStory launched its big data analysis and exploration platform and applications late last year. The company's Data Intelligence software is designed to make it easier to access internal and external data sources, including corporate databases, Hadoop and the Internet, and use that data to uncover trends and patterns.

ClearStory Data, Palo Alto, Calif., was founded in 2011. It raised $21 million in Series B financing in March, bringing its total funding to $31.5 million.


Cloudera

Top Executive: CEO Tom Reilly

Cloudera Enterprise is the vendor's distribution of the Hadoop platform, coupled with system management (Cloudera Manager) and data management (Cloudera Navigator) tools. Launched in 2008, Cloudera is based in Palo Alto, Calif.

On March 31 Cloudera closed on a whopping $900 million financing round, followed a few days later by the general release of Cloudera Enterprise 5. Later in April the company added resources, training and certification to its Cloudera Connect Partner Program.


Concurrent

Top Executive: CEO Gary Nakamura

Concurrent, founded in 2008, offers application middleware technology that businesses use to develop, deploy, run and manage big data applications. The company's products include the Cascading application development framework and Driven application performance management software.

Last month the San Francisco-based company debuted Cascading 3.0 with support for local in-memory, Apache MapReduce and Apache Tez. Concurrent closed on $4 million in Series A funding last year.


Continuuity

Top Executive: Co-Founder, CEO Jonathan Gray

One problem with Hadoop is the shortage of skilled developers capable of building applications that leverage its capabilities. Continuuity offers the Continuuity Reactor development engine that Java programmers use to build and deploy cloud-based big data applications.

Continuuity, founded in late 2011 and based in Palo Alto, Calif., shipped Continuuity Reactor 2.0 in October with MapReduce scheduling, resource isolation, high availability and full support for REST APIs.

Continuum Analytics

Top Executive: Co-founder, CEO Travis Oliphant

Continuum Analytics, founded in 2011, develops data analytics software based on the Python programming language. In February the Austin, Texas-based company released Anaconda 1.9, the latest version of its collection of libraries for big data management, analysis and cross-platform visualization for business intelligence, scientific, engineering and machine learning tasks.


Couchbase

Top Executive: President, CEO Bob Wiederhold

Another entry in the "alternative database" competition, Couchbase develops and supports Couchbase Server, a commercial version of Apache CouchDB, the open-source, document-oriented NoSQL database. Backers pitch CouchDB as superior to traditional relational databases for managing unstructured data and cloud computing.

Couchbase, founded in 2011 and based in Mountain View, Calif., released Couchbase Server 2.5 in February with "rack awareness" for improved availability and reliability, and advanced data encryption added to the system's cross-data center replication.


DataGravity

Top Executive: Co-founder, CEO Paula Long

This intriguing startup remains in stealth mode, but it's attracting lots of attention. The company was co-founded by Paula Long and John Joseph, key executives behind storage technology developer EqualLogic, which Dell acquired in 2008 for $1.4 billion. DataGravity just hired Steve Noyes, a former Oracle engineering vice president, for the same post.

DataGravity's website says the Nashua, N.H.-based company's mission is "turning data into information" and "make storage an active asset for SMBs." We also know DataGravity will be 100 percent channel-driven and is recruiting early access channel partners. Look for a product launch later this year.


Top Executive: Co-Founder, CEO Chris Neumann

Under the motto of "Analytics Simplified," this San Francisco-based company develops software that analyzes data and automatically creates visualizations -- charts and graphs -- from the information without the need for the user to tackle complex coding.

The company, founded in 2011, launched its service one year ago. In December it snagged $3.15 million in extended seed funding, money the company is using to expand its sales and engineering staffs and take the company to the next level.


Datameer

Top Executive: CEO Stefan Groschupf

Founded in 2009 by some of the original contributors to Apache Hadoop, Datameer develops software that helps business users of Hadoop integrate, analyze and visualize large volumes of data.

Datameer secured $19 million in Series D financing in December. Last month the San Mateo, Calif.-based company launched Datameer 4.0 with a "flip side" feature to its spreadsheet interface that helps users understand information about the data such as a column's distribution, maximum, minimum, mean and record count.


DataRPM

Top Executive: Co-Founder, CEO Sundeep Sanghavi

DataRPM developed "cognitive data discovery" technology that lets users analyze and visualize data residing in corporate databases, Hadoop or other sources using a natural language query and search interface. The company's software is available through the cloud or for on-premise deployment.

Based in Fairfax, Va., and founded in 2012, DataRPM raised $5.1 million in its first round of funding, led by InterWest Partners.


DataSift

Top Executive: CEO Rob Bailey

DataSift develops a social data platform that businesses use to monitor social media such as Twitter, aggregate and filter data from public social conversations, and extract insights from that data. A company might use DataSift to gain insights about consumer opinion of its brand, for example, or glean competitive intelligence.

In December San Francisco-based DataSift (founded in 2010) raised $42 million in Series C financing, money the company is using to expand its sales and marketing efforts and develop its platform to include the ability to work with additional data types such as internal company data.


DataStax

Top Executive: CEO Billy Bosworth

Santa Clara, Calif.-based DataStax developed a massively scalable data platform based on Apache Cassandra, the open-source distributed database for storing and managing huge amounts of data across multiple data centers and the cloud. The DataStax system also includes Apache Hadoop for analytics and Apache Solr for search. The company was founded in 2010.

In February DataStax launched its partner network program to recruit big data solution providers, consultants, technology service providers, professional services companies, training firms and application developers to work with the company.


Domo

Top Executive: Founder, CEO Josh James

Domo offers a cloud-based executive management platform the company said gives users access to information scattered across myriad sources through a single dashboard. The American Fork, Utah-based company was founded in 2011 by James, previously the co-founder and longtime CEO of Omniture.

In February the company raised $125 million in Series C financing, doubling its total venture funding. At the time the company said its annual growth was "far exceeding" 100 percent and that it had signed roughly 500 customers.


Gainsight

Top Executive: CEO Nick Mehta

Gainsight develops predictive analytics software that's integrated with Salesforce.com's CRM applications and helps users scrutinize customer data for customer retention and identify cross-sell and upsell opportunities.

In March the Mountain View, Calif.-based company, founded in 2009, launched the spring release of its platform, offering "success snapshots" to drive customer retention and subscription renewals, and a Salesforce1 mobile application.


Gazzang

Top Executive: President, CEO Larry Warnock

Gazzang, founded in 2010, is another startup that’s addressing the problem of securing big data. The company’s technology provides data security in big data and cloud computing environments, securing personally identifiable information, preventing unauthorized access to sensitive data and systems, and helping organizations comply with data security regulations.

Earlier this month Austin, Texas-based Gazzang unveiled a data encryption and key management system for OpenStack Swift, the open-source cloud technology stack’s object storage platform.


Glassbeam

Top Executive: Founder, CEO Puneet Pandit

Glassbeam develops Software-as-a-Service applications for product analytics based on machine log data, putting it in a key position in business intelligence in the nascent-but-growing Internet of Things market.

In February Glassbeam, founded in 2009 and based in Sunnyvale, Calif., struck an OEM deal with Hitachi Data Systems under which Hitachi will build Glassbeam's log analysis technology into its Managed Private Cloud service.


Hortonworks

Top Executive: CEO Rob Bearden

Hortonworks, launched in 2011, offers the Hortonworks Data Platform, a system based on Apache Hadoop combined with tools for data management, integration, security, provisioning and other software for enterprise data processing.

Earlier this month Palo Alto, Calif.-based Hortonworks acquired XA Secure, a startup developer of security and governance technology for the Hadoop big data platform. By adding XA Secure's technology to the Hortonworks Data Platform, the vendor strengthens its competitive hand against other Hadoop distributors.


JethroData

Top Executive: Co-Founder, CEO Eli Singer

JethroData develops an index-based SQL engine for Hadoop that it says combines the scalability of HDFS (the Hadoop file system) with the power of a fully indexed columnar analytical database.

The company was founded in 2012 and is based in Natanya, Israel.


Jut

Top Executive: Founder, CEO Steve McCanne

Jut is a San Francisco-based company founded in 2013 that remains in stealth mode as it develops software for capturing and analyzing big data.

The company raised $20 million in Series B financing in November from investors Wing VC, Accel Partners and Lightspeed Venture Partners.

MapR Technologies

Top Executive: Co-Founder, CEO John Schroeder

MapR Technologies competes with Cloudera, Hortonworks and other vendors in the Hadoop arena, building on its distribution of Hadoop and other open-source Apache software to create a complete big data platform for both operational and analytical purposes.

San Jose, Calif.-based MapR, founded in 2009, reported this month that first-quarter bookings tripled over last year's first quarter. The company now supports Hewlett-Packard's Vertica Analytics software running on the MapR platform.


MemSQL

Top Executive: Co-Founder, CEO Eric Frenkiel

MemSQL calls itself "the leader in real-time and historical big data analytics based on a distributed in-memory database." In January the company, founded in 2011, raised $35 million in Series B funding.

In February San Francisco-based MemSQL unveiled version 3.0 of its in-memory database with combined in-memory row store and highly compressed column store. That integrated architecture, according to the company, allows the database to tap into both real-time and historical data for transaction processing and deep analysis.

Metric Insights

Top Executive: Founder, CEO Marius Moscovici

Metric Insights pitches its "push intelligence" technology as an antidote to business intelligence reports and dashboards that, the company said, make users hunt for information. The Metric Insights software delivers personalized business intelligence, key performance indicators and alerts.

San Francisco-based Metric Insights, founded in 2010, won first place at last October's O'Reilly + Hadoop World Startup Showcase 2013 for the investors' vote.


Numerify

Top Executive: Co-Founder, CEO Gaurav Rewari

Startup Numerify just emerged from stealth mode in April to debut its cloud-based IT Enterprise Analytics Platform. The analytics application is built on ServiceNow's IT management software and collects and analyzes operational and financial data about an organization's IT systems that managers use to monitor system performance and make decisions about IT assets and capacity.

Cupertino, Calif.-based Numerify, founded in 2012, raised $8 million in Series A financing in October 2013.


Paradigm4

Top Executive: CEO Marilyn Matz

Paradigm4 is another of the current crop of startups that's finding ways to apply leading-edge technology to the problem of analyzing massive volumes of data for complex problems in financial services, life sciences and other data-intensive industries.

Paradigm4, founded in 2010 and based in Waltham, Mass., develops the SciDB scalable array database with native complex analytics capabilities. Database luminary Michael Stonebraker is the company's CTO.


ParStream

Top Executive: CEO Michael Hummel

ParStream develops a distributed, massively parallel processing columnar database that's designed to analyze and filter billions of records in sub-second time. The company, based in Cologne, Germany, has its U.S. headquarters in Cupertino, Calif.

ParStream, founded in 2008, recently struck an alliance with MicroStrategy under which that vendor's business analytics software could tap into the processing capabilities of the ParStream database, allowing users to interactively analyze extremely large data sets in real time.


Paxata

Top Executive: Co-Founder, CEO Prakash Nanduri

Paxata is in the business of "adaptive data preparation," offering technology that simplifies the often-tedious work of transforming raw data into data that can be analyzed with business analytics tools such as QlikTech and Tableau (both Paxata partners). IntelliFusion, a proprietary semantic fusion and machine learning engine, is at the core of the company's cloud-based products.

In March In-Q-Tel, the CIA's strategic investment firm, invested an undisclosed amount in Paxata. The company was founded in 2012 and is based in Redwood City, Calif.


Pivotal

Top Executive: CEO Paul Maritz

Pivotal is the big data joint venture between storage giant EMC and VMware. The goal, according to the company, is creation of software applications that leverage "big and fast data" on a single, cloud-independent platform.

Pivotal develops the Pivotal Chorus analytic productivity platform for searching, exploring and visualizing data. Also on the product line card are the Pivotal Greenplum MPP database, the Pivotal HD distribution of Hadoop with SQL query services, and the Pivotal Gemfire distributed data management platform.


Platfora

Top Executive: Founder, CEO Ben Werther

Platfora offers a big data analytics toolset that's native to the Hadoop platform, allowing users to directly analyze data in Hadoop without the need to build a separate data warehouse system. The software is offered for on-premise deployments or as a cloud service.

This month the San Mateo, Calif.-based company, founded in 2011, debuted a release of its software with new programmatic query access for data scientists, and enhanced data visualization and discovery capabilities for line-of-business users.


Qubole

Top Executive: Co-Founder, CEO Ashish Thusoo

Qubole develops a Hadoop-based big data platform, the Qubole Data Service, which runs in the cloud. Last month the company added to its lineup Facebook's Presto-as-a-Service query engine that provides real-time SQL capabilities on Hadoop.

Expectations are high for this Mountain View, Calif.-based company: Founders Ashish Thusoo and Joydeep Sen Sarma built and ran Facebook's data service and scaled it to more than 25 petabytes. They also created the Apache Hive open-source data warehouse technology.

Rubikloud Technologies

Top Executive: Co-Founder, CEO Kerry Liu

This year-old startup has developed a cloud-based, real-time data analytics platform for processing, analyzing and searching continuous streams of data. The company's stated mission is "to turn data into revenue."

Last month Toronto-based Rubikloud was voted "the most disruptive cloud startup" at The Cloud Factory conference in Banff. Backed by $1 million in financing, the company is now working with early adopter customers on its product.


ScaleArc

Top Executive: President, CEO Justin Barney

ScaleArc develops database infrastructure software the company says simplifies the way database systems are deployed and managed. The toolset provides IT managers with a view of database traffic and improves database scalability and availability through dynamic clustering, load balancing and sharding (horizontal database partitioning) capabilities.

ScaleArc, founded in 2009 and based in Santa Clara, Calif., recently began offering its software through the Marketplace for Rackspace Hosting, bringing database traffic management to cloud computing.


Seeq

Top Executive: Founder, CEO Steve Sliwa

Seeq is developing software and services that help businesses derive insights from industrial process data, such as information collected from sensors and instrument systems, to aid continuous operational improvement.

Founded in May 2013, the Seattle-based company raised $6 million in Series A funding in November.


SiSense

Top Executive: CEO Amit Bendov

SiSense offers business intelligence and dashboard applications for analyzing and visualizing data collected from multiple sources. The company boasts that everyday business workers can use its products without the need for coding or help from the IT department.

Based in Tel Aviv, Israel, SiSense was founded in 2010 and has raised $14 million in financing.

Splice Machine

Top Executive: Co-Founder, CEO Monte Zweben

Founded in 2012, Splice Machine has been developing a full-featured, transactional SQL database on Hadoop that can run operational applications and real-time analytics using Hadoop data. The technology, which Splice Machine just began offering as a public beta, is designed to get around Hadoop's limitation of operating in batch mode.

San Francisco-based Splice Machine raised $15 million in Series B financing in February, bringing its total financing to $19 million.


Sqrrl

Top Executive: CEO Mark Terenzoni

Sqrrl was quietly started in 2012, but the Cambridge, Mass.-based company got a lot of attention in the past year given that its founders came from the National Security Agency and helped develop that organization's massive database.

The Sqrrl Enterprise database offers column, graph and document store capabilities to power big data applications. The product's real forte is its ability to scale up and provide data security at the cell level. The 1.3 release in February offered security enhancements, performance improvements and new data storage capabilities.

Sumo Logic

Top Executive: President, CEO Vance Loiselle

Sumo Logic brings big data analytics to IT management, calling itself "the next-generation machine data analytics company." Sumo Logic's software analyzes IT performance data in real time, providing actionable insights for IT operations, application management, and security and compliance managers.

Sumo Logic was founded in 2010 and is based in Mountain View, Calif. In April the company integrated its software with ServiceNow's IT service management cloud services, providing users with the ability to detect and remedy anomalous events in real time.


TempoDB

Top Executive: Co-Founder, CEO Andrew Cronk

TempoDB offers a database service specifically designed for time-series data, a type of data that many databases have trouble handling. Time-series data includes things like thermostat temperatures, heart rates and network latency statistics. The company has customers using its software for energy and smart grid management, server and network monitoring, and social media monitoring.

TempoDB was founded in 2011 and is based in Chicago.

Treasure Data

Top Executive: Founder, CEO Hiro Yoshikawa

Treasure Data offers a cloud-based data warehouse (data analytics Platform-as-a-Service) that operates on a subscription model. The idea is to provide sophisticated data warehouse capabilities to businesses without the huge costs and development times associated with on-premise systems.

The Mountain View, Calif.-based company was founded in 2011 and launched its service in 2012. Last year the company raised $5 million in Series A financing.

Via Science

Top Executive: CEO Colin Gounden

Via Science "applies big math" to solve complex analytics problems, according to the Cambridge, Mass.-based company. The vendor's core technology is its Reverse Engineering and Forward Simulation (REFS) software that automates big data and predictive analytics and runs on supercomputers such as IBM's Blue Gene/Q and cloud infrastructure such as Amazon's EC2.


WibiData

Top Executive: Founder, CEO Christophe Bisciglia

Many organizations are adopting Hadoop to capture and manage big data. But what can you do with it? WibiData develops software that helps businesses develop predictive customer-facing applications on the Hadoop platform. That, according to WibiData, will help businesses derive more value from big data.

Bisciglia, a co-founder of Hadoop software developer Cloudera, founded San Francisco-based WibiData in 2010. Last year the company snapped up $18 million in venture financing.


Xplenty

Top Executive: Co-Founder, CEO Yaniv Mor

Xplenty offers a cloud-based data integration service running on Hadoop, providing an alternative to using on-premise data ETL (extract, transform, load) tools to integrate structured and unstructured data. The startup is also competing against Amazon's Elastic MapReduce service.

The company, based in Tel Aviv, Israel, with a U.S. office in San Francisco, was founded in 2011.


Zettaset

Top Executive: President, CEO Jim Vogt

Zettaset is the creator of Orchestrator, software that businesses use to manage and secure their Hadoop big data clusters. The Mountain View, Calif.-based company was launched in 2009.

In February Zettaset said it had been granted a patent for its "split brain resistant fail-over" in high availability technology for Hadoop, a core component of Orchestrator.


Zoomdata

Top Executive: Founder, CEO Justin Langseth

Zoomdata develops software that allows users to connect, visualize and interact with data through browsers and mobile devices. Companies use Zoomdata's software to create dashboards and connect them to disparate data sources.

Zoomdata was founded in 2012 and is based in Reston, Va.