10 Coolest Big Data Startups Of 2012 (So Far)

Big Data, Bigger Market

The rush to provide businesses with the IT to handle the ever-growing volume, variety and velocity of data didn't let up in the first six months of 2012. And there's no sign of it slowing: IT market research and analysis firm IDC predicts the market for big data technology and services will grow at a 40 percent annual rate, from $3.2 billion in 2010 to $16.9 billion by 2015.
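
For readers who want to check that the growth rate squares with the dollar figures, here is a minimal sketch of the compound-growth arithmetic. It assumes the 40 percent figure is a compound annual rate over the five years from 2010 to 2015; the function name `implied_cagr` is illustrative and does not come from IDC.

```python
def implied_cagr(start_value, end_value, years):
    """Return the compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $3.2 billion in 2010, $16.9 billion by 2015.
# Assumption: growth compounds once per year over 2015 - 2010 = 5 years.
rate = implied_cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the implied rate comes out just under 40 percent, which matches the rounded figure quoted.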

Here are 10 examples of cool big data startups from the first half of 2012. Some have been incubating awhile and had their big debuts in 2012. Others have been around for some time (we didn't go back further than 2010 to find startups) and made some big moves this year.

Compuverde

CEO: Stefan Berbo

Compuverde specializes in developing "green" big data storage systems for solution providers, telecommunications companies and businesses. The company, based in Karlskrona, Sweden, says its cloud-based Compuverde Object Store systems can manage more than 100 petabytes of unstructured data and operate with 99.999 percent uptime.

Object Store can reduce a business's hardware needs and capital expenditures and cut its energy consumption by up to 50 percent, according to the company.

DataSift

CEO: Rob Bailey

Twitter, Facebook and other social network sites generate a huge volume of information that can be a goldmine for businesses -- if they have a way of capturing and analyzing it. DataSift offers software that businesses use to define complex filters, based on such criteria as location, gender and even sentiment, to sort through billions of social interactions.

DataSift was born out of Tweetmeme and began offering its product in November 2011. In February, the San Francisco company inked a deal with Twitter, giving DataSift access to an archive of tweets going back to January 2010 for market research purposes. And in May, the startup raised $7.2 million in venture funding.

DataStax

CEO: Billy Bosworth

DataStax is a leader in developing big data systems based on the Cassandra "NoSQL" open-source database software and Hadoop. Cassandra, an Apache Software Foundation project like Hadoop, is designed to handle very large volumes of data distributed across large numbers of commodity servers.

The DataStax Enterprise platform, built on Cassandra, manages real-time analytic and search data all in the same cluster. The San Mateo, Calif.-based company launched release 2.0 of the platform in May, providing support for Hewlett-Packard Cloud Services. That was followed up with release 2.1 in June, which provided enhanced Hadoop capabilities and support for Oracle's Unbreakable Linux.

Hortonworks

CEO: Rob Bearden

Founded one year ago, Sunnyvale, Calif.-based Hortonworks debuted release 1.0 of its Hortonworks Data Platform in June. Built on Hadoop and other open-source software from the Apache Software Foundation, the Hortonworks Data Platform adds other tools and technologies that make it easy for businesses to implement a big data system and develop applications for managing and analyzing all that information.

Hortonworks is a spin-off of Yahoo's Hadoop engineering team. The company is widely seen as the chief competitor to the more established Cloudera, which offers its own Hadoop distribution. The company's name comes from the Dr. Seuss book "Horton Hears a Who," in keeping with the Hadoop elephant theme.

Karmasphere

CEO: Gail Ennis

Karmasphere this month is launching release 2.0 of its collaborative analytics software for extracting and analyzing data from Hadoop. The toolset lets users visually explore data to discover trends and patterns, analyze information using ad-hoc queries and then share the results with co-workers.

Cupertino, Calif.-based Karmasphere has partnered with virtually all the vendors and organizations with Hadoop distributions, including the Apache Software Foundation, IBM, Cloudera, Amazon Web Services and Hortonworks.

Mortar Data

CEO: K Young

Working with Hadoop requires a fair amount of technical expertise. Mortar Data offers a cloud-based service, built on the Python programming language and the Apache Pig technology for analyzing huge data sets, that makes Hadoop accessible to a wider audience of programmers.

Mortar Data emerged from stealth mode this spring with an undisclosed amount of seed funding. The New York company's long-range plan is to work with technology partners to bring a range of business intelligence, analytics and advanced monitoring capabilities to the Mortar Data platform.

Paradigm4

CEO: Marilyn Matz

Paradigm4 is developing a massively scalable analytical database that the company says provides advanced mathematical and computing functionality for use with very large data sets. The company's software, based on the SciDB open-source database for scientists, is currently in limited beta testing.

The Waltham, Mass.-based company is attracting attention because its CTO and co-founder is Michael Stonebraker, the noted database researcher who started Ingres, Illustra, StreamBase, Vertica and other bleeding-edge data management companies.

Qubole

CEO: Ashish Thusoo

Qubole is developing what it calls an auto-scaling platform for analyzing and processing big data. The company's goal is to offer cloud-based Hadoop and Hive services that handle all infrastructure complexities behind the scenes, freeing up analysts to focus on developing queries and analyzing data.

The Mountain View, Calif.-based company exited "stealth mode" in June and is now recruiting businesses and scientists to participate in an early access program for its technology. The company's founders, Ashish Thusoo and Joydeep Sen Sarma, helped build Facebook's data infrastructure, were contributors to the development of Hadoop and created Apache Hive, an open-source data warehouse system.

Retention Science

CEO: Jerry Jao

Retention Science has developed what it calls the "Customer Profiling Engine," a big data marketing platform that helps online businesses analyze huge volumes of data to build customer loyalty and prevent customer churn. The startup's applications help e-commerce companies predict how price-sensitive their customers are and develop promotions accordingly, identify where customers are in their "lifecycle" relationship with a business and create retention strategies, and develop incentives for customers who are active on social networks.

Started in 2011, Santa Monica-based Retention Science officially launched this week with $1.3 million in seed funding from multiple venture capital and angel investor sources. The company is allied with MuckerLab, a Los Angeles technology incubator company.

Think Big Analytics

CEO: Ron Bodkin

Think Big Analytics positions itself as a big data solutions provider. The Mountain View, Calif.-based company works with Cloudera, DataStax, Hortonworks and other vendors, using those vendors' technologies to assemble big data systems for its clients.

The company offers a range of big data-related consulting, engineering, development and training services and has pre-built frameworks in data warehousing, advertising analysis and auto device support.