Big Data Startup Trifacta Readies Next Software Release

Big data startup Trifacta will ship the second release of its data transformation platform by year's end. The release adds new visual data profiling capabilities and support for the Apache Spark processing engine, the latter through an alliance with Databricks.

The Trifacta Data Transformation Platform v2 also offers support for an expanded range of complex data formats.

San Francisco-based Trifacta is one of a wave of startups developing technology for managing big data, specifically for working with the open-source Hadoop platform. The company's software transforms raw, complex data in Hadoop into structured formats that business analytics tools such as Tableau can use.



Founded in 2012, Trifacta has raised $41.3 million in three rounds of funding.

The company has tight relationships with Hadoop vendors such as Cloudera, Hortonworks and Pivotal. "We really go deep in this Hadoop space," said Joe Hellerstein, founder and chief strategy officer of Trifacta, in an interview.

Trifacta is in the early stages of developing a channel for its software. Hellerstein said the company has been in talks with systems integrators who work in the Hadoop area, but he declined to identify them.

A key element of the v2 release is its support for Spark, the open-source, general-purpose engine for large-scale data processing. Spark provides an alternative to the batch-oriented MapReduce data processing technology used in many Hadoop implementations.

To build Spark support into its software, Trifacta partnered with Databricks, which was founded by Spark's original developers and provides a commercial version of Spark. With Databricks certification, the Trifacta Data Transformation Platform v2 will run on certified Spark distributions, including those from Blue Data, DataStax, Guavus, Hortonworks, IBM, Oracle, Pivotal, SAP and Stratio.

The new data profiling technology makes it easier for users to understand a data set's characteristics as they manipulate the data for analysis. The release also provides native support for complex data formats such as JSON, Avro, ORC and Parquet.
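Trifacta has not described how its profiling works internally. As a rough, hypothetical illustration of what column-level profiling involves, the Python sketch below summarizes each field of a set of JSON-like records: how many values are missing, how many distinct values appear, and which value types occur. The function name `profile_columns` and the particular statistics are assumptions chosen for illustration, not Trifacta's implementation.

```python
from collections import Counter
from typing import Any


def profile_columns(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Hypothetical profiler: summarize missing values, distinct values and types per column.

    A column absent from a record, or present with the value None, counts as missing.
    """
    # Collect every column name that appears in any record (records may be ragged, as JSON often is).
    columns = sorted({key for record in records for key in record})
    profile: dict[str, dict[str, Any]] = {}
    for column in columns:
        values = [record.get(column) for record in records]
        present = [v for v in values if v is not None]
        profile[column] = {
            "missing": len(values) - len(present),
            # repr() lets nested dicts and lists (unhashable) be counted as distinct values.
            "distinct": len({repr(v) for v in present}),
            # Count how often each Python type occurs; mixed types hint at dirty data.
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return profile


rows = [
    {"id": 1, "city": "San Francisco", "amount": 12.5},
    {"id": 2, "city": None, "amount": "n/a"},
    {"id": 3, "amount": 7.0},
]
print(profile_columns(rows)["amount"])
# {'missing': 0, 'distinct': 3, 'types': {'float': 2, 'str': 1}}
```

In this toy data the profile flags a string value ("n/a") in an otherwise numeric column and two missing city values, the kind of characteristics a user would want to spot before transforming data for analysis.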