Hortonworks will begin shipping a significant new release of its flagship data management software in the third quarter, adding containerization capabilities for faster application deployment and support for machine learning workloads.
The Hortonworks Data Platform (HDP) 3.0, which bases many of its new enhancements on the Apache Hadoop 3.1 software, was unveiled Monday at the Hortonworks DataWorks Summit in San Jose.
"Everybody's trying to make this move to the cloud. Our focus has been on how to make that easier," said Scott Clinton, Hortonworks vice president of product marketing, in an interview with CRN.
Hortonworks also announced expanded alliances with Microsoft, Google and IBM around big data analytics and more support for hybrid system deployments.
"The ability to support multiple hyper-scale cloud systems is critical for the ability to move data and applications to any cloud," Clinton said of the alliances with the major vendors.
Hortonworks said it has renewed and extended its six-year joint engineering and go-to-market relationship with Microsoft in a move the companies say will make it easier to move big data workloads to the cloud.
Specifically, Hortonworks said HDP, as well as its Hortonworks DataFlow (HDF) real-time data analytics system and Hortonworks DataPlane Service (DPS), can be deployed natively on Microsoft Azure Infrastructure-as-a-Service platforms. The Hortonworks software can also be deployed on Microsoft Azure HDInsight, Microsoft's big data platform that's based on HDP.
Clinton said the bottom line is that businesses and organizations can deploy Hortonworks software in the cloud with the same capabilities as on premises.
(Hortonworks DataPlane Service, introduced in September, manages, governs and secures data and workloads across multiple data sources, supporting multiple types of data residing on premises, in hybrid systems or across multiple clouds.)
"Our customers are increasingly adopting a hybrid data architecture, as cloud deployments offer excellent use cases for ephemeral analytic workloads," said Hortonworks CEO Rob Bearden, in a statement. "With the option to deploy HDP, HDF or DPS workloads on Azure IaaS, or use HDInsight, customers can take whichever path to the cloud that fits their business needs best."
Under the Google partnership, HDP and HDF have been optimized to run on the Google Cloud Platform through integration with Google Cloud Storage, making it easier to run big data analytics workloads in hybrid cloud environments, Hortonworks said.
Automated cloud provisioning simplifies the deployment of HDP and HDF on Google Cloud Platform. Running HDF on Google Cloud Platform also makes it easier to deploy a hybrid data architecture, facilitating data flows between any on-premises source and the Google Cloud system.
Under the expanded partnership with IBM, the companies introduced IBM Hosted Analytics with Hortonworks (IHAH), an integrated system running as a service on the IBM Cloud. IHAH is a complete data management and analytics system that incorporates Hortonworks' HDP along with IBM's Big SQL data warehouse system and its Data Science Experience data science platform.
The expanded vendor relationships mean that solution providers within each vendor's partner ecosystem can better leverage the Hortonworks platform, Clinton said. They also make it easier for solution providers to deliver services across multiple cloud systems.
Of the enhancements in HDP 3.0, the support for containerization is perhaps the most significant. Containerization makes it easier to deploy applications on HDP without rewriting them, which in turn makes it easier to move new applications and workloads to cloud systems, according to Clinton.
The containerization capabilities offer partners the opportunity "to help customers understand the hybrid-cloud equation" and provide cloud migration services, Clinton said.
HDP 3.0 supports GPU-based systems, making it possible to run compute-intensive workloads such as artificial intelligence and machine learning applications.
The new edition sports a real-time database, enabled by Apache Hive 3.0, that improves HDP query performance by processing more data at a faster rate. It also offers enhanced security and governance capabilities to help meet General Data Protection Regulation (GDPR) requirements.