The past few months have seen a lot of exciting activity in the AI / ML and Big Data industry — including the merger of Cloudera and Hortonworks, the acquisition of BlueData by Hewlett Packard Enterprise (HPE), as well as ongoing innovation throughout the ecosystem. In this blog post, I’ll talk a little bit about all three.
The BlueData team is now officially part of HPE, and we’re continuing to develop our own exciting new innovations (stay tuned for more to come!). We’re also committed to providing our customers with the ability to deploy new versions and tools from throughout the AI / ML and Big Data ecosystem on the container-based BlueData EPIC software platform. This includes existing products and new innovations from our partners Cloudera (e.g. CDH) and Hortonworks (e.g. HDP), as well as their upcoming unity releases (e.g. CDP) as a combined company.
Along those lines, I’ve written previously about BlueData’s deep level of certification through the Hortonworks QATS (Quality Assured Testing Suite) program in my blog post here. QATS is Hortonworks’ highest certification level, providing rigorous testing across the full breadth of HDP services. It validates all the features, functions, and performance of the HDP cluster throughout the testing process; BlueData EPIC went through QATS certification for prior versions of HDP.
And now I’m excited to announce that BlueData was recently QATS certified for Hortonworks Data Platform (HDP) 3.0. HDP 3.0 is the first major HDP version change since HDP 2.0 (way back in 2013), and it’s packed with lots of new features and capabilities to drive additional data-driven insights for our joint customers. Here’s what Hortonworks had to say about HDP 3.0 in the GA announcement here:
“HDP 3.0 is a giant leap for the Big Data ecosystem, with major changes across the stack and expanded eco-system … Many of the HDP 3.0 new features are based on Apache Hadoop 3.1 and include containerization, GPU support, Erasure Coding and Namenode Federation.”
During our QATS certification for HDP 3.0, we drilled down into some of these features (like NameNode Federation, Erasure Coding, Cloud Storage & Enterprise Hardening, and Wire Encryption). We’re excited to bring these new capabilities to our joint customers running HDP in containerized environments on the BlueData EPIC platform. And, as cited previously in my colleague’s blog post here, it’s great to see the industry recognize the value of compute / storage separation and the use of containers for large-scale distributed data frameworks like Hadoop.
See below for some details on the QATS test coverage for HDP 3.0 and the certification process with our partners at Hortonworks (now Cloudera).
Components in Scope
HDFS 3.1.0, YARN 3.1.0, MapReduce2 3.1.0, Tez 0.9.1, Hive 3.1.0, HBase 2.0.0, Pig 0.16.0, Sqoop 1.4.7, Oozie 4.3.1, Zookeeper 3.4.6, Storm 1.2.1, Accumulo 1.7.0, Infra Solr 0.1.0, Atlas 1.0.0, Kafka 1.0.1, Knox 1.0.0, Ranger and Ranger KMS 1.0.0, Spark 2.3.1, Druid 0.12.1, Kerberos 1.10.3
General Scope of Testing
Functional, High Availability, Erasure Coding, Non-Secure, Secure with Kerberos, Wire Encryption, Ranger, and Transparent Data Encryption.
Key Testing Highlights
- Functional tests of HDP clusters running on BlueData EPIC (covering system components including Docker containers, storage, and networking).
- Functional testing of all HDP components (including HDFS, Hive, Hive with LLAP, HBase, and Spark2 with Kerberos authentication).
- Highlights of HDFS testing included extended ACLs; fsck; reading and writing data through HttpFS; NameNode Federation and ViewFs; and Erasure Coding, including access to erasure-coded data via the WebHDFS API with Ranger-based authentication.
- Highlights of Hive and Hive LLAP testing included constraints and materialized views for queries and subqueries; ACID (Atomicity, Consistency, Isolation, and Durability) features; create, read, write, update, and delete operations; atomicity and isolation on CRUD and insert-only transactional tables; the workload manager; support for NameNode Federation; and setup of storage-based authorization for Hive.
- Integration and stress testing of different components to ensure that all the services are configured and integrated correctly.
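For readers less familiar with the HttpFS and WebHDFS access paths mentioned in the HDFS highlights above, here is a minimal Python sketch of how a request URL for that REST API is put together. It is only an illustration, not part of the QATS test suite: the host name and file path are hypothetical, port 9870 is the stock Hadoop 3 NameNode HTTP default (HttpFS defaults to 14000, and a managed cluster may use something else), and the `user.name` parameter only applies to non-Kerberized clusters.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS / HttpFS REST URL:
    http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<other params>
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = {"op": op}
    query.update(params or {})
    # quote() keeps "/" intact and escapes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(webhdfs_url("namenode.example.com", 9870,
                      "/tmp/qats/sample.txt", "OPEN",
                      {"user.name": "hdfs"}))
```

Because HttpFS exposes the same REST interface as WebHDFS, the same helper works for both by changing the host and port. On a Kerberos-secured cluster the client would authenticate with SPNEGO rather than passing `user.name`.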
QATS Certification Process and Environment
All QATS tests were executed on an HDP 3.0 cluster running in Docker containers on the BlueData EPIC software platform (with BlueData EPIC version 3.6).
The screenshot below shows information about the HDP cluster that was tested, including the number of nodes and other details about the deployment. For example, the HDP 3.0 environment running on BlueData EPIC consisted of a four-node HDP cluster with 96 GB RAM, 10 cores, and 1 TB disk storage.
A sample of the services covered under the HDP 3.0 QATS test plan included HDFS, YARN, Hive, HBase, Spark, Kafka, Ranger, Atlas, and more – as shown in the list on the far left side of the Ambari screenshot below.
The HDP QATS certification runs on an Ambari node and has the following configurations for the database, log location, HDP cluster details, and other security considerations:
After validating each service, the QATS results are shown with detailed information about the tests covered, status, and the duration of the test run. For example, the following two screenshots show high-level information about some of the MapReduce and Hive tests:
In summary, the QATS testing for HDP 3.0 was rigorous and thorough, with a total of approximately 2,800 service tests performed (with Kerberos, Ranger, Erasure Coding, and HA enabled). Once again, BlueData passed with a success rate of 100% – and so BlueData EPIC earned the valuable Hortonworks QATS certification badge for HDP 3.0.
This new QATS certification for HDP 3.0 enables BlueData and Hortonworks (now Cloudera) to provide best-in-class support to our joint customers. In conjunction with the QATS program, our teams will continue to collaborate – across engineering R&D as well as our go-to-market plans – to ensure continuous product compatibility and high-quality customer service for HDP running on containers with the BlueData EPIC software platform.
Together, we make it easy to extract and analyze any kind of data to help organizations make data-driven decisions – with the ability to scale across more users, more data, and more analytics use cases – while ensuring enterprise-grade security and performance. Ultimately, our partnership and this certification make our combined solutions stronger, helping to ensure the success of our joint customers.
To learn more about BlueData and Hortonworks Data Platform (HDP), you can read our joint solution brief.