This morning, we announced the summer release of the BlueData EPIC software platform: version 1.5. With this new software release, BlueData is delivering on our mission to simplify and streamline Big Data infrastructure.
BlueData uses virtualization technology to help solve Big Data infrastructure challenges and enable Hadoop-as-a-Service in an on-premises deployment model. Incorporating feedback from our customers and partners, we’re continuing to make it easier, faster, and more secure to deploy Big Data in a virtualized environment – allowing enterprises to accelerate their time-to-insights while delivering greater operational efficiency and cost-savings.
The new functionality for this release can be grouped into three key themes:
- Interoperability with Big Data tools of choice – with support for newer Hadoop and Spark versions, integration with Apache Ambari and Cloudera Manager, and support for common Big Data analytics applications.
- Data scientist and Big Data developer productivity – by providing “bring your own app” capabilities that allow users to quickly snapshot and self-register their own Big Data applications of choice.
- Flexible security models for multi-tenant Hadoop – with enhanced security and authentication using Kerberos.
I’ll briefly walk through each of these themes and some examples of the new functionality in each area.
Harnessing Big Data Innovations
The Big Data ecosystem is moving at a breakneck pace. New tools to experiment with emerge every week: whether it’s the latest in the Hadoop and Spark ecosystem or new applications for analytics, data preparation, data visualization, and search. The ability to understand and harness these innovations in the context of your existing data can yield valuable new insights and competitive advantage.
Here at BlueData, we don’t stop at delivering the best infrastructure software platform for Big Data. We go the extra mile by offering a catalog of images with the most common open source distributions for Hadoop and Spark (e.g. CDH 5.0, CDH 5.2, HDP 2.0, Spark 1.0.0) and more. That means you can get up and running with virtual Big Data clusters in minutes with just a few mouse clicks – instead of the days or weeks required by manual, labor-intensive infrastructure provisioning processes and potentially thousands of mouse clicks.
With BlueData EPIC version 1.5, our Big Data catalog adds support for:
- CDH 5.3 and HDP 2.2: Adding to our support for earlier versions of CDH (Cloudera’s Distribution including Apache Hadoop) and HDP (Hortonworks Data Platform).
- Spark 1.3.1: Support for Spark 1.3.1 – the most recent stable release of Spark – offers several enhancements to Spark Core and Spark SQL.
- Cloudera Manager: By provisioning the Cloudera Manager console with CDH, administrators can monitor their CDH environment and end users can easily add new Hadoop services (e.g. Solr, Accumulo). You can learn more about the BlueData integration with Cloudera Manager in our earlier blog post here.
- Apache Ambari: Apache Ambari is the open source management console for HDP. Our integration enables self-service provisioning and monitoring of HDP clusters with Ambari. I’m giving a talk on this at Hadoop Summit next week – join me at 1:45pm on Tuesday, June 9th.
Productivity and Self-Service Agility
With BlueData EPIC 1.5, we’ve also introduced new features that help improve the productivity of data scientists, analysts, and developers.
- Utility Virtual Nodes for Big Data: With BlueData EPIC, data scientists and developers can now self-provision a cluster of utility nodes with just a few clicks. Unlike ‘vanilla’ virtual machines, these virtual nodes are purpose-built, with considerations for multi-tenant security, networking, and client access to remote data stores (e.g. HDFS). For example, these virtual nodes:
  - Are isolated to the specific tenant in which they are provisioned
  - Have access to other clusters in the tenant so they act as ‘edge’ nodes
  - Have access to all the remote data stores that are available to the tenant
  - Are pre-installed with software packages such as Java, Python, etc.
This helps data scientists and developers (as well as administrators) streamline the process of installing and evaluating new distributed applications and Big Data analytics tools – without the hassle of provisioning a new physical server or starting from a vanilla virtual machine.
- Add-on Images: If there are applications (e.g. business intelligence, ETL, or search tools) that you use frequently for your Big Data analysis, why not just have them available as a self-service option so that you can provision them with your new Hadoop or Spark cluster? With EPIC 1.5, we provide an easy process to snapshot your existing Utility Virtual Node and register this virtual node as an Add-on Image.
So in addition to our catalog of pre-integrated and pre-configured applications, users can easily add their own tools of choice. It’s like providing a shelf full of popular, best-selling books – along with an empty shelf and the means by which to easily add to the library with your own favorite titles.
Enhanced Security and Control
Big Data security is a top priority for BlueData and we’re working to provide end-to-end protection, access, compliance and auditing of both data in motion and data at rest. But that’s a big topic and beyond the scope of today’s blog. Rest assured, you will hear more on this from us in the coming months.
But for starters, we’ve heard loud and clear from our customers that they want Kerberos-based authentication support. So with the new 1.5 release, BlueData EPIC now includes Kerberos support at the compute and storage layers independently:
- Hadoop ‘Compute’ Clusters: Self-service clusters created in BlueData are virtualized ‘compute’-only clusters with services such as ResourceManager, NodeManager, Impala, HiveServer2, etc. These services can now be Kerberized using the management consoles and APIs available in Cloudera Manager or Apache Ambari.
- Remote HDFS Storage: With BlueData, the Hadoop ‘compute’ clusters can run jobs against data stored in one or more remote data stores (also known as DataTaps), some of which might be enabled with Kerberos authentication. In EPIC 1.5, DataTaps can be configured to access these Kerberized data stores as a trusted user.
With these enhanced security and authentication capabilities, our customers can significantly accelerate their ad-hoc analytics and development/testing for Big Data. By creating Kerberized virtual ‘compute’ clusters running newer versions of Hadoop, and using DataTaps to access their existing Kerberized production HDFS cluster as a data store, customers avoid costly and time-consuming cycles: setting up physical infrastructure, installing newer versions of Hadoop (including configuring Kerberos), and copying data out of production HDFS.
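To give a flavor of what Kerberizing the compute layer involves, here is a generic Hadoop configuration sketch (standard Apache Hadoop properties, not BlueData-specific settings; in practice, the Kerberos wizards in Cloudera Manager or Apache Ambari write these for you):

```xml
<!-- core-site.xml: switch the cluster from "simple" (no authentication)
     to Kerberos-based authentication. Cloudera Manager and Apache Ambari
     manage these properties when you enable Kerberos through their UIs. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value> <!-- enforce service-level authorization checks -->
</property>
```

Once these are in effect, every client must present a valid Kerberos ticket (e.g. obtained with kinit) before it can talk to HDFS or YARN services.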
To learn more about what’s new in EPIC version 1.5, watch this video where I demonstrate each of these new capabilities:
But wait … there’s more. This morning we also introduced a new ‘lightweight’ edition of BlueData EPIC running on Docker containers: BlueData EPIC Lite. Data scientists and developers can use EPIC Lite as a personal sandbox to create multi-node Hadoop or Spark clusters on their laptop. It’s a great way to get started with EPIC and you can download it for free: www.bluedata.com/free
Whether you’re developing innovative data-driven products with Hadoop and Spark, or you’re an analyst using business intelligence tools to uncover insights from Big Data, or you’re a data scientist working on creating a new machine learning algorithm, you need the systems and infrastructure to do your job.
But as many organizations have found, the infrastructure for Big Data is exceedingly complex. It can be time-consuming and expensive to implement in an enterprise environment. What they need is an “easy button” to reduce the complexity and cost of deploying on-premises Big Data infrastructure.
With the new summer release of the BlueData software platform, we’re continuing to help simplify Big Data infrastructure and analytics. It’s EPIC.
– By Anant Chintamaneni, VP of Products, BlueData