One of the most challenging aspects of Big Data deployments is keeping up with the dynamic nature of Big Data frameworks, distributions, applications, and their latest versions. The success or failure of a Big Data implementation may hinge on how well the organization handles support for the menagerie of applications and tools that data scientists, developers, analysts, and engineers want to use.
Finding the right Big Data expertise and implementing best practices for these applications is both difficult and time-consuming. The constant pace of Big Data innovation makes it harder still, as new versions and tools are released.
Speaking of new releases … today we announced the summer release of the BlueData EPIC software platform. Fortunately, this release introduces new capabilities and functionality to help enterprises keep up with the pace of innovation in the Big Data ecosystem.
App Store and App Workbench
BlueData provides a Big-Data-as-a-Service software platform powered by Docker containers. The BlueData EPIC platform simplifies Big Data deployments and enables data scientists and analysts to spin up self-service Hadoop and Spark clusters within minutes. Two key components of the solution are the “App Store” (which provides sample Docker images for common Big Data frameworks, applications, and tools) and the “App Workbench” (which allows administrators to manage, create, and update the preferred applications and versions for their end users).
With the new EPIC summer release, we updated several distributions and applications that are provided as pre-configured Docker images in the App Store – including Cloudera, Hortonworks, MapR, Spark, Cassandra, Kafka, Splunk, and more – and these images can be installed via one-click deployment.
We also made significant enhancements to the App Workbench to allow administrators to easily modify and update the base Docker images in their own App Store – or create new images for other applications and tools. In this blog post, I’ll describe how to use the App Workbench, and in doing so I’ll highlight some of this new functionality.
The App Workbench focuses mainly on the following three use cases:
- Modify or upgrade an existing Hadoop or Spark distribution in the App Store. For example:
  - Modify Cloudera CDH version 5.4.3 and add a security patch to the base image
  - Create a new CDH version 5.5.1 image starting from the existing CDH 5.4.3 image
- Add a new application as an edge node for Hadoop or Spark with auto-provisioning. For example:
  - Splunk/Hunk could be one team’s preferred tool for operational analytics; those end users may want “Splunk on Hadoop (Hunk)” as an edge node for new Hadoop deployments
  - Users with a cluster dedicated to ETL may want Talend as an edge node, pre-wired for immediate use
- Create new images for Big Data applications and frameworks. For example:
  - Data science teams often need tools beyond Hadoop and Spark – they’ll want to add Kafka, Cassandra, and other applications like H2O for their testing, development, prototyping, and experimentation
  - A user running Spark on YARN in a Hadoop cluster may be interested in trying Spark 1.6 or Spark 2.0 standalone – and they may want to add tools such as Zeppelin or Jupyter notebooks and Spark Job Server to their Spark clusters
With the BlueData EPIC platform and our App Workbench, all of the above scenarios can be supported seamlessly. It’s easy for an organization to maintain and run multiple applications and tools (and multiple versions) in parallel, to support a wide variety of Big Data use cases.
How the BlueData App Workbench Works
The BlueData App Workbench (aka “bdwb”) is a CLI framework, written in Python, that provides a rich set of APIs, macros, and a shell to:
- Create Docker images from Dockerfiles created using BlueData base images
- Orchestrate the run time environment for single and multi-node deployments
- Package and load images into the BlueData EPIC App Store (i.e. new catalog entries)
As shown in the command line screenshot below, “bdwb” supports an interactive shell; inline help for commands and subcommands; autocomplete and contextual help; and commands to build images, configure Docker instances, define new catalog entries and options, add a logo, and much more.
To add a new application or modify an existing one, users can run these commands interactively, or create a “.wb” file containing a series of commands. Much like a Python script, running the “.wb” file runs all the commands in it.
Below is a sample file. Note the “#!/usr/bin/env bdwb” in the first line:
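To make the shebang convention concrete, here’s a minimal Python sketch (not part of bdwb) that sanity-checks the text of a “.wb” file before you run it. It confirms that the first line is “#!/usr/bin/env bdwb” and returns the remaining command lines. Two things here are assumptions rather than documented bdwb behavior: that each command sits on its own line, and that lines starting with “#” are comments. The function name is hypothetical.

```python
from typing import List

# First line of a ".wb" file, as quoted in this post.
BDWB_SHEBANG = "#!/usr/bin/env bdwb"


def read_wb_commands(text: str) -> List[str]:
    """Return the command lines of a ".wb" file.

    Assumptions (not confirmed by the bdwb documentation): one command
    per line, and lines starting with '#' are treated as comments.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != BDWB_SHEBANG:
        raise ValueError(f"expected first line to be {BDWB_SHEBANG!r}")

    commands = []
    for line in lines[1:]:
        stripped = line.strip()
        # Skip blank lines and (assumed) comment lines.
        if stripped and not stripped.startswith("#"):
            commands.append(stripped)
    return commands
```

A check like this only catches a missing shebang or an empty file. It knows nothing about which commands bdwb actually accepts, so the workbench’s own inline help remains the reference for command syntax.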
The following screenshot shows the App Store in the web-based BlueData EPIC interface, with corresponding binaries from a specific BlueData installation:
Now I’ll review some of the new App Workbench features using two concrete use case examples:
- Create a new Cloudera CDH 5.5.1 catalog entry in the App Store starting from an existing CDH 5.4.3 catalog entry
- Create a new Kafka image using a set of App Workbench commands
Create a new CDH 5.5.1 catalog entry from existing CDH 5.4.3 entry
For each customer installation of BlueData EPIC, the list of distributions and applications available to end users in their App Store is controlled by the site administrator. Data scientists and developers can create new images in a development environment and then transfer the final bin file to the site administrator for installation.
The steps to create a new image of CDH 5.5.1 from an existing template CDH 5.4.3 are outlined below:
- Create a source directory for the new image. In this example, we’ll use the directory /root/src/CDHImageUpgrade
- Extract the source files from the existing CDH 5.4.3 image into this directory. With BlueData EPIC, Dockerfiles and orchestration scripts are bundled into each image bin file as shown below.
- Modify the Dockerfile to replace the 5.4.3 parcel links with 5.5.1 parcel links.
- Modify the distro id, name, and description to match the CDH 5.5.1 details.
- Create a new image file by simply running the “.wb” file (for example: ./cdh54cm.wb). This will build the Docker image and create the “.bin” file for the distribution as shown below.
- Copy the new bin file into the “srv/bluedata/catalog” directory on the BlueData EPIC controller host.
- From the BlueData EPIC web-based user interface, click on the new image’s “install” button in the App Store. Within minutes, the image will be ready for use. The create cluster screen will now display the new CDH 5.5.1 distribution as shown below:
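Stepping back to the Dockerfile edit in the steps above: swapping the 5.4.3 parcel links for 5.5.1 links is easy to get wrong by hand if the version string appears in several places. The following is a rough Python sketch of one way to script that substitution. It assumes the version appears literally in the parcel links, which this post does not show, so treat the function name and the matching rule as illustrative only and review the resulting Dockerfile before building.

```python
import re
from typing import Tuple


def bump_version(dockerfile_text: str, old: str, new: str) -> Tuple[str, int]:
    """Replace standalone occurrences of `old` with `new`.

    Returns the updated text and the number of replacements, so the
    caller can confirm that something actually changed.
    """
    # The lookarounds stop "5.4.3" from matching inside a longer
    # version such as "15.4.30" or "5.4.3.1".
    pattern = re.compile(r"(?<![\d.])" + re.escape(old) + r"(?!\.?\d)")
    # A function replacement avoids any backslash interpretation in `new`.
    return pattern.subn(lambda _match: new, dockerfile_text)
```

Checking the returned count is the useful part: a count of zero means the Dockerfile did not contain the version where you expected it, and the links need to be edited manually.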
Create a new Kafka image using a set of workbench commands
Some use cases require our customers to create new application images from scratch and add them to the App Store. To do this, the user needs to understand how the application would be manually deployed and how it typically integrates with the rest of their Big Data ecosystem.
They could then use this knowledge of the application to create a Docker image and automate deployment with BlueData EPIC. In this example, we’ll show how that would work for creating a new Kafka application image.
In general, developers need to follow the steps below:
- Create utility containers in BlueData EPIC, then install and validate the application
- Create a Dockerfile using manual instructions and generate init.d scripts for the services
- List the commands to be executed in the “wb” file, shown below for Kafka
- Run the wb file
The steps above generate a Docker image, setup scripts, and a bin file for application deployment. If an application requires advanced features in the setup scripts, users can always edit scripts manually and repackage them using the interactive shell or wb files.
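For the “install and validate” step, one simple check is whether the service is actually listening after a manual install in the utility container. Here’s a minimal Python sketch of that idea. It is a generic TCP reachability test, not a BlueData or bdwb feature, and the helper name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For a Kafka broker left on its default listener port, calling `port_is_open("localhost", 9092)` inside the container would be the corresponding check; adjust the port if your broker configuration differs. A successful connection only shows that something is listening, so producing and consuming a test message is still the more meaningful validation.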
The screenshot below demonstrates the creation of a Kafka bin file from scratch. Running this script file generates the “bin” for a Kafka 0.8.1.1 image.
That’s it. It wasn’t so hard, was it? By following these simple steps, our customers can have their own App Store populated with Docker images for the latest versions of the Big Data applications and tools that their users want – providing the ultimate in flexibility, choice, and configurability.
And once these images have been added to the App Store, their data scientists and developers can instantly spin up ready-to-run clusters for these tools (with pre-built security, SSH, networking, and remote data access) using BlueData EPIC’s self-service interface.
The App Store and App Workbench are integral components of the BlueData EPIC software platform. We use the App Workbench internally to add new applications to the App Store, and we’re working with many of our ecosystem partners and customers to ensure that this process is simple and extensible to support a wide variety of use cases.
Big Data is a dynamic and rapidly evolving space, with new applications and new versions emerging constantly. Here at BlueData, we’re committed to helping our customers keep up with these innovations – to arm their data analysts, developers, and data science teams with all the tools they need for discovering new business insights, uncovering new opportunities, and delivering competitive advantage with Big Data analytics.