BlueData Adds Deep Learning, GPU Acceleration, and Multi-Cloud Support for Big Data Workloads on Docker Containers

Leading Big-Data-as-a-Service Solution Extends Availability to Microsoft Azure and Google Cloud Platform

Santa Clara, Calif.— BlueData®, provider of the leading Big-Data-as-a-Service (BDaaS) software platform, today announced the new fall release for BlueData EPIC™ and introduced initial availability on Google Cloud Platform (GCP) and Microsoft Azure. This release adds new innovations and options for running Hadoop, Spark, and other Big Data workloads on Docker containers – delivering on the requirements from its rapidly growing customer base, including many of the world’s largest enterprises across multiple industries.

In recognition of its innovations for running Big Data workloads in containerized environments, BlueData just won the 2017 Datanami Editors’ Choice Award for Best Big Data Product: Virtualization. This new fall release builds on the innovative functionality introduced in BlueData EPIC version 3.0, with support for deep learning use cases, GPU support, and flexible container placement policies. And by extending availability of BlueData EPIC from Amazon Web Services (AWS) to Azure and GCP, BlueData is the first and only BDaaS solution that can be deployed on-premises, in the public cloud, or in hybrid and multi-cloud architectures.

Deep Learning, GPU Support, and Flexible Container Placement
Earlier this year, BlueData added new capabilities to bring DevOps agility to distributed data science operations and machine learning use cases. The new fall release of the BlueData EPIC software platform provides the ability to jumpstart an even broader range of applications and use cases, including deep learning:

  • Streamlined operations for deep learning projects: With BlueData EPIC, data science teams can quickly get started in deep learning without the operational overhead of setting up, configuring, and managing these new environments. They can leverage pre-integrated Spark clusters, action scripts (e.g. to update all the nodes in a running environment with a single click), and web-based notebooks (e.g. JupyterHub, RStudio Server, Zeppelin) to automate the end-to-end lifecycle of data science operations.
  • Support for GPU acceleration and TensorFlow: BlueData can now support clusters accelerated with Graphics Processing Units (GPUs), and provide the ability to run TensorFlow for deep learning on GPUs or on Intel architecture CPUs. By leveraging the advanced host tagging feature introduced in this release, administrators can specify placement of Docker containers running TensorFlow on infrastructure configured with GPUs or CPUs either in the public cloud or on-premises. 
  • BigDL for distributed deep learning on Spark: The fall release now includes a pre-integrated application image for Intel’s BigDL running on Docker containers. BigDL is a Spark-based framework for deep learning optimized for Intel CPU architecture. With BigDL, BlueData now offers a fast and economical path to deep learning by utilizing existing x86-based server infrastructure and the pre-integrated Spark clusters that BlueData EPIC provides out of the box.

This new release also brings new capabilities for container placement as well as additional enhancements in performance, monitoring, and security. Some of these features and benefits include:

  • Flexible container placement policies: Within BlueData EPIC, administrators can now define various roles for a given application image and control the placement of containers associated with a specific role to specific hosts. For example, containers with the Spark worker role can be placed on servers or instances containing a large amount of memory or a local SSD for fast storage access.
  • Improved utilization and performance for Big Data workloads: With new purpose-built features for flexible cluster role definition and host tagging, administrators can ensure that the right Big Data workload is assigned to the right underlying host. This in turn can help to optimize infrastructure utilization, allow for greater performance optimizations, provide better control over SLAs, and enable chargeback models for infrastructure consumption.
  • Support for Intel cache acceleration and SSD technology: BlueData now enables customers to leverage the power of Intel Cache Acceleration Software (CAS) and Intel Optane Solid State Drive (SSD) technology to further improve performance for Big Data jobs by maximizing the performance of local disk storage. With Intel CAS and high-performance SSDs, enterprises can reduce costs for latency-sensitive workloads and improve overall data center TCO.
  • Enhanced container-level monitoring: In the last release, BlueData introduced a new pluggable framework based on Elasticsearch, Metricbeat, and Kibana to provide fine-grained monitoring for CPU, memory, and other key metrics. Now, using this same framework, BlueData EPIC includes detailed monitoring for container-level disk I/O and network throughput.
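The role-based placement and host tagging described above can be illustrated with a minimal sketch. This is not BlueData EPIC's API or configuration format: the `Host` class, the `eligible_hosts` function, the tag names, and the policy dictionary are all hypothetical, chosen only to show how matching role requirements against host tags could steer containers to suitable hosts.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical server or cloud instance labeled with free-form tags (hypothetical model)."""
    name: str
    tags: dict = field(default_factory=dict)


def eligible_hosts(hosts, required_tags):
    """Return the names of hosts whose tags include every required key/value pair."""
    return [
        h.name
        for h in hosts
        if all(h.tags.get(key) == value for key, value in required_tags.items())
    ]


# Hypothetical inventory mixing on-premises and cloud hosts.
HOSTS = [
    Host("onprem-01", {"location": "on-prem", "accelerator": "gpu"}),
    Host("onprem-02", {"location": "on-prem", "storage": "ssd", "memory": "high"}),
    Host("aws-01", {"location": "aws", "storage": "ssd", "memory": "high"}),
    Host("gcp-01", {"location": "gcp", "accelerator": "gpu"}),
]

# Hypothetical per-role policy: role name -> tags a host must carry.
PLACEMENT_POLICY = {
    "tensorflow": {"accelerator": "gpu"},
    "spark-worker": {"storage": "ssd", "memory": "high"},
    "spark-master": {},  # no constraint, so any host qualifies
}

# Resolve each role to the hosts that satisfy its tag requirements.
placement = {
    role: eligible_hosts(HOSTS, tags) for role, tags in PLACEMENT_POLICY.items()
}

for role, names in placement.items():
    print(f"{role}: {names}")
```

In this sketch, adding a `location` requirement to a role would restrict its containers to on-premises hosts or to a particular cloud, which is the kind of control that host tags are described as providing in hybrid deployments.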

Additional details and other new enhancements are highlighted in the accompanying blog post here. The fall release of BlueData EPIC will be generally available in October 2017.

First Big-Data-as-a-Service Solution for Hybrid and Multi-Cloud Deployments

Effective immediately, BlueData is also announcing directed availability for Azure and GCP. BlueData announced a similar directed availability program for AWS last summer, with general availability for AWS introduced in December. And this past spring, BlueData added support for Big-Data-as-a-Service in a hybrid architecture with a unified solution across both on-premises and AWS environments.

With this new directed availability program, BlueData can support standalone deployments on all three major public cloud services as well as hybrid deployments where some hosts are on-premises and some hosts (i.e. virtual machines) are on the cloud with network connectivity. By using host tags, customers can leverage a single hybrid deployment to place Big Data clusters on on-premises hosts (either physical servers or virtual machines) or on cloud-based virtual machines – on AWS, Azure, and/or GCP.

The BlueData EPIC software platform will be licensed for Azure and GCP with monthly or annual subscription pricing. Organizations interested in trying BlueData EPIC on Azure or GCP for free during the initial directed availability period can apply here. General availability is targeted for early 2018.

“BlueData continues to introduce new innovations for deploying Big Data workloads in Docker containers, including support for the growing ecosystem of data science and deep learning tools,” said Kumar Sreekanti, CEO of BlueData. “And by extending from on-premises to AWS and now Google Cloud Platform and Microsoft Azure, we’re delivering on our vision of running on any infrastructure – whether on-prem or cloud. BlueData is the only Big-Data-as-a-Service solution for enterprises that want a hybrid and multi-cloud deployment.”

You can see a demo of the new BlueData EPIC fall release as well as support for Azure and GCP at the Strata Data Conference in New York City this week, September 26th to 28th, at booth 433. You can also hear a customer case study session from Barclays UK highlighting their deployment of BlueData EPIC and Dell EMC infrastructure, on Wednesday September 26th at 1:15pm: Enabling Data Science Self-Service through an Elastic Data Platform.


Supporting Resources
Blog post: Announcing the BlueData EPIC Fall Release and Multi-Cloud Support
Blog post: Deep Learning with TensorFlow, GPUs, and Docker Containers
Blog post: Deep Learning with BigDL and Apache Spark on Docker
Video: Nasdaq Customer Testimonial


About BlueData Software, Inc.
BlueData is transforming how enterprises deploy their Big Data applications and infrastructure. The BlueData EPIC™ software platform uses Docker container technology to make it easier, faster, and more cost-effective for enterprises of all sizes to leverage Big Data – enabling Big-Data-as-a-Service either on-premises, in the cloud, or in a hybrid architecture. With BlueData, enterprises can spin up virtual Hadoop or Spark clusters within minutes, providing data scientists with on-demand access to the applications, data, and infrastructure they need. Based in Santa Clara, California, BlueData was founded by VMware veterans, and its investors include Amplify Partners, Atlantic Bridge, Dell Technologies Capital, Ignition Partners, and Intel Capital. To learn more about BlueData, visit or follow @bluedata.