I’ve worked with dozens of our customers and partners over the past few years here at BlueData, but our partnership with Dell EMC – and the joint customer relationships where we’ve worked together – has been particularly exciting.
We’re fortunate to count Dell EMC as one of our strategic investors, as was disclosed recently. And as demonstrated by a recent webinar where I co-presented with Dell EMC, and a recent guest blog post on our site by the Dell EMC CTO for Analytics, we’ve been working closely with the product, services, and sales teams across Dell EMC in many areas.
Now Dell EMC is delivering a powerful and innovative new solution, the Dell EMC Elastic Data Platform, that includes BlueData’s EPIC software. I’ve been able to spend some time working on-site with one of our joint customers, Barclays in the UK, to implement this new solution (more on this later); so I’m thrilled that it’s now being introduced to other Dell EMC customers across the globe.
The Big Data Challenge
In my discussions with many enterprise organizations, I often see them start out their Big Data initiatives on a small scale with a data lake and Hadoop deployed in the traditional fashion: on bare-metal servers with direct-attached storage.
As these organizations expand their deployments to respond to new analytics and data science use cases from different lines of business, they start to experience cluster sprawl, and the associated IT management becomes increasingly complex and inefficient. Compute utilization for their infrastructure is often less than thirty percent; storage can’t be scaled independently; data duplication introduces new risks and costs; and it can take weeks for them to respond to ongoing requests for new clusters, new tools for analytics and data science, or upgrades. The graphic below illustrates some of these pain points.
From a Big Data infrastructure standpoint, the challenge is two-fold: buying high-end servers to keep up with storage demands is too expensive, and the time required to maintain all those bare-metal servers reduces efficiency. So our friends at Dell EMC designed a cost-effective and scalable solution to address this challenge.
Elastic Data Platform
The Dell EMC Elastic Data Platform is a powerful and flexible approach to help these enterprises get the most out of their existing Big Data investments – with Dell EMC infrastructure, software from BlueData as well as BlueTalon for data-centric security, and Dell EMC Professional Services.
Building on the initial success of existing data lake deployments, organizations can leverage a highly scalable, elastic, and multi-tenant architecture to provide their users with on-demand access to a variety of Big Data analytics and data science workloads (e.g. Hadoop, Spark, machine learning, and other use cases) – helping to support their ever-growing and evolving business needs. It delivers fast and easy provisioning, simplified deployments, cost-efficiency, and assurance that governance and security requirements are being met.
There are five key principles behind the Elastic Data Platform:
- Easy Data Provisioning: Provide read-only access and scratch pad data to anyone within the organization while preventing data sprawl and duplication.
- Tailored Work Environments: Isolate each user’s environment to ensure data integrity and reliable compute performance, and tailor it with a variety of tools for many different workloads – assuring quality of service.
- Scalability: Ensure the compute environment performs elastically and scales horizontally to meet business demands and deliver high quality of service.
- Data Security: Enhance security, governance, and access controls while maintaining ease of use.
- Cloud Ready: Establish an on-premises model while preparing for a hybrid on/off-premises solution with cloud services.
The solution itself is based on four key capabilities:
- Deploying Big Data Environments: The BlueData EPIC software platform provides the ability to create elastic, multi-tenant Big Data environments for data science and analytics using Docker containers – whether on-premises, in the cloud, or in a hybrid architecture.
- Separating Compute and Storage: When aggregate datasets grow larger than a few hundred terabytes, it makes sense to separate compute from storage so that each can scale independently. For solutions that need scale-out storage, Dell EMC Isilon offers a compelling ROI and ease of use with scalability.
- Enforcing Centralized Policy: BlueTalon provides the consistent creation and enforcement of data access policies across environments supporting a diverse set of users, tools and data systems.
- Automating and Integrating: Dell EMC Professional Services have automated the deployment of the above components and provided an open and flexible set of interfaces to integrate into existing Big Data environments.
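To make the "Separating Compute and Storage" point concrete, here is a rough back-of-the-envelope sketch in Python. It compares how many servers are needed when every node carries both disks and CPUs against sizing storage and compute separately. All node sizes and workload figures are hypothetical illustrations, not Dell EMC or BlueData sizing guidance.

```python
import math


def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Servers needed when each node bundles disks and CPUs.

    The larger of the two requirements dictates the node count, so the
    other resource ends up over-provisioned.
    """
    by_storage = math.ceil(storage_tb / tb_per_node)
    by_compute = math.ceil(cores / cores_per_node)
    return max(by_storage, by_compute)


def decoupled_nodes(storage_tb, cores, tb_per_storage_node, cores_per_compute_node):
    """(storage nodes, compute nodes) when each tier is sized on its own."""
    return (
        math.ceil(storage_tb / tb_per_storage_node),
        math.ceil(cores / cores_per_compute_node),
    )


def compute_utilization(cores_needed, nodes, cores_per_node):
    """Fraction of purchased cores that the workload actually needs."""
    return cores_needed / (nodes * cores_per_node)


# Hypothetical workload: 600 TB of data, 320 cores of analytics demand.
# Hypothetical hardware: 24 TB and 32 cores per bare-metal node.
nodes = coupled_nodes(600, 320, tb_per_node=24, cores_per_node=32)
print("coupled nodes:", nodes)
print("compute utilization:", compute_utilization(320, nodes, 32))

# Same workload with hypothetical 100 TB storage nodes and 32-core compute nodes.
print("decoupled (storage, compute):", decoupled_nodes(600, 320, 100, 32))
```

In this made-up scenario, storage demand alone drives the coupled node count, which leaves most of the purchased compute idle. Sizing the two tiers separately lets each one grow only when its own demand grows.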
You can learn more in the overview of the Dell EMC Elastic Data Platform posted at www.emc.com/collateral/service-overview/h16643-elastic-data-platform-service.pdf
Learn More at the Strata Data Conference in New York
If you’re going to the Strata Data Conference next week (September 26-28) in New York, stop by the Dell EMC, BlueData, or BlueTalon booths in the expo hall and talk to our teams about the Elastic Data Platform.
And make sure you mark your calendar for the case study session with Barclays UK on “Enabling data science self-service with the Elastic Data Platform” at 1:15pm on Wednesday, September 27th. Find out how Barclays UK implemented the Elastic Data Platform to give their data scientists the ability to provision their own environments and scale them up or down – using BlueData EPIC to deploy Cloudera and Spark on Docker containers. You’ll learn about the Barclays Big Data journey, the architecture for their solution, and the underlying technologies.