Break the Cycle of Deploying Unwieldy Hadoop Infrastructure

This is a guest blog courtesy of Chris Harrold, CTO of Big Data Solutions at EMC. The content originally appeared on the EMC blog site here.


We are in a new data-driven age. With the rise in adoption of big data analytics as a decision-making tool comes the need to accelerate time-to-insights and deliver faster innovation informed by these new data-driven insights.

You know what? That’s a lot of mumbo-jumbo. Let’s boil it down to the real issue for IT: the tools that analysts and data science professionals need were not really designed to be enterprise-friendly, and they can be unwieldy to deploy and manage. Specifically, I’m talking about Hadoop. Anything that requires the provisioning and configuration of a multitude of physical servers (that are exactly the same) is always going to be the enemy of speed and reliability. More so when those servers operate as a stand-alone, single-instance solution, without any link to the rest of the IT ecosystem (the whole point of shared nothing). Shared nothing may work for experimentation, but it is a terrible thing to build a business on and to support as an IT operations person.

How do I know this? Because I have been that guy for 25 years!

To bridge the gap between data science experimentation and IT operational stability, new approaches are needed that provide operational resiliency without compromising the ability to rapidly deploy new analytical tools and solutions. This speed of deployment is essential to support the needs of developers and data scientists. But the complexity and unwieldy nature of traditional Hadoop infrastructure are major barriers to success for big data analytics projects.

Consider these questions and see if they sound familiar:

  • Do you struggle with under-utilized resources in your big data analytics clusters?
  • Do you continually try to balance the growth in compute and storage for your Hadoop cluster environment?
  • Do you want to be able to extend your analytics toolbox beyond Hadoop but don’t want to manage the infrastructure that goes with it?

There are better ways to operationalize Hadoop and to achieve the functionality you need for the business, without sacrificing operational consistency and the ability to create and reconfigure new big data tools on demand. There are much more effective ways to deploy your big data infrastructure and manage your tools. There is a way to avoid being caught in the continual trap of rebalancing and rebuilding your Hadoop platforms over and over again.

EMC is fortunate to share this vision with BlueData, and we have the same goals in mind: creating operational, enterprise-ready Hadoop using the time-tested principles of shared storage and virtualized infrastructure. Our Big Data team invites you to a BrightTalk webinar on December 8th to discuss this vision, explore solutions to the challenges outlined above, and share real-world examples from our customers’ deployments.

Join the webinar here or by clicking the BrightTalk registration link below: