Just about every Hadoop-powered Big Data initiative has its own set of business objectives and thus requires a distinct environment. Fortunately, there are several infrastructure options to choose from.
Hadoop clusters can run on physical or virtualized servers, and in public or private clouds. In a virtual Hadoop cluster, the Hadoop distribution is installed on a collection of virtual machines (VMs) rather than directly on physical hardware. For enterprises that want a development environment with maximum agility and speed, so users can “fail fast” and move on to the next project, a virtualized environment makes sense. And unless you have virtually limitless financial and human resources, the cloud should be your Big Data infrastructure of choice. So for most of us, the real choice isn’t between traditional and cloud, but between public and private.
Public clouds do have their benefits: they are generally quite scalable and can be affordable. But the underlying physical infrastructure is shared, offering no real control over the security or availability of data and no visibility into exactly where that data resides.
Private clouds offer the highest level of control, but historically they have required hands-on administration as well as initial capital and ongoing operational investments. Big Data democratizers are focused on helping organizations without limitless resources unleash the power of Big Data. At BlueData, we’ve developed a re-envisioned infrastructure layer that changes how Big Data jobs are provisioned, run, and managed in a private cloud environment. It provides the benefits of virtualization and private clouds (efficiency, utilization, multi-tenancy, elasticity, and policy-based automation) without compromising on performance.
Before making a Big Data infrastructure investment for an upcoming project, it’s important to determine the parameters that best fit your particular environment. I recommend asking these seven questions to get started.