
Self-Service Hadoop and Big Data Analytics

Back in the day, I remember my father bringing me to Horn & Hardart’s: the iconic, innovative, self-service cafeteria (aka automat) in New York City’s Times Square. You could pop in a couple of coins and pull out your own hot dish from behind a window. It was a technical marvel at the time.

Fast-forward to today and consumers still love that self-service experience. The March 2015 issue of Harvard Business Review reported that the fast food industry is introducing more self-service kiosks. Meanwhile, self-service has disrupted other industries ranging from travel to finance to retail. It is becoming a self-service economy. And while the automat is a thing of the past, the trend toward automation continues; next thing you know, drones may be delivering your self-service order.

It seems that fast service is even better with self-service. Bypassing the order-takers tends to increase consumption, increase sales and increase market share.  Providing both fast consumption and self-service control can be a powerful combination.

This same premise holds true in enterprise IT. Self-service is becoming more prevalent in the IT industry – whether in the self-service consumption of cloud computing services or the increasing trend towards self-service in business intelligence and big data analytics.

However, providing self-service to users is an area where enterprise IT has sometimes stumbled. We’ve seen this in the Infrastructure-as-a-Service market, where Gartner reports that many private cloud initiatives have resulted in highly virtualized and automated infrastructure – but without the self-service and rapid elasticity that users want. This ‘as-a-service’ model has been challenging for many IT organizations.

To stick with the food service metaphor: the IT teams in a typical large enterprise are great chefs and their kitchens (aka data centers) have been modernized, but users want a simple, easy-to-use menu with fast service (not a list of the ingredients and a long wait for food preparation). And in many cases, they’ve been lured by the combination of self-service and fast service offered by public cloud services such as Amazon Web Services.

This is where the intersection of self-service for infrastructure and big data analytics becomes interesting. Users clearly want to do their own big data analysis, and the tools for self-service BI and data discovery are increasingly available. But how does IT standardize and deploy those applications for big data in a way that’s repeatable and easily consumable by the end user? How does IT balance the tension between ease-of-consumption and governance requirements? Can IT deliver on the ‘as-a-service’ model for on-premises big data and Hadoop infrastructure?

The problem is that most IT teams are made up of master chefs, not fast food operators. They can take a variety of different ingredients – storage, compute, networks, operating systems, data, etc. – and design enterprise-grade applications to meet the requirements of the business. But standardizing these applications to the point that they can be placed in a catalog and served up quickly via self-service is a daunting challenge, both technically and operationally.

This challenge is even more difficult for big data analytics and Hadoop applications. Hadoop is typically deployed on bare-metal infrastructure, requiring the provisioning of physical servers with direct-attached storage. As a result, the agility and efficiency of virtualization haven’t been an option for building Hadoop clusters until recently. This means that data scientists and data analysts often need to wait for the hardware to be procured before they can even begin their analysis.


And the Hadoop software stack is complex. This makes it difficult for IT to stitch together all the various components – including the masters and workers, application masters and resource managers, name nodes and data nodes, job trackers and task trackers, etc. If enterprise IT has had a difficult time delivering on the promise of vanilla Infrastructure-as-a-Service, how can it deliver the various flavors and complex recipes of Hadoop-as-a-Service? For big data initiatives, how can IT departments avoid the self-service stumbling blocks that have prevented many private cloud efforts from succeeding?

This is one of the fundamental challenges that we’ve set our sights on here at BlueData. The BlueData software platform was designed to simplify and streamline the deployment of big data infrastructure – bringing the benefits of virtualization to Hadoop. But we don’t stop at the infrastructure. Our software allows IT organizations to make Hadoop distributions, Spark, and big data analytical applications available in a self-service model – running on virtualized infrastructure.


With BlueData, data professionals can order from an online menu of Hadoop distributions (e.g. multiple versions of Cloudera and Hortonworks), Hadoop services (e.g. HBase, Hive, Impala, Pig), and applications (e.g. analytics, business intelligence, data preparation, and search) that can be delivered on-demand with their virtual Hadoop clusters. Data analysts and data scientists no longer need to wait for IT to provision the hardware and stitch the various components together.

One of my colleagues presented a session recently at Hadoop Summit in San Jose that highlighted the potential benefits of self-service Hadoop. You can view the slides from his presentation here:

It’s no surprise that data scientists and data analysts are hungry for the simplicity, control, and agility that come with a self-service “kiosk” for their big data analysis. That’s the appeal of public cloud offerings for Hadoop-as-a-Service, such as Amazon’s Elastic MapReduce. With BlueData, IT can now deliver this same self-service model in their own data centers, enabling Hadoop-as-a-Service with on-premises infrastructure.

And one of the reasons it’s compelling is that it’s more than just the infrastructure – it’s all about delivering the big data applications and tools that the end user needs to do their own analysis. Consider the self-service options you use as a consumer for travel, banking, or other services: they provide fit-for-purpose capability and content for specific use cases that meet your needs. It’s the application relevance that helps drive self-service consumption and delivers greater value.

Now, with BlueData, IT can deliver on the promise of “-as-a-service” for Hadoop – providing fast, self-service access to the big data applications and tools that business users need.

– by Jim Lenox, vice president of sales at BlueData