This is a guest blog courtesy of Keith Manthey, CTO of Analytics at Dell EMC. The content originally appeared on the Dell EMC blog site here.
As part of my regular duties, my job is to pay attention to macro-level movements across various industries and technology sectors. One sector facing some rather large tectonic shifts of late is the emerging, rapidly growing space often referred to as big data. More specifically, the topic is Hadoop.
Hadoop is often decried as being too hard to implement, and many complain about the lack of Hadoop talent and expertise. Much of this is overblown, but it is undoubtedly true that running the network and compute for 10,000 nodes of Hadoop is FAR more difficult than running a lab of 10 Hadoop nodes with 3 master nodes. Past 1,000 Hadoop nodes, complexity climbs along a steep, nonlinear curve. It is also true that hiring polymath talent to do all things Hadoop is very competitive.
Lately, there have been some very interesting polls and studies around organizational interests in Hadoop, as well as benchmark studies that line up with that interest. IDC released a customer survey last year that made it into my hands a few months ago (Source: IDC, Hadoop Adoption Rationale and Expectations, September 2016). Based on responses from 219 private and public sector organizations in the U.S., the results in this IDC poll showed a very interesting quandary.
The most popular architecture for Hadoop was centralized enterprise storage (selected by more than 35 percent of survey respondents who indicated they were considering or had already deployed Hadoop). However, performance was the number one primary driver for selecting a Hadoop architecture (indicated by more than 50 percent of respondents).
Based on this data, it’s clear that enterprises want enterprise storage for Hadoop, and they are also very concerned about performance. This contradicts the traditional Hadoop reference architecture of just a few years ago, which assumed direct-attached storage.
Indeed, now when I talk to our customers about their hopes for Hadoop, they talk about the need for enterprise features, ease of management, and Quality of Service. These are the signs of Hadoop moving out of its infancy and awkward teenage years, and becoming part of a more mature enterprise technology sector.
Intel also recently released a performance benchmark study that showed no performance slowdown for Hadoop when run in Docker containers. This too is fascinating.
The chart above shows the overall performance of containerized Hadoop running on the BlueData software platform compared to Hadoop on bare-metal for 10, 20, and 50 node clusters. In this case, higher is better.
The benchmark study shows that performance for Hadoop on the container-based software platform from one of Dell EMC’s partners, BlueData, now rivals the performance of a bare-metal Hadoop instance. The BlueData platform also provides enterprise security and multi-tenancy for large-scale containerized Hadoop deployments. Again, the traditional reference architecture for Hadoop has historically been all about bare-metal clusters; containerized Hadoop was perceived as potentially slower, less secure, and/or not scalable. The study’s findings clearly fly in the face of “conventional wisdom” for Hadoop.
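To make the containerized-Hadoop pattern concrete: the sketch below is a minimal, hypothetical docker-compose file for a single-node HDFS setup running in Docker containers. The image name, commands, and environment variables are illustrative assumptions for this post, not BlueData’s actual platform or configuration; production deployments like the one benchmarked add multi-tenancy, security integration, and cluster scheduling on top of a pattern like this.

```yaml
# Hypothetical containerized HDFS setup -- image name and env vars are
# placeholders for illustration, not a real published image or BlueData's stack.
version: "3"
services:
  namenode:
    image: example/hadoop:3.3        # placeholder image tag
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"                  # NameNode web UI port in Hadoop 3.x
    environment:
      - CLUSTER_NAME=demo
  datanode:
    image: example/hadoop:3.3        # placeholder image tag
    command: ["hdfs", "datanode"]
    depends_on:
      - namenode                     # datanode registers with the namenode
```

The point of the benchmark is that a setup along these lines, once hardened for the enterprise, no longer pays the performance penalty that “conventional wisdom” assumed.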
To learn more, you can also view the on-demand replay of our recent joint webinar with Dell EMC at the link below: