As we start a new year, it’s time for us as technology professionals to take stock of our existing skill sets and prepare for the future by identifying the new skills likely to be most valuable.
A recent ComputerWorld survey highlighted the “10 hottest tech skills for 2016” and Big Data came in at number 4 (up from number 10 last year). Just behind that was Business Intelligence and Analytics at number 5 (up from number 7). And Big Data/Analytics was number 1 on the list of technologies that survey respondents said they were currently beta-testing or using in pilot projects.
So there’s no question that skills in Big Data and analytics are hot – and Big Data technologies like Hadoop and Spark are fueling that fire. For a long while now I’ve heard the clarion call of “you need to learn Hadoop” as the gateway to Big Data success. But to be honest, this year that note rings flat. Heresy, you say, for someone in the Big Data world to knock Hadoop? Please bear with me for a moment.
Many industry experts and pundits have highlighted a shortage in Hadoop skills; in a recent Gartner survey, 57% of respondents said the skills gap was a primary inhibitor to Hadoop adoption. That skills gap is likely to continue in 2016. However, the focus is shifting.
I’ve had numerous discussions over the last few years with many enterprise organizations, both large and small, concerning their Big Data needs. Three years ago, those discussions focused on Hadoop cluster configuration: how to optimize HDFS layouts, or how to tune YARN NodeManagers and ResourceManagers. Deep knowledge of Hadoop internals was required. Two years ago, the discussions turned to infrastructure debates, such as how well Hadoop performed on bare-metal vs. virtual machines vs. container environments. We’ve found that experience in running and tuning virtual infrastructure for Big Data deployments is essential.
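To give a flavor of the tuning those early conversations revolved around, here’s a minimal yarn-site.xml sketch. The property names are standard YARN settings; the values are purely illustrative and would be sized to each node’s actual hardware:

```xml
<!-- yarn-site.xml (fragment): illustrative NodeManager resource settings only -->
<configuration>
  <!-- Total memory the NodeManager may hand out to containers on this node -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value> <!-- e.g. 48 GB of a 64 GB node, leaving headroom for the OS and HDFS daemons -->
  </property>
  <!-- Virtual cores available for containers on this node -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>
  <!-- Largest single container allocation the scheduler will grant -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
  </property>
</configuration>
```

Getting settings like these right – per node, per workload – was exactly the kind of deep-internals work that used to dominate Hadoop discussions.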
Last year, however, the discussions moved from Hadoop infrastructure and systems debates to questions about which business intelligence and analytics tools were the best for Hadoop and for which use cases. The conversations with these same organizations have shifted to how best to utilize open source tools such as HBase, Hive, Solr, or Spark Streaming as well as proprietary applications such as AtScale, Platfora, and Tableau for their Big Data analytics.
Today, data scientists and analysts assume that reliable, high-performance infrastructure will be available to provide timely processing of Big Data analytics. Those are the table stakes. If you don’t have that, or if the infrastructure is not flexible and easily extensible, your data science teams can’t do their jobs. All data sources, both legacy and future, are expected to be consumable. How the underlying compute services are implemented (whether on bare-metal or on containers) is irrelevant to them, as long as it works and IT doesn’t slow them down.
So in my recent discussions with enterprise organizations, it’s no longer just about how to spin up a Hadoop cluster. Now they ask how to deploy the latest version of Platfora, or how to run a version of Hive built from source code customized to their needs, or how to run three different versions of Spark on three different clusters simultaneously. Oh, one more thing: this needs to be cookie-cutter simple.
The Hadoop cluster has become a commodity. Let me put this another way: compare a Hadoop cluster to your phone. In the beginning, getting a phone to work at all was considered a technology marvel. Now phones are ubiquitous and mobile. They come in all shapes and sizes and colors. No one really cares about the inner workings of the phone.
Today, they care about the apps that run on the phone; the App Store is what drove the phenomenal success of the iPhone. Data scientists are the application developers of the Big Data world; what really matters are the applications that they use and develop. And that’s the future of Big Data: it’s all about the apps. That’s why we recently introduced our own App Store concept for Big Data (and just one of the reasons why our software was recognized as one of the coolest Big Data products of 2015).
As we embark upon this new year, the focus on Big Data skills will be in the applications and analytics arena – not the underlying Hadoop infrastructure. Along those same lines, our mission here at BlueData is to make Big Data infrastructure simple, seamless, and invisible – so that your team can focus on using the apps they need, without having to learn Hadoop’s inner workings or how to deploy those apps.