
Big Data in 2015: A Year in Review

Is it just me, or did the year fly by? The tech industry generally moves at light speed and the Big Data market was certainly no exception. Before we open the champagne to ring in 2016, I wanted to take a few minutes to look back at some of the highlights and review some of the key learnings from 2015.

Big Data enterprise spend on track to meet analysts’ forecasts


Wikibon kicked the year off with its Big Data Market Forecast report, estimating 2014 spending at $27.36B and projecting that it will exceed $84B in 2026. There’s been tremendous enterprise interest in Big Data throughout 2015 across every industry – with both new “greenfield” deployments and expansions of existing Big Data implementations. Based on the growth in adoption that we’ve seen here at BlueData (and the accelerating rate of new customer adoption!), IDC’s 23.1% CAGR prediction for 2015 may actually be somewhat conservative.
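
For readers who want to sanity-check growth figures like these, here is a minimal sketch of the standard compound annual growth rate (CAGR) calculation, applied to the two Wikibon numbers quoted above. The function name `cagr` and the choice to treat 2014 to 2026 as a 12-year span are my own assumptions for illustration; analyst forecasts may cover different periods and market definitions, so the result is not directly comparable to IDC's figure.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the post: $27.36B in 2014, $84B in 2026.
# Assumption: a 12-year span, with $84B treated as the end value
# (the forecast says spending will exceed it, so this is a lower bound).
implied = cagr(27.36, 84.0, 2026 - 2014)
print(f"Implied CAGR, 2014 to 2026: {implied:.1%}")
```

Run as written, this works out to roughly 9.8% per year over the full period, which is a useful reminder that a long-range average and a single-year growth prediction can look very different.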

Venture capital spending for Big Data

Data from CB Insights indicates that the venture capital community invested approximately $6B in Big Data-related technology companies in 2014. Through the first three quarters of 2015, it invested approximately $4.5B. So the investment continues, but now it’s time to start seeing a return on that investment as these companies grow.

BlueData was among the Big Data investments in 2015, with our latest round of financing led by Intel Capital along with existing investors Amplify Partners, Atlantic Bridge, and Ignition Partners as well as a new unnamed strategic investor. In addition to the investment from Intel Capital, BlueData was honored to announce a major new partnership with Intel.

The Hadoop market matures

In 2012, BlueData was founded to simplify and accelerate the deployment of on-premises Big Data infrastructure in the enterprise. At that time, Apache Hadoop was still relatively early in its evolution – primarily used by large internet companies, technology startups, or a relative minority of forward-thinking enterprises.

[Illustration: Hadoop Adoption Curve]

Fast forward to 2015, and Hadoop adoption is increasing throughout the enterprise. As Gartner pointed out in a survey published earlier this year, early adoption is giving way to the “early mainstream”. And with that maturity there are growing pains. I wrote about some of this in a blog post several months ago: Hadoop is still too complex and difficult to implement, and the lack of Hadoop experts (the “skills gap”) is a significant inhibitor to broader adoption. My friend Shaun Connolly at Hortonworks also wrote a good post about the adoption curve – including the illustration shown here.

Spark catches on fire

Spark was one of the hottest topics in Big Data throughout 2015; there are now more than 800 developers from 200 companies that have contributed to the open source Apache Spark project. As I wrote in a blog post earlier this year, Spark is where the puck is headed.

Here at BlueData, we made the decision to integrate and support Spark in early 2014; we were also Spark-certified that year. This year, our new software releases have continued to extend our support for Spark. And in our webinar this fall with Forrester, we highlighted the fact that Spark has arrived in the enterprise; we’re seeing Spark-related projects accelerate. As just one example, Accenture wrote a blog post about their experience with deploying BlueData’s software – including a use case with Spark Streaming.

It’s still relatively early in the Spark adoption cycle. But we anticipate that 2016 will be a turning point as more and more enterprises move from consideration, experimentation and evaluation of Spark to production deployments.

Docker is becoming pervasive

This year, BlueData introduced the 2.0 version of our EPIC software platform. EPIC 2.0 was a big hit – and it was just recently recognized by CRN as one of the 10 Coolest Big Data Products of 2015. It was truly an epic software release for many reasons – perhaps most notably because it uses Docker containers for faster, easier, and more cost-effective deployment of Hadoop and Spark.

Why did we choose Docker containers? We want to provide our customers with the lightweight flexibility of containers, while also ensuring high I/O performance and delivering better server utilization for their Big Data workloads. Now the time to spin up an on-premises Hadoop or Spark cluster is even faster than it was in our EPIC 1.0 version. And the packaging and deployment of various Big Data applications (e.g. Hadoop distributions, business intelligence and analytics tools, or custom Big Data apps) is now extremely simple, straightforward, and streamlined. That application deployment simplicity is why Docker containers became popular in the first place; now we’re extending that simplicity to Hadoop, Spark, and other Big Data workloads.

My colleague, Tom Phelan (co-founder of BlueData), wrote one of our most popular blog posts of the year about this topic: “Docker, and Spark, and Hadoop. Oh My”.

Separation of compute and storage for Big Data

One of the key principles of BlueData’s software platform is the separation of compute and storage. While this is not a new concept, it is a big shift for Big Data – and Hadoop in particular. Since Hadoop’s inception, industry influencers have promoted data locality (with local direct-attached storage) as the only way to deploy it. And while most data centers are virtualized today, Hadoop remains a throwback to the age-old one-box / one-workload paradigm: bare-metal deployments are the norm.

2015 was the year that BlueData severed the link between storage and compute for Hadoop. Together with our partners and customers, we’ve demonstrated that it is possible to run Hadoop against any shared storage system. We’ve also shown that Hadoop infrastructure can be virtualized while achieving I/O performance comparable to that of bare metal. It’s a fundamentally new approach – and it’s simpler, faster, easier, and less expensive than the traditional way to deploy Hadoop.

“App Store” for one-click deployment of Big Data applications

BlueData is an infrastructure software company, but we had a big focus on applications this year. Pre-integrating distributions for Hadoop and Spark was a logical first step for us; we established partnerships in 2014 with Cloudera, Hortonworks, and MapR as well as Databricks to build that foundation. But we knew that to truly provide our customers with the ability to gain business insights from their data, we needed to do more than accelerate the deployment of their infrastructure and data platforms; we also needed to deliver the last mile by enabling turnkey Big Data applications in a self-service, on-demand model.

In 2015, we started by introducing integration with Cloudera Manager and Hortonworks Ambari; but that was just the beginning. In our EPIC 2.0 release, BlueData introduced the concept of an “App Store” for Big Data, where our customers could simply point-and-click to begin running their favorite analytics / business intelligence / visualization applications pre-integrated with the same distributions that we allowed them to spin up in minutes. Throughout 2015, we’ve added more applications to the App Store – ranging from AtScale to Platfora to Splunk and more.

In 2016, we will continue to partner with more Big Data application vendors and integrate with even more of these applications. Our goal is to enable a self-service, one-click deployment model for all Big Data applications and infrastructure.

More Big Data insights, less infrastructure complexity

Not everything in 2015 was rainbows and unicorns. As I mentioned above, Big Data infrastructure still remains complex, costly and difficult to deploy. Data in the enterprise has increased exponentially, but implementing the infrastructure and systems required for Big Data analytics is a major obstacle for most enterprises. BlueData’s value proposition – spinning up Hadoop or Spark clusters in minutes, providing a cloud-like experience in a multi-tenant on-premises environment, and enabling far lower TCO than bare-metal – really took hold this year. And our customers are seeing the benefits of simplicity, speed, and cost savings.

In 2016, we plan to continue to knock down the cost and complexity barriers to Big Data adoption in the enterprise. The focus should be on delivering new business insights with Big Data analytics; the underlying infrastructure needs to be easy, seamless, and straightforward to implement. Watch for an upcoming webinar with Intel in January that will further explore the concept of an “Easy Button” for Big Data.

Big Data partnerships: stay tuned in 2016

2015 also marked a year of new strategic partnerships for BlueData. After establishing a “Business Collaboration Agreement” with Intel this past fall, we built foundational relationships with Dell around server hardware, EMC for storage, and Splunk for operational intelligence. Every successful company requires a healthy ecosystem of “anchor” partners like these. Together with our partners, we’ve spent considerable time testing and tuning the BlueData EPIC software platform with EMC Isilon, Dell hardware, and Splunk. The Splunk promotion kicked off just before the holidays, and we’ll be sharing some news about product certification and joint “go-to-market” efforts with each of our new partners in 2016.

It was a fantastic year. We saw the Big Data industry hit its stride with exciting new technology developments and continued adoption in the enterprise. Most importantly, we at BlueData have had the privilege to meet with hundreds of potential new customers and partners, working alongside some of the most talented engineers and innovators in the industry. We’re looking forward to more in 2016 – cheers!
