I would like to welcome you to the wild world of Big Data. Before you proceed, please have a full grasp of:
Ambari, Accumulo, Avro, BigInsights, BigSQL, Calcite, Cassandra, CDP, Cloudera, Couchbase, Drill, Falcon, Flink, Flume, Geode, Hadoop, HBase, HDInsight, HDFS, HDP, Hive, Hortonworks, Hue, Ignite, Impala, Kafka, Knox, Mahout, MapReduce, MapR, MemSQL, MongoDB, NoSQL, ODP, Oozie, Parquet, Phoenix, Pig, Ranger, Sentry, Slider, Spark, Sqoop, Storm, Tachyon, Tez, YARN, and everyone’s favorite … ZooKeeper.
In addition, please understand:
Big Data analytics is now seen as must-have and mission-critical – and yet the complexity is overwhelming. Hadoop hasn’t yet fully hit its maturity – but Spark is the next big thing. Big Data spend for 2013 was $13B, rose to $28B in 2014, and is predicted to hit $50B by 2017 – but adoption is slower than some expected. Lastly, IoT (that’s the Internet of Things) will create 44 zettabytes of data by 2020 – but only 30% will be useful for analysis.
While this blog introduction was intended to be tongue-in-cheek, the fact of the matter is that for most of you, it triggered a level of anxiety that reflects the unfortunate reality of your current Big Data environment.
Big Data is exceedingly complex. At the end of the day, we simply want to gain insights from data that allow us to gain a competitive advantage, defend market share, acquire new customers, solve business problems, or simply provide better customer service.
So why is this so difficult?
After you finish reading this blog, I recommend you read the latest survey analysis from Gartner on this topic. Analysts Merv Adrian and Nick Heudecker present a very thoughtful and pragmatic perspective on Hadoop adoption. It’s consistent with the challenges that we’ve seen with enterprise customers in every industry. And it can be crystallized into six distinct areas:
Expertise: Finding and hiring Big Data expertise is difficult and expensive. In the Gartner survey, this “skills gap” was cited by 57% of respondents as an inhibitor to Hadoop adoption. There are simply too many obscure technologies and too little available talent. Many enterprise IT departments have resorted to re-purposing existing talent and throwing them at Big Data projects.
Deployment: The infrastructure and systems deployment for Big Data is way too complex, takes too much time, and requires too much investment. This complexity and cost make it difficult for enterprises to expand from their initial pilot projects into broader production deployments.
Budgets: Big Data analysis doesn’t escape corporate ROI governance. Wikibon research has shown that Big Data is delivering an ROI of about 55 cents on the dollar – which isn’t great.
Security: Copying data and dumping it into a central repository represents a significant risk for many enterprise IT organizations.
Data Sources: Integrating and analyzing data from multiple sources is a major challenge.
Existing Infrastructure: Interoperability with existing infrastructure and systems is a barrier to production-grade enterprise deployments.
So has the time come to simplify this quagmire? Is it time for the Big Data “easy button”?
The short answer is yes. Let’s look at each of these areas and what’s required:
Expertise and Deployment: Here at BlueData, our mission is to simplify the complexity of Big Data and streamline Big Data infrastructure deployment. Enterprises need an “easy button” that significantly reduces the need for specialized Big Data expertise.
That’s why we developed the BlueData EPIC software platform, with built-in integration for the major Hadoop distributions and Spark as well as the leading Big Data analytical applications. Users can simply select the environment they want, with just a few mouse clicks in a web-based user interface.
Budgets: The ROI for Big Data can’t languish in the 50% range. Enterprises need to spend more of their budgets on innovation and new business opportunities resulting from data insights – and less on hiring new Big Data specialists, deploying new infrastructure, or moving existing data.
With BlueData, IT organizations can save up to 75% on the infrastructure and operational costs for Big Data deployments by improving hardware utilization, reducing cluster sprawl, and eliminating data duplication. That helps put the ROI numbers where they should be.
Security: Data can’t be copied and dumped into a central repository that lacks oversight and governance; otherwise, your data lake will end up as a data swamp.
The BlueData solution allows you to keep data where it currently resides. Instead of duplicating or triplicating data and creating the potential for leakage, our software helps eliminate this risk by separating compute from storage – allowing enterprise IT organizations to leverage the security of their existing enterprise-grade storage systems.
Data Sources: Integrating data from multiple sources shouldn’t cause further complexity. With the BlueData software platform, data can be analyzed from HDFS as well as other shared storage systems – including NFS, the Swift API, Gluster, and Ceph. Whatever the source, our software allows your organization to analyze the data instead of moving it.
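To make that idea concrete, here is a minimal sketch of what “analyze in place” means when compute is separated from storage: the same analysis job can be pointed at a different backend just by changing a URI. The hostnames and paths below are hypothetical; the URI schemes themselves (hdfs://, file://, swift://) are standard in the Hadoop ecosystem.

```python
# Illustrative sketch only: with compute separated from storage, the same
# job can target different backends simply by changing the source URI.
# Hostnames and paths are hypothetical.
sources = [
    "hdfs://namenode:8020/warehouse/events",  # HDFS
    "file:///mnt/nfs/warehouse/events",       # NFS (or Gluster/Ceph) mount
    "swift://events.analytics/2015/",         # Swift object storage
]

def storage_scheme(uri: str) -> str:
    """Return the URI scheme, so a job can record which backend it read."""
    return uri.split("://", 1)[0]

print([storage_scheme(u) for u in sources])  # → ['hdfs', 'file', 'swift']
```

The point isn’t the helper function – it’s that none of the data behind those URIs had to be copied anywhere before being analyzed.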
Existing Infrastructure: Enterprise IT organizations have already invested a great deal on their business intelligence tools and related data center infrastructure and systems. They need to ensure that their Big Data environment can co-exist and interoperate with these existing investments.
Our software platform provides RESTful APIs that allow for easy integration. As a result, it can simplify integration with business intelligence and analytical applications, infrastructure management systems and operational tools, as well as server, storage, and networking infrastructure. And with our use of the latest virtualization technologies, we can help maximize the utilization of existing infrastructure investments.
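As a sketch of what RESTful integration looks like in practice, an external tool could provision a cluster programmatically by POSTing a JSON body to an endpoint. To be clear, the endpoint path and field names below are hypothetical assumptions for illustration, not the documented EPIC API – consult the product documentation for the real resource names.

```python
import json

# Hypothetical sketch of driving a cluster-provisioning REST API.
# The endpoint and field names are illustrative, not the documented API.
API_BASE = "https://epic.example.com/api/v1"

def create_cluster_body(name, distribution, node_count):
    """Build the JSON body for a hypothetical 'create cluster' request."""
    return {
        "name": name,
        "distribution": distribution,  # e.g. a Hadoop distro or Spark
        "node_count": node_count,
    }

body = create_cluster_body("bi-analytics", "spark", 4)
payload = json.dumps(body, sort_keys=True)
print(payload)
# An HTTP client would POST this payload to API_BASE + "/clusters"
# with the appropriate authentication headers.
```

This is exactly the kind of call a BI tool, an operational dashboard, or a CI pipeline could make – which is why a REST interface matters for interoperating with existing infrastructure.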
So is there a Big Data “easy button” that …
- Embeds the major Hadoop distributions and integrates with the Big Data analytical, business intelligence, and visualization software you need?
- Enables enterprises to implement Hadoop as well as Spark in a multi-tenant virtualized environment while maintaining high levels of security?
- Allows your users to spin up virtual Big Data clusters in minutes, instead of waiting weeks or months?
- Provides the flexibility and agility of Hadoop-as-a-Service, while keeping your data on-premises and leveraging existing infrastructure investments?
- Simplifies the complexity, reduces the cost, and alleviates the skills gap – allowing enterprises of all sizes to realize the promise that Big Data delivers?
The short answer is yes. And it’s EPIC.