“You can have data without information, but you cannot have information without data.”
– Daniel Keys Moran
Business Intelligence (BI) tools are used to derive valuable information from historical and current data. These BI tools enable visualization of data that can help drive important decisions in a timely manner, so they are essential for business analysts who work with large data sets – and most of these end users have their favorite toolsets (including Tableau, Qlik, MicroStrategy, Microsoft Excel, and more).
But before these BI tools can be used to make sense of the data, a lot has to happen behind the scenes. Even with the best BI visualization tool, the business insights and effectiveness of the data analysis are only as good as the data framework that supports it. There is a pipeline of activities that needs to work in tandem for effective data analysis. Some of the crucial activities in this pipeline include:
- Acquiring data from a variety of sources;
- Processing and storing the data;
- Building longitudinal relationships between various data entities; and
- Analyzing and publishing the results.
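The four stages above can be sketched in miniature. The snippet below is purely illustrative (hypothetical function names and in-memory stand-ins for real sources and storage), not a real Big Data pipeline:

```python
# Illustrative sketch of the pipeline stages listed above.
# All names and data are hypothetical; real pipelines use dedicated tools per stage.

def acquire():
    """Acquire data from a variety of sources (two in-memory 'sources' here)."""
    crm = [{"customer_id": 1, "name": "Acme"}]
    orders = [{"customer_id": 1, "total": 99.0}, {"customer_id": 1, "total": 25.0}]
    return crm, orders

def process_and_store(crm, orders):
    """Process and store the data (a dict standing in for HDFS or a warehouse)."""
    return {"crm": crm, "orders": orders}

def build_relationships(store):
    """Build relationships between entities (customer -> their orders)."""
    return {
        c["customer_id"]: {
            "name": c["name"],
            "orders": [o for o in store["orders"]
                       if o["customer_id"] == c["customer_id"]],
        }
        for c in store["crm"]
    }

def analyze_and_publish(joined):
    """Analyze and publish a result (total spend per customer)."""
    return {v["name"]: sum(o["total"] for o in v["orders"]) for v in joined.values()}

store = process_and_store(*acquire())
report = analyze_and_publish(build_relationships(store))
print(report)  # {'Acme': 124.0}
```

Each stage feeds the next, which is why the text stresses that they must work in tandem: a failure or delay in any one stage degrades the analysis at the end.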
Hadoop has emerged as a versatile and cost-effective option for data analysis (in some cases, as an extension to existing enterprise data warehouses) that can meet the demands of existing as well as emerging Big Data workloads. But as Hadoop begins to gain mainstream adoption, it will need to meet end users’ expectations of speed and accuracy – including compatibility with their BI tools of choice. For example, pre-defined cubes and indexes are needed to provide fast and meaningful access to various measures and dimensions of data.
To provide this intelligence layer for Big Data analysis, there are now some interesting technology mashups including memory-based cubes, columnar databases, and more. Some of these techniques require data to be copied, transformed, and loaded into some type of structured format in memory or on disks. The downside of this pre-processing is that it limits visibility into raw data and lacks a real-time view of the data from an end user's perspective.
Is it possible to build a fusion of existing BI tools on Hadoop, with speed and up-to-date information, in a seamless and cost-effective manner for end users? That’s precisely what companies like AtScale are doing. AtScale fits into the Big Data environment seamlessly, using technologies like Hive, Impala, Tez, and Spark SQL to build a real-time, dynamic cube on data that resides in the Hadoop Distributed File System (HDFS).
We’re working with some enterprises that have deployed AtScale, and they’ve been impressed with the technology’s “no data movement” architecture. What this means is that as soon as the data lands in HDFS, it becomes immediately available to users with no prior extraction or transformation. It is purely a schema-on-read architecture, where the cube definition itself contains the transformations. Transformations are injected at runtime when the queries are executed. Cubes can be thought of as a semantic business layer on top of HDFS, contained within Hadoop. This seamless architecture makes the best use of both Hadoop and BI tools, without the need for additional client software and drivers.
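The core idea of schema-on-read can be illustrated with a small sketch: raw records stay untouched in storage, while a cube definition holds the transformations and applies them only when a query executes. This is a simplified illustration with hypothetical names, not AtScale's actual cube format or API:

```python
# Schema-on-read sketch: raw data is stored as-is; the "cube definition"
# contains the transformations, injected at query time. (Hypothetical names.)

raw_events = [
    {"ts": "2015-06-01T10:00:00", "amount_cents": "1250", "region": "us-west"},
    {"ts": "2015-06-01T11:30:00", "amount_cents": "800",  "region": "us-west"},
    {"ts": "2015-06-02T09:15:00", "amount_cents": "430",  "region": "eu"},
]

# Dimensions and measures are derived from raw fields on read, not on load.
cube_definition = {
    "dimensions": {
        "day":    lambda r: r["ts"][:10],      # derive a day from the timestamp
        "region": lambda r: r["region"],
    },
    "measures": {
        "revenue": lambda r: int(r["amount_cents"]) / 100.0,  # cents -> dollars
    },
}

def query(records, cube, dimension, measure):
    """Apply the cube's transformations at read time, then aggregate."""
    dim_fn = cube["dimensions"][dimension]
    mea_fn = cube["measures"][measure]
    totals = {}
    for r in records:
        key = dim_fn(r)
        totals[key] = totals.get(key, 0.0) + mea_fn(r)
    return totals

print(query(raw_events, cube_definition, "region", "revenue"))
# {'us-west': 20.5, 'eu': 4.3}
```

Because no transformed copy is materialized up front, newly landed data is visible to the next query immediately – the property the "no data movement" architecture described above relies on.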
So how can BlueData help with accelerating the adoption of Hadoop together with BI tools for faster data insights? Our infrastructure software platform leverages virtualization technology and patent-pending innovations to make it much easier, faster, and more cost-effective to get up and running with Hadoop and Big Data analytics on-premises. Here’s a brief overview of the BlueData + AtScale value proposition:
In my experience working with organizations deploying Hadoop, it is next to impossible to anticipate every detail for Big Data analysis before the project starts and still be successful. End users should be given the flexibility to start with what they have and what they know; and then make adjustments as they learn more about the data, the data sources, and the business requirements. Without this flexibility, these organizations are stuck in a “chicken and egg” loop and make little progress with their Big Data deployment.
With BlueData, enterprises now have this flexibility:
- End users can easily create a multi-node virtual Hadoop cluster in minutes with just a few mouse clicks
- Applications like AtScale can be deployed in the same virtualized Hadoop environment, allowing end users to use their existing BI tools for Big Data analysis
- IT can quickly deploy a multi-tenant Hadoop architecture, and end users can start their Big Data analysis without the need for extended infrastructure planning
- IT has the ability to control resources centrally, while enabling end users to run their data analysis in separate tenants – with isolation and security
- Hadoop and BI computations can run on many data sources simultaneously. IT teams can create a centralized, logical data lake – while controlling data access
- End users benefit from speed and accuracy in their analysis of large data volumes and richly correlated information from various new data sources
To learn more about the benefits of BlueData and AtScale, watch this brief demo video where I show how the joint solution works: