Data Lakes, Swamps and Solutions

A fierce debate has erupted recently between proponents of the all-encompassing data lake (or data hub, data pond, etc.) and those who favor enterprise-grade data management platforms. The arguments go something like this. Advocates of the data lake argue that putting all data into a single repository liberates it, making it accessible to any application that can navigate the lake. Defenders of the data warehouse counter that the current methods for managing and securing data in the data lake, which they disparagingly call a swamp, are not yet mature, and that more traditional data management platforms should therefore still be used.

The heart of the matter is that currently it takes an experienced data mariner to navigate wild and complex data lakes, while in truth most of us are still weekend sailors more comfortable with traditional data silos.

At BlueData we believe that the data liberalization movement will and should continue. We also believe that the data lake definition and architecture need to evolve to eliminate the issues around large-scale data duplication and data governance.

The concept of the data lake is very powerful. You can store any data, including structured data, sensor data from the Internet of Things (IoT), and unstructured content such as audio, video and social media. In addition to cost savings on storage and expensive database licenses, the data lake offers agility and analytical experimentation that are simply not possible with traditional data warehousing.

At the same time, as with any emerging architecture, there remain some significant challenges with data lakes. These challenges lie not so much in the concept of the data lake as in current implementation approaches, which rely on rigid and inflexible physical infrastructure that in turn introduces issues around management complexity, security, access control, privacy and regulatory compliance.

Enter DataTap™. With BlueData’s DataTap technology, any application can access any data source, no matter where the data resides or which file system format it is stored in. DataTap introduces a refreshing new way to reap all the benefits of a data lake while addressing the concerns and risks of the current approach. DataTap creates a logical data lake over enterprise storage systems and can also salvage existing data lakes that are at risk of becoming data swamps. This means that enterprises can continue to use their existing enterprise storage, such as NFS, without moving their sensitive data and, more importantly, can leverage the proven data governance and security of these storage systems.
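To make the idea of a logical data lake concrete, here is a minimal Python sketch. It is not DataTap's API or implementation. It only illustrates the general pattern of one logical namespace that resolves to data left in place on different storage systems. The class names, the mount paths and the host names (filer01, namenode) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """One entry in the logical namespace (hypothetical, for illustration)."""
    logical_prefix: str  # the path applications see, e.g. "/lake/sales"
    backend_uri: str     # where the data actually lives; it is never copied


class LogicalLake:
    """Maps logical paths to backend URIs without moving any data."""

    def __init__(self):
        self._mounts = []

    def mount(self, logical_prefix, backend_uri):
        # Normalize trailing slashes so prefix matching is unambiguous.
        self._mounts.append(
            Mount(logical_prefix.rstrip("/"), backend_uri.rstrip("/"))
        )

    def resolve(self, logical_path):
        # Collect every mount whose prefix covers the path, then pick the
        # longest one so that more specific mounts take precedence.
        candidates = [
            m for m in self._mounts
            if logical_path == m.logical_prefix
            or logical_path.startswith(m.logical_prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no storage system mounted for {logical_path}")
        best = max(candidates, key=lambda m: len(m.logical_prefix))
        return best.backend_uri + logical_path[len(best.logical_prefix):]


lake = LogicalLake()
lake.mount("/lake/sales", "nfs://filer01/exports/sales")                 # hypothetical NFS filer
lake.mount("/lake/sales/archive", "hdfs://namenode:8020/archive/sales")  # hypothetical HDFS cluster
print(lake.resolve("/lake/sales/2015/q1.csv"))
print(lake.resolve("/lake/sales/archive/2012.csv"))
```

In this sketch an application only ever sees paths under /lake, while each read is directed to the storage system that already holds the data and already enforces its own governance and security.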

And DataTap can do more than just tap the data lake. It can provide each line of business, or even a specific user, with their own set of taps to gain access to all the data needed, while ensuring encryption at rest and in motion, user-level access protection and auditing on top of any storage system. DataTap can also manage and enforce fine-grained quality of service (QoS) rules on any storage system so that users are not drinking from the fire hose.
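As a rough illustration of what a per-user tap with access protection, auditing and QoS might involve, consider the Python sketch below. It is an assumption-laden toy and not how DataTap works: the Tap class and its parameters are hypothetical, the QoS rule is a simple token bucket chosen for the example, and encryption is left out entirely.

```python
import time


class Tap:
    """Hypothetical per-user tap: access check, audit trail, byte-rate cap."""

    def __init__(self, user, allowed_prefixes, bytes_per_sec, clock=time.monotonic):
        self.user = user
        self.allowed_prefixes = tuple(p.rstrip("/") for p in allowed_prefixes)
        self.rate = bytes_per_sec
        self.clock = clock                   # injectable so the limiter can be tested
        self.tokens = float(bytes_per_sec)   # bucket starts full: one second of budget
        self.last = clock()
        self.audit_log = []                  # (user, path, outcome) for every attempt

    def _refill(self):
        # Token bucket: add rate * elapsed seconds, capped at one second of budget.
        now = self.clock()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def request(self, path, nbytes):
        """Return True if the read may proceed now; every attempt is audited."""
        permitted = any(
            path == p or path.startswith(p + "/") for p in self.allowed_prefixes
        )
        if not permitted:
            self.audit_log.append((self.user, path, "denied"))
            return False
        self._refill()
        if nbytes > self.tokens:
            self.audit_log.append((self.user, path, "throttled"))
            return False
        self.tokens -= nbytes
        self.audit_log.append((self.user, path, "allowed"))
        return True


# Hypothetical user and paths: a sales analyst limited to 1 MB per second.
tap = Tap("alice", ["/lake/sales"], bytes_per_sec=1_000_000)
tap.request("/lake/sales/q1.csv", 400_000)
tap.request("/lake/hr/payroll.csv", 10)
for entry in tap.audit_log:
    print(entry)
```

The point of the sketch is that the policy lives in the tap and not in the storage system, which is why the same rules can sit on top of any backend.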

So while data lake technologies mature, enterprise users can rely on DataTap to provide the value of the data lake without the risk of turning it into a data swamp.