Big Data Security

Computer data security is on everyone’s mind. Concern over the protection of data has spread far beyond the realm of white hats and security experts. It is now firmly in the sights of VC firms [What’s Big in Venture Capital] and even the mainstream media [Sony Pictures Hack]. Big Data is no exception. In fact, because there is so much Big Data and the information within it is so valuable, enterprises face a daunting task in controlling access and ensuring that only authorized users see it.

In the past, computer data security was a specialty. Each storage vendor had its own mechanisms for protecting the data stored in its systems, including data encryption, user authentication, user authorization, and access auditing. Over time, data encryption algorithms became standardized, but authentication and authorization implementations still differed. This was not an undue burden, because typically a single application accessed a given pool of data, and that application was designed to use the specific authentication and authorization mechanism for that pool.

Enter Big Data and the Data Lake. Now there are many types of data, each with different security and audit requirements, and they must be accessed by an array of applications such as Spark, Hive, Impala, and MapReduce. These various types of data may all reside in a single enterprise data hub or remain in individual storage silos. Either way, Big Data applications need to access multiple types of data simultaneously to do their analysis, yet the user of the application must still meet the authorization requirements for each type of data being accessed.
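The requirement that one job satisfy the authorization rules of every data source it touches can be sketched as follows. This is a minimal, hypothetical Python model, not BlueData’s EPIC or DataTAP code: the names `Silo`, `acl_authorizer`, and `authorize_job` are invented for illustration, and a simple path-prefix access list stands in for whatever native mechanism a real silo would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical type: a silo-native check answering "may this user read this path?"
Authorizer = Callable[[str, str], bool]


@dataclass
class Silo:
    """A storage silo that keeps its own authorization mechanism (hypothetical model)."""
    name: str
    authorize: Authorizer


def acl_authorizer(acl: Dict[str, Set[str]]) -> Authorizer:
    """Build a path-prefix ACL check, standing in for a silo's native mechanism."""
    def check(user: str, path: str) -> bool:
        return any(path.startswith(prefix) for prefix in acl.get(user, set()))
    return check


def authorize_job(
    user: str,
    reads: List[Tuple[str, str]],
    silos: Dict[str, Silo],
    audit: List[Tuple[str, str, str, bool]],
) -> List[str]:
    """Ask each silo to authorize its own reads and return the denied ones.

    Every decision, allowed or not, is appended to `audit`.
    An empty result means the job may touch all of its inputs.
    """
    denied = []
    for silo_name, path in reads:
        allowed = silos[silo_name].authorize(user, path)
        audit.append((user, silo_name, path, allowed))
        if not allowed:
            denied.append(f"{silo_name}:{path}")
    return denied


# Example: one job reading from two silos that have different access rules.
silos = {
    "hdfs": Silo("hdfs", acl_authorizer({"alice": {"/logs/"}})),
    "nfs": Silo("nfs", acl_authorizer({"alice": {"/finance/reports/"},
                                        "bob": {"/finance/"}})),
}
audit: List[Tuple[str, str, str, bool]] = []
denied = authorize_job(
    "alice",
    [("hdfs", "/logs/2014/12.csv"), ("nfs", "/finance/payroll.csv")],
    silos,
    audit,
)
print(denied)  # ['nfs:/finance/payroll.csv']
```

Note that `authorize_job` holds no access rules of its own. It only asks each silo, so adding a silo with a different mechanism would not require changing the job, and every decision still lands in one audit trail.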

To some this may seem like reinventing the wheel, since much work has already been done on securing Hadoop running against a single HDFS. However, enterprises that want to keep their storage silos and still run Big Data applications have had to modify those applications. Until now. With BlueData’s EPIC platform and its DataTAP technology, data scientists will be able to run unmodified Big Data applications and securely access data from multiple storage silos using the authentication and authorization mechanism native to each silo. This will dramatically reduce the burden on data scientists without compromising the security of the data. A big win for everyone.