Hadoop and Spark on Docker: Ten Things You Need to Know

For a while now, I’ve been struggling to understand why any enterprise would want to build its own solution for large-scale deployments of Big Data workloads like Hadoop and Spark on Docker containers. The arguments for “doing it yourself” (DIY) often play like a broken record: “If they <insert name of humongous tech giant here> […]

QoS for Hadoop Using Docker Containers

There is a lot of focus and attention on Big Data analytics today, and as I wrote in a recent blog post, it’s all about the applications. But there are (of course) many infrastructure considerations in making the analytics and applications work seamlessly for your data scientists, analysts, and other users. One vital issue […]

Docker, and Spark, and Hadoop. Oh My.

When we founded BlueData in the fall of 2012, we built our Big Data infrastructure software platform around the best open source hypervisor technology then available. We knew that virtualization was a key enabling technology to simplify Hadoop and other Big Data deployments. There were plenty of naysayers prophesying that Hadoop could never run in […]