Deep Learning with BigDL and Apache Spark on Docker

The field of machine learning – and deep learning in particular – has made significant progress recently, and use cases for deep learning are becoming more common in the enterprise. We’ve seen more of our customers adopt machine learning and deep learning frameworks for use cases like natural language processing with free-text data analysis, image […]

Hadoop and Spark on Docker: Ten Things You Need to Know

For a while now, I’ve been struggling to understand why any enterprise would want to build their own solution for large-scale deployments of Big Data workloads like Hadoop and Spark on Docker containers. The arguments for “doing it yourself” (DIY) often play like a broken record: “If they <insert name of humongous tech giant here> […]

Distributed Data Science with Spark 2.0, Python, R, and H2O on Docker

Here at BlueData, I’ve worked with many of our customers (including large enterprises in financial services, telecommunications, and healthcare, as well as government agencies and universities) to help their data science teams with their Big Data initiatives. In this blog post, I want to share some of my recent experiences in working with the data […]

Apache Spark Integrated with Jupyter and Spark Job Server

Apache Spark is clearly one of the most popular compute frameworks in use by data scientists today. For the past couple of years here at BlueData, we’ve been focused on providing our customers with a platform to simplify the consumption, operation, and infrastructure for their on-premises Spark deployments – with ready-to-run, instant Spark clusters. In previous […]

Real-Time Data Pipelines with Spark, Kafka, and Cassandra (on Docker)

In my experience as a Big Data architect and data scientist, I’ve worked with several different companies to build their data platforms. Over the past year, I’ve seen a significant increase in focus on real-time data and real-time insights. It’s clear that real-time analytics provide the opportunity to make faster (and better) decisions and gain […]

Hadoop Virtualization: It’s About Time (and Value)

“Whoever wishes to foresee the future must consult the past.” – Machiavelli. While there have been many amazing technologies that have impacted the data center, history books will certainly keep a warm spot open for “Virtualization”. Virtualization is not a new idea (indeed, it dates all the way back to the 1960s), but it […]

A Quick Start Guide for Deploying Apache Spark with BlueData EPIC 2.0

Apache Spark has quickly become one of the most popular Big Data technologies on the planet. By now, you probably know that it offers a unified, in-memory compute engine that works with distributed data platforms such as HDFS. So what does that mean? It means that in a single program, you can acquire data, build a pipeline, and […]
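To make that last point concrete, here is a minimal PySpark sketch (not from the original post) of what “acquire data, build a pipeline, and act on it in a single program” can look like; the HDFS path and column names are hypothetical placeholders.

```python
# A minimal sketch of a single Spark program: read data from HDFS,
# build a small transformation pipeline, and act on the result.
# The HDFS path and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("quickstart-pipeline").getOrCreate()

# Acquire data: load a CSV file from a (hypothetical) HDFS location
events = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)

# Build a pipeline: filter, aggregate, and sort, all in memory
daily_counts = (
    events
    .filter(F.col("status") == "ok")
    .groupBy("event_date")
    .count()
    .orderBy("event_date")
)

# Act on the result in the same program
daily_counts.show(10)

spark.stop()
```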

Docker, and Spark, and Hadoop. Oh My.

When we founded BlueData in the fall of 2012, we built our Big Data infrastructure software platform around the best open source hypervisor technology then available. We knew that virtualization was a key enabling technology to simplify Hadoop and other Big Data deployments. There were plenty of naysayers prophesying that Hadoop could never run in […]

Where the Puck is Going: Apache Spark and Big Data Analytics

Big Data analysis is having an impact on every industry. This is no longer a tactic taken by a few visionary leaders to capitalize on new business insights. It’s quickly moving into the mainstream. The early adopters of Big Data gained a competitive advantage. Today, it’s table stakes: Big Data is now a competitive imperative. If you aren’t […]