
Deep Learning with BigDL and Apache Spark on Docker

The field of machine learning – and deep learning in particular – has made significant progress recently, and use cases for deep learning are becoming more common in the enterprise. We’ve seen more of our customers adopt machine learning and deep learning frameworks for use cases like natural language processing with free-text data analysis, image […] Read More

Distributed Data Science with Spark 2.0, Python, R, and H2O on Docker

Here at BlueData, I’ve worked with many of our customers (including large enterprises in financial services, telecommunications, and healthcare, as well as government agencies and universities) to help their data science teams with their Big Data initiatives. In this blog post, I want to share some of my recent experiences in working with the data […] Read More

Apache Spark Integrated with Jupyter and Spark Job Server

Apache Spark is clearly one of the most popular compute frameworks in use by data scientists today. For the past couple of years here at BlueData, we’ve been focused on providing our customers with a platform to simplify the consumption, operation, and infrastructure for their on-premises Spark deployments – with ready-to-run, instant Spark clusters. In previous […] Read More

A Quick Start Guide for Deploying Apache Spark with BlueData EPIC 2.0

Apache Spark has quickly become one of the most popular Big Data technologies on the planet. By now, you probably know that it offers a unified, in-memory compute engine that works with distributed data platforms such as HDFS. So what does that mean? It means that in a single program, you can acquire data, build a pipeline, and […] Read More

Where the Puck is Going: Apache Spark and Big Data Analytics

Big Data analysis is having an impact on every industry. This is no longer a tactic taken by a few visionary leaders to capitalize on new business insights. It’s quickly moving into the mainstream. The early adopters of Big Data gained a competitive advantage. Today, it’s table stakes: Big Data is now a competitive imperative. If you aren’t […] Read More

Apache Spark Infrastructure Made Easy

If you’re following the Big Data space, you’ve most likely heard at least something about Apache Spark. Originally developed in the AMPLab at UC Berkeley by Ion Stoica and Matei Zaharia, Spark is an open-source in-memory cluster computing engine for large-scale data processing. By keeping data in memory, Spark allows users to quickly perform repeated […] Read More