
Distributed Machine Learning Environments with H2O on Containers

More and more enterprises are adopting machine learning in support of their AI and digital transformation initiatives. Here at BlueData, I’ve worked with many customers across multiple industries to implement their machine learning algorithms. Whether in financial services, insurance, life sciences, healthcare, manufacturing, retail, telecommunications, or government … adoption is accelerating across every sector for […]

Analytics and Machine Learning with SAS Viya on Docker Containers

As the industry leader in business analytics software, SAS brings a formidable toolset to address a wide range of use cases (including churn prediction, customer segmentation, market basket analysis, and more) – enabling enterprises to extract business value from large volumes of data. IDC research shows SAS with more than 30 percent of the market […]

Deep Learning with TensorFlow, GPUs, and Docker Containers

I work with a lot of data science teams at our enterprise customers, and in the past several months I’ve seen an increased adoption of machine learning and deep learning frameworks for a wide range of applications. As with other use cases in Big Data analytics and data science, these data science teams want to […]

Deep Learning with BigDL and Apache Spark on Docker

The field of machine learning – and deep learning in particular – has made significant progress recently, and use cases for deep learning are becoming more common in the enterprise. We’ve seen more of our customers adopt machine learning and deep learning frameworks for use cases like natural language processing with free-text data analysis, image […]

Large-Scale Data Science Operations

Here at BlueData, I get the opportunity to meet with many data science teams working on very interesting projects in different industries across our customer base. These are definitely exciting times to be working in the field of data science, machine learning, and analytics. The primary goal of a data science team is to understand […]

Distributed Data Science with Spark 2.0, Python, R, and H2O on Docker

Here at BlueData, I’ve worked with many of our customers (including large enterprises in financial services, telecommunications, and healthcare, as well as government agencies and universities) to help their data science teams with their Big Data initiatives. In this blog post, I want to share some of my recent experiences in working with the data […]

App Workbench – Managing the Menagerie of Big Data Apps

One of the most challenging aspects of Big Data deployments is keeping up with the dynamic nature of Big Data frameworks, distributions, applications, and their latest versions. The success or failure of a Big Data implementation may hinge on how well the organization handles support for the menagerie of applications and tools that data scientists, […]

Apache Spark Integrated with Jupyter and Spark Job Server

Apache Spark is clearly one of the most popular compute frameworks in use by data scientists today. For the past couple of years here at BlueData, we’ve been focused on providing our customers with a platform to simplify the consumption, operation, and infrastructure for their on-premises Spark deployments – with ready-to-run, instant Spark clusters. In previous […]

Real-Time Data Pipelines with Spark, Kafka, and Cassandra (on Docker)

In my experience as a Big Data architect and data scientist, I’ve worked with several different companies to build their data platforms. Over the past year, I’ve seen a significant increase in focus on real-time data and real-time insights. It’s clear that real-time analytics provide the opportunity to make faster (and better) decisions and gain […]

A Quick Start Guide for Deploying Apache Spark with BlueData EPIC 2.0

Apache Spark has quickly become one of the most popular Big Data technologies on the planet. By now, you probably know that it offers a unified, in-memory compute engine that works with distributed data platforms such as HDFS. So what does that mean? It means that in a single program, you can acquire data, build a pipeline, and […]