
BlueData Offers New Turnkey Solution for Fast Data with Spark, Kafka, and Cassandra

Build Real-Time Data Pipelines with the BlueData EPIC Software Platform and Docker Containers

SANTA CLARA, Calif. – BlueData, provider of the leading infrastructure software platform for Big Data, today announced a new solution for building real-time data pipelines with Spark Streaming, Kafka, and Cassandra. This new turnkey offering is designed for organizations that want to develop and test applications for analyzing “Fast Data”: real-time or near-real-time data that requires instant awareness, faster decision-making, and immediate action.

Fast Data use cases are emerging in almost every industry, ranging from fraud detection for financial transactions, to Internet of Things (IoT) monitoring of sensor-generated data, to campaign optimization and real-time bidding in advertising technology. Real-time analysis of these high-velocity data streams (from financial markets, sensors, machine logs, social media, mobile applications, and other sources) can bring tremendous value, whether by delivering competitive business advantage, averting potential crises, or creating new revenue opportunities. But this data is perishable: it may lose its operational value within a very short time. Speed is of the essence.

For data scientists and developers working with real-time pipelines, the Spark-Kafka-Cassandra stack has quickly emerged as the best place to start. This new trinity of open source systems delivers on key requirements for Fast Data:

  • Spark: a fast in-memory data processing engine, and the fastest-growing Apache open source technology. Spark Streaming is an extension of the core Spark API that allows integration of real-time data from disparate event streams.
  • Kafka: a messaging system to capture and publish streams of data. With Spark Streaming, you can ingest data from Kafka, filter that stream down to a smaller data set, augment the data, and then push that refined data set to a persistent data store.
  • Cassandra: a scalable and resilient operational database. The refined data is written to Cassandra for persistence, easy application development, and real-time analytics.
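The ingest, filter, augment, and persist flow described above can be illustrated with a minimal, dependency-free Python sketch. It deliberately does not use the Spark, Kafka, or Cassandra APIs: a list of strings stands in for a Kafka topic, chained generator functions stand in for Spark Streaming transformations, and a dictionary stands in for a Cassandra table. All names (Event, ingest, filter_events, augment, persist, run_pipeline) and the sample sensor data are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Event:
    """A single reading from a hypothetical sensor stream."""
    sensor_id: str
    value: float


def ingest(messages: Iterable[str]) -> Iterator[Event]:
    # Stand-in for consuming a Kafka topic: each message is "sensor_id,value".
    for msg in messages:
        sensor_id, value = msg.split(",")
        yield Event(sensor_id.strip(), float(value))


def filter_events(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    # Filter the stream down to a smaller data set: keep readings >= threshold.
    for event in events:
        if event.value >= threshold:
            yield event


def augment(events: Iterable[Event], locations: Dict[str, str]) -> Iterator[dict]:
    # Augment each event with reference data (here, the sensor's location).
    for event in events:
        yield {
            "sensor_id": event.sensor_id,
            "value": event.value,
            "location": locations.get(event.sensor_id, "unknown"),
        }


def persist(rows: Iterable[dict], store: Dict[str, List[dict]]) -> None:
    # Stand-in for writing to a table partitioned by sensor_id.
    for row in rows:
        store.setdefault(row["sensor_id"], []).append(row)


def run_pipeline(messages: Iterable[str], threshold: float,
                 locations: Dict[str, str]) -> Dict[str, List[dict]]:
    # Chain the stages; events flow through one at a time.
    store: Dict[str, List[dict]] = {}
    persist(augment(filter_events(ingest(messages), threshold), locations), store)
    return store


if __name__ == "__main__":
    messages = ["s1,10.0", "s2,3.5", "s1,42.0", "s3,7.2"]
    locations = {"s1": "plant-a", "s3": "plant-b"}
    print(run_pipeline(messages, threshold=5.0, locations=locations))
```

Because each stage is a generator, records pass through the pipeline as they arrive instead of being collected into one large batch first. In an actual Spark-Kafka-Cassandra deployment, Spark Streaming would process the Kafka stream in small micro-batches and write the results to a Cassandra table.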

However, the infrastructure for the Spark-Kafka-Cassandra stack is time-consuming to assemble, and most organizations lack the skills to deploy and configure each of the necessary components. BlueData’s mission is to make this infrastructure deployment easy. The BlueData EPIC software platform is purpose-built to simplify and accelerate infrastructure deployment for Hadoop, Spark, and related tools for Big Data (and Fast Data) analytics, leveraging patent-pending innovations and Docker container technology.

The new Spark-Kafka-Cassandra solution provides a full enterprise license of BlueData EPIC software along with the professional services needed to deploy an on-premises lab environment for building real-time data pipelines. With BlueData, customers will have a multi-tenant sandbox for prototyping, developing, and testing new Fast Data applications and use cases with this popular stack (either with or without Hadoop).

This new turnkey solution includes the following:

  • Accelerated deployment for real-time data pipelines, with BlueData EPIC software and Docker containers.
  • A ready-to-run Spark-Kafka-Cassandra lab for rapid prototyping, development, testing, and quality assurance.
  • Two sample end-to-end data pipelines integrated with Spark Streaming, Kafka, and Cassandra as a starting point.
  • Sample datasets and sample use cases for real-time streaming, with assistance from BlueData experts to help customers get started.
  • Rapid prototyping and agile application development with the ability to spin up new clusters in a matter of minutes via self-service, with just a few mouse clicks.
  • Improved developer productivity with web-based Zeppelin notebooks that can be shared with other users in a multi-tenant environment on shared infrastructure.

“Batch processing of large datasets was the start for many Big Data analytics initiatives. But now there’s growing demand from organizations analyzing real-time ‘data in motion’ in addition to the more traditional batch-oriented ‘data at rest’ use cases,” said Kumar Sreekanti, CEO of BlueData. “For real-time data pipelines, we’ve seen Spark Streaming together with Kafka and Cassandra emerge as a popular stack. BlueData makes it easy for enterprises to get started quickly with these new tools and technologies in a turnkey on-premises lab environment.”

The new solution, called the Real-Time Pipeline Accelerator, includes a one-year subscription for BlueData EPIC software along with professional services to assist in building real-time data pipelines. BlueData will be demonstrating this new solution at the Spark Summit East event in New York, February 16th to 18th, in booth K12.

Supporting Resources
Solution brief: Real-Time Pipeline Accelerator
Blog post: Real-Time Data Pipelines with Spark, Kafka, and Cassandra (on Docker)

About BlueData Software, Inc.
BlueData is transforming how enterprises deploy their Big Data applications and infrastructure. The BlueData EPIC™ software platform uses container technology to make it easier, faster, and more cost-effective for enterprises of all sizes to leverage Big Data, enabling Big-Data-as-a-Service in an on-premises deployment model. With BlueData, they can spin up virtual Hadoop or Spark clusters within minutes, providing data scientists with on-demand access to the applications, data, and infrastructure they need. Based in Santa Clara, California, BlueData was founded by VMware veterans, and its investors include Amplify Partners, Atlantic Bridge, Ignition Partners, and Intel Capital. To learn more about BlueData, visit or follow @bluedata.

Press Contact
Jordan Tewell