Analytics and Machine Learning with SAS Viya on Docker Containers

As the industry leader in business analytics software, SAS brings a formidable toolset to address a wide range of use cases (including churn prediction, customer segmentation, market basket analysis, and more) – enabling enterprises to extract business value from large volumes of data.

IDC research shows SAS with more than 30 percent of the market share in advanced analytics, having led in the category every year over the past twenty years. In fact, SAS has been in the business of analytics and data science for almost 40 years – with tens of thousands of customers. And SAS has continued to evolve its software portfolio to augment the Big Data ecosystem, including new artificial intelligence (AI) applications and technologies such as machine learning.

SAS can be a powerful complement to Big Data tools like Hadoop, combining the business analytics power of SAS with distributed computing technologies to transform Big Data into business insights. The SAS team has also introduced new and modern deployment options for its customers, with SAS Viya: a next-generation, cloud-ready architecture for SAS users.

Both SAS Viya and the BlueData EPIC platform were recently recognized by DBTA as trend-setting products in data management and analysis for 2018. DBTA recognized BlueData for its innovations in using Docker containers to accelerate the deployment of Hadoop, Spark, Kafka, and other Big Data analytics, data science, and machine learning tools. Now BlueData is working with SAS and other partners like Dell EMC to deploy SAS Viya in a containerized environment – delivering a Big-Data-as-a-Service solution to our joint customers.

SAS Viya is made up of multiple components, most notably SAS Cloud Analytic Services (CAS), the run-time engine for high-performance analytics and distributed computing in SAS Viya. CAS is Hadoop-compatible and was designed to enable distributed processing (i.e. spread over many host computers) for in-memory analytics at scale. It can run on public cloud infrastructure (e.g. AWS) or on on-premises infrastructure.

CAS can be deployed in one of two ways: SMP (Symmetric Multi-Processing) or MPP (Massively Parallel Processing). In SMP mode, there is just one node running CAS. When deployed in MPP mode, the CAS analytical processes are executed on multiple worker nodes simultaneously in order to maximize the benefit of parallel processing. An MPP deployment can dynamically scale up and down based on workload demands by adding or removing worker nodes.
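To make the SMP versus MPP distinction concrete, here is a minimal Python sketch of the general scatter/gather idea. It is a toy model and not CAS code: the function names (`partition`, `worker_partial`, `distributed_mean`) are hypothetical, and threads on one machine stand in for worker nodes. A controller splits a table across workers, each worker computes a partial result on its own slice, and the controller combines the partials.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Deal rows round-robin into n_workers partitions (one per worker)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_partial(part):
    """Work done on a single worker: a partial (sum, count) for its partition."""
    return sum(part), len(part)


def distributed_mean(rows, n_workers):
    """Controller logic: scatter partitions, gather partials, combine them."""
    if n_workers < 1:
        raise ValueError("at least one worker is required")
    if not rows:
        raise ValueError("cannot take the mean of an empty table")
    parts = partition(rows, n_workers)
    # Threads stand in for worker nodes in this toy model.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))

# One worker behaves like SMP mode; several workers behave like MPP mode.
smp_result = distributed_mean(values, 1)
mpp_result = distributed_mean(values, 4)
print(smp_result, mpp_result)
```

Changing the `n_workers` argument changes how the work is divided but not the answer, which is the property that lets a distributed engine add or remove workers as demand changes.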

As depicted in the diagram below, CAS sessions are the link through which multiple users can connect to and interact with CAS for their analytical workloads in MPP mode.

In MPP mode, Hadoop is optional. When Hadoop is used, the controller runs the name node and the worker nodes run the analytics on in-memory data tables within the CAS engine.

The core architecture of SAS Viya and the CAS in-memory engine provides:

  • Elasticity: A distributed and scalable cloud-like environment, where nodes can be added and removed as needed.
  • Open / multi-language support: Includes REST APIs, with support for a wide range of coding languages and programming interfaces such as Python, R, Java, Lua, and SAS.
  • Multi-user / sharable: Security for multiple user-level sessions, with the ability for users to securely share data between sessions.
  • Fault tolerance: CAS survives the failure of a node without data loss or interruption to the execution of a user’s active requests.
  • Fault isolation: The session is the unit of isolation. A failure in one session does not affect other sessions.
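As an illustration of the multi-language support and the session model, the following Python sketch opens a CAS session with SWAT, the open-source Python client published by SAS. Treat it as a sketch under assumptions rather than a reference configuration: the environment variable names, the placeholder host and user, and the default port of 5570 are assumptions for illustration, and the helper names `cas_settings` and `open_session` are hypothetical. It needs the `swat` package and a reachable CAS controller to do anything useful.

```python
import os


def cas_settings(env=None):
    """Collect connection settings for a CAS session.

    The variable names and default values below are placeholders
    chosen for this example, not names defined by SAS or BlueData.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env.get("CAS_HOST", "cas-controller.example.com"),
        "port": int(env.get("CAS_PORT", "5570")),  # assumed default CAS port
        "username": env.get("CAS_USER", "sasdemo"),
        "password": env.get("CAS_PASSWORD", ""),
    }


def open_session(settings):
    """Open one CAS session; each call creates its own isolated session."""
    import swat  # imported lazily so this module loads without the package

    return swat.CAS(
        hostname=settings["hostname"],
        port=settings["port"],
        username=settings["username"],
        password=settings["password"],
    )


# Example use, only meaningful against a running CAS server:
#
#   session = open_session(cas_settings())
#   try:
#       print(session.serverstatus())  # reports the nodes in the deployment
#   finally:
#       session.close()
```

Each user or notebook would typically open its own session in this way, which matches the description of the session as the unit of isolation.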

Multiple SAS products can now take advantage of the new SAS Viya architecture, including SAS Visual Analytics, SAS Visual Statistics, SAS Visual Data Mining and Machine Learning, SAS Visual Investigator, and more. SAS customers can also execute SAS 9 code and models in SAS Viya or bring SAS Viya results into their SAS 9 deployments – to complement their existing SAS assets, skill sets and experience.

BlueData EPIC and SAS Viya

The design principles of the BlueData EPIC software platform align nicely with the fundamentals of SAS Viya. Using Docker containers, BlueData’s EPIC platform makes it easier, faster, and more cost-effective to deploy distributed Big Data analytics, data science, and machine learning applications. Customers can deploy BlueData EPIC on-premises, in a public cloud service like AWS, or in a hybrid model – enabling the on-demand provisioning and elasticity of Big-Data-as-a-Service (BDaaS) regardless of the underlying infrastructure.

I’ve written blog posts and given presentations about deploying Hadoop, Spark, Kafka, data science notebooks like Jupyter and RStudio, and deep learning applications like TensorFlow in a containerized environment with BlueData. With its cloud-ready architecture, SAS Viya can now be easily deployed in a containerized environment using the BlueData EPIC platform.

With BlueData, our customers can automatically deploy the CAS run-time engine components on Docker containers – including the controller and worker nodes, and other administrative services – either co-located on Hadoop nodes or by themselves in MPP mode. All the associated SAS Viya products and end user services (including SAS Visual Analytics, SAS Visual Data Mining and Machine Learning, as well as Jupyter notebooks and other tools) can also be containerized and integrated with CAS and Hadoop on the BlueData platform.

The screenshot below shows a pre-configured Docker image for SAS Viya 3.3 in the BlueData EPIC App Store, available for one-click deployment.

Now SAS data analysts and data science teams can easily spin up multi-node clusters for SAS Viya within minutes – using BlueData EPIC’s self-service interface. As shown in the screenshot below, the user can create a new containerized cluster for SAS Viya in just a few mouse clicks:

This new cluster creation process automates all the setup required for a multi-node SAS Viya cluster, including all the associated services and tools required.

The BlueData EPIC screenshot below shows the template for an example SAS Viya deployment co-located with Hadoop: the containerized “SAS Mid Tier” cluster includes the administrative services such as SAS Studio (the integrated programming environment for SAS Viya) and SAS Environment Manager (for managing the SAS Viya environment); the Hadoop cluster (in this case, Cloudera “CDH 5.10 with Cloudera Manager”) has the co-located CAS controller and CAS worker nodes running on Docker containers.

Links to the relevant web interfaces for any of the associated SAS or Hadoop services can be made available in a secure, reliable manner for end users – based on security policies defined in BlueData EPIC. Users can also add SAS kernels as needed for running SAS programs from their preferred data science notebooks, such as Jupyter Notebook or JupyterHub. All applications running on BlueData are unmodified, and they work just as if they were deployed on dedicated bare-metal physical infrastructure – with the same security and high performance.

With BlueData, SAS customers can automate the deployment of a fully integrated and containerized SAS Viya stack with the CAS controller and workers running on Hadoop compute clusters, co-located with compute-only Hadoop nodes. From a storage perspective, they can leverage the DataTap functionality of BlueData EPIC to access data in local or remote HDFS, on a local file system, or on shared storage infrastructure.

Some of the benefits of deploying SAS Viya on the BlueData EPIC platform include:

  • On-demand and automated provisioning of clustered applications, using Docker containers
  • Elasticity and agility, with the ability to create, expand, and shrink environments as needed
  • Compute / storage separation, to scale compute and storage resources independently for cost efficiency
  • Multi-tenancy with secure isolation, including built-in support for LDAP / AD integration and Kerberized Hadoop clusters
  • Automated integration of Hadoop as well as other frameworks and tools for data science, machine learning, and BI / ETL in the Big Data ecosystem

This week at the SAS Global Forum (in Denver, Colorado), BlueData will be showcasing our support for SAS Viya together with Dell EMC and D4t4 Solutions – highlighting the ability to run SAS Viya in a containerized environment on the Dell EMC Elastic Data Platform (powered by BlueData EPIC software).

To learn more, you can attend these sessions at the SAS Global Forum:

  • Simplify Deployment and Management of SAS Grid, SAS Viya, DevOps, and Open Source – Monday April 9th at 4:00pm (Mile High Ballroom, Table Talk 2)
  • Driving Data Science Productivity with SAS Viya and SAS Analytics for Containers on Dell EMC Elastic Data Platform – Tuesday April 10th at 10:00am (The Quad, Super Demo 11)
  • Deploying SAS Viya using Containers on Hadoop Powered by Dell EMC Elastic Data Platform, BlueData, and D4t4 Solutions – Tuesday April 10th at 2:00pm (Breakout Meeting Room 404)