We all know that Artificial Intelligence (AI) is here to stay. We experience AI everywhere and enjoy its benefits without even realizing it. From streaming video services like Netflix, which learn our viewing behaviors and patterns so we spend our valuable time watching the shows we like best; to digital assistants like Amazon’s Alexa, which can recognize speech patterns to follow our commands and answer our questions; to using AI-powered apps like Lyft or Google Maps to hail the closest ride or navigate around traffic and get from point A to point B, AI is now embedded in our daily lives.
Each of these everyday consumer applications uses machine learning (ML) for its AI use cases. But it’s not just the consumer technology giants and startups that are using ML technology to power AI-enabled applications; enterprises in virtually every industry are now exploring ML for a wide range of AI use cases, from fraud detection and medical diagnosis to stock market prediction and autonomous driving, to name just a few.
Enterprise adoption is still in the relatively early stages, but a recent Deloitte study predicted that the number of ML implementations and pilot projects will double in 2018 over last year, and will double again by 2020.
I usually see these AI use cases and ML implementations fall into three categories, based on their outcomes: 1) maximizing operational efficiency, 2) improving the customer experience and 3) delivering innovation with a new business model or discovery. Here are some examples:
Maximizing Operational Efficiency
Predictive Maintenance (PdM) techniques are designed to help determine the condition of in-service equipment, in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. Here at BlueData, I’ve worked with one of our manufacturing customers who uses Spark MLlib for their ML algorithms to enhance the accuracy of failure predictions as well as improve the corrective actions needed to avoid them in the future.
Improving the Customer Experience
In order to retain their loyal user base, many enterprises are deploying deep learning (DL) and natural language processing (NLP) technologies to entice their customers with new offerings or better service. For example, one of our customers in the financial services industry is applying DL algorithms with TensorFlow to determine the most convenient car loan program that meets their customers’ specific needs and removes the complexity of the car buying experience. We have other customers in the healthcare industry using ML to improve patient care, using sensor data to deliver personalized precision medicine or improve disease diagnosis with genomics research.
Delivering Business Innovation
One of the reasons that companies like Blockbuster, DEC, and Toys “R” Us have gone out of business is their inability to build a new revenue model to sustain their growth in the face of disruption. Using AI and ML technologies to drive business innovation, I see enterprises moving in the other direction. For example, last year Allstate announced the creation of a stand-alone unit (named Arity) for a telematics business. Using ML algorithms, their goal is to expand their revenue stream beyond insurance by offering analytics products and services to third parties. Another example is one of our customers in the life sciences industry using AI and ML to dramatically accelerate the drug discovery process – bringing potentially life-saving new medicines to market much faster than ever before.
Challenges in Building Distributed ML Pipelines
In working with these and other enterprise customers to deploy machine learning and deep learning pipelines for their AI use cases, I’ve seen several patterns emerge.
Here are some of the most common challenges I’ve seen with enterprises that are looking to build, deploy, and operationalize their ML / DL pipelines:
- The analytics tools that they’ve used traditionally were built for structured data in databases. The AI use cases that they need to work on with ML / DL tools require a large and continuous flow of typically unstructured data.
- Their data scientists and developers may have built and designed their initial ML / DL algorithms to operate in a single-node environment (e.g. on their laptop, virtual machine, or cloud instance). But they need to parallelize the execution in a multi-node distributed environment.
- They can’t meet their AI use case requirements with the data processing capabilities and algorithms of a single ML / DL tool. They need to use data preparation techniques and models from multiple different tools, whether open source and/or from commercial vendors.
- The data access patterns and modeling techniques required for AI use cases with ML / DL are new and unfamiliar to most data scientists and developers, and the learning curve is steep.
- Increasingly, data science teams are working in more collaborative environments. It’s truly a team sport, and the workflow for building distributed ML / DL pipelines involves multiple domain experts.
- For many ML / DL deployments, it’s common practice to use hardware acceleration such as GPUs to improve processing capabilities. But these are expensive resources and this technology can add to the complexity of the overall stack.
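To make the single-node versus multi-node point in the list above a bit more concrete, here is a minimal sketch of synchronous data parallelism in plain Python. It is only an illustration under stated assumptions: the "workers" are simulated in a loop rather than run on separate nodes, the toy dataset and the names `gradient`, `shard`, and `train` are all hypothetical, and real tools like TensorFlow or Spark MLlib handle the distribution, communication, and fault tolerance that this sketch leaves out.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, target y)


def gradient(w: float, samples: List[Sample]) -> float:
    """Mean gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)


def shard(samples: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the data into equal-sized shards, one per simulated worker.

    Assumes len(samples) is divisible by num_workers; any remainder is dropped.
    """
    size = len(samples) // num_workers
    return [samples[i * size:(i + 1) * size] for i in range(num_workers)]


def train(samples: List[Sample], num_workers: int,
          steps: int = 200, lr: float = 0.05) -> float:
    """Fit y = w*x by gradient descent, averaging per-shard gradients each step."""
    w = 0.0
    shards = shard(samples, num_workers)
    for _ in range(steps):
        # On a real cluster each shard's gradient would be computed on a
        # different node in parallel; here the workers are simulated in sequence.
        local_gradients = [gradient(w, s) for s in shards]
        # Synchronous update: average the workers' gradients, then step.
        w -= lr * sum(local_gradients) / len(local_gradients)
    return w


# Toy data generated from y = 3x (hypothetical, for illustration only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w_single = train(data, num_workers=1)  # the "laptop" version
w_multi = train(data, num_workers=4)   # the same algorithm over four shards
```

With equal-sized shards, averaging the per-shard gradients gives the same update as the single-node computation, so both runs should converge to the same weight. The hard part in practice is everything around this loop: moving data to the workers, synchronizing them, and provisioning the nodes in the first place.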
One of the most popular ML / DL tools is TensorFlow, often used together with technologies like Python and GPUs. But there are many other open source and commercial tools that may be required, depending on the use case. Data scientists and developers want to work with their preferred ML / DL tools, they need the flexibility to enable rapid and iterative prototyping to compare different techniques, and they often need access to real-time data. In most large organizations, they also need to comply with enterprise security, network, storage, user authentication, and access policies.
But most enterprises lack the skills to deploy and configure these tools in a multi-node distributed environment. And it can be challenging to integrate these environments with their existing security policies, data infrastructure, and enterprise systems – whether on-premises, in the public cloud, using CPUs and/or GPUs, with a data lake or with cloud storage. These organizations quickly realize:
- The technologies and frameworks for ML / DL are different from existing enterprise systems and traditional data processing frameworks.
- There are multiple components (both software and infrastructure) and it’s a complex stack, requiring version compatibility and integration across these various components.
- It’s a time-consuming endeavor to assemble all the systems and software required, and most organizations lack the skills to deploy and wire together all of these components.
- Bottom line, it’s difficult to build and deploy multi-node distributed environments for ML / DL pipelines in the enterprise – even for sandbox and dev/test use cases.
The exploratory and iterative nature of ML / DL means that your data scientists can’t afford to wait for days or weeks before getting access to the tools they need. But it may take weeks and even months for your team to get ramped up and started.
For example, you will likely need to hire or train team members to gain expertise in technologies like TensorFlow. You’ll need to build pipeline integrations between these different frameworks and tools, and test them on the infrastructure you plan to use. And as you begin to add more use cases and users, you’ll need to scale the infrastructure and integrate more tools into the stack.
BlueData AI / ML Accelerator
Here at BlueData, I’ve worked with many of our enterprise customers to overcome these challenges – to build, deploy, and operationalize their machine learning and deep learning pipelines. And together with my colleagues in our product and services teams, we came up with a new turnkey solution to help other enterprises implement their AI use cases and deploy distributed ML and DL applications.
The BlueData AI / ML Accelerator solution includes a one-year subscription for the container-based BlueData EPIC software platform either on-premises or in the public cloud — along with consulting, training, and support to accelerate your deployment. This turnkey solution provides:
- Ready-to-run Docker images of popular ML / DL tools (including TensorFlow, Spark MLlib, H2O, Caffe2, Anaconda, and BigDL) for use in large-scale distributed computing environments.
- The ability to spin up multi-node ML / DL environments in a matter of minutes via self-service, with REST APIs or a few mouse clicks in a web UI.
- Collaboration in a multi-tenant architecture, with integrated notebooks (e.g. Jupyter, Zeppelin, RStudio) and other JDBC-supported tools.
- Secure integration with distributed file systems including HDFS, NFS, and S3 for storing data and ML / DL models.
- Automated and reproducible provisioning, enabling on-demand creation of identical ML / DL environments and reproducible results.
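One way to think about the reproducible provisioning item in the list above is as a fully pinned, declarative environment description: if two descriptions are identical, the environments built from them should be too. The sketch below illustrates that idea with the Python standard library only. It is not the BlueData EPIC API or configuration format; the field names, the image name `example/tensorflow-gpu`, and the version numbers are all hypothetical.

```python
import hashlib
import json
from typing import Dict


def fingerprint(spec: Dict[str, object]) -> str:
    """Return a short, deterministic ID for an environment spec.

    Keys are sorted before hashing, so two specs with the same content
    get the same ID regardless of the order the fields were written in.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical spec: every version is pinned so a rebuild yields the same stack.
spec_a = {
    "image": "example/tensorflow-gpu",
    "image_tag": "1.0.0",
    "workers": 4,
    "packages": {"numpy": "1.14.0"},
}

# Same content, different field order: should be treated as the same environment.
spec_b = {
    "workers": 4,
    "packages": {"numpy": "1.14.0"},
    "image_tag": "1.0.0",
    "image": "example/tensorflow-gpu",
}

# Changing any field (here the worker count) yields a different environment.
spec_c = dict(spec_a, workers=8)

same_environment = fingerprint(spec_a) == fingerprint(spec_b)
changed_environment = fingerprint(spec_a) != fingerprint(spec_c)
```

Recording an ID like this alongside each experiment is one simple way to tell later whether two results came from the same environment.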
We start by addressing the top objectives for your AI initiative, implementing and configuring the BlueData EPIC platform based on your requirements, and training your staff throughout the project so they can confidently operate the platform on their own moving forward. BlueData EPIC is a highly flexible platform that can be extended to run a wide range of AI and Big Data analytics use cases and applications. And while initial ML / DL deployments may focus on dev/test and pre-production environments, the platform is designed for large-scale production implementations in the enterprise.
With BlueData EPIC, you can create multi-node TensorFlow environments on-demand using our web-based user interface (or via REST APIs).
You can also use BlueData’s quick launch templates for one-click cluster creation.
You can also easily create multiple environments with different versions or combinations of tools side by side – to enable parallel execution for comparing different libraries and techniques. With the portability of Docker containers, you can deploy the same reproducible environments regardless of the underlying infrastructure – whether on-premises, in the public cloud, using CPUs and/or GPUs, with a data lake or with cloud storage.
The new AI / ML Accelerator solution is designed for out-of-the-box deployments with open source technologies including TensorFlow, Spark MLlib, H2O, Caffe, Anaconda, and BigDL. However, it can be easily configured for use with other ML / DL technologies, including both open source tools and commercial applications. Our services team can provide additional assistance in developing new Docker application images for these tools, or in preparing your data models and data pipelines.
Now enterprises can get up and running quickly with distributed ML / DL applications in multi-node containerized environments – either on-premises or in the public cloud. Fully configured environments can be provisioned in minutes, with self-service and automation. Data scientists and developers can rapidly build prototypes, experiment, and iterate with their preferred ML / DL tools for faster time-to-value. And their IT teams can ensure enterprise-grade security, data protection, and performance – with elasticity, flexibility, and scalability in a multi-tenant architecture.
To learn more about how this works in practice, you can watch the on-demand webinar on deploying deep learning with TensorFlow and Spark – using GPUs and Docker containers.