As enterprises move beyond experimentation to more widespread adoption of AI, the vast majority are running into “last mile” issues related to model deployment and management. Gartner predicts that by 2021, at least 50 percent of machine learning models built with the intention of being operationalized will not see the light of day.
What is “operationalization”? Admittedly, it’s a mouthful – and some even abbreviate it as “o16n”. But it’s the biggest challenge facing enterprises as they embark on the next phase in their AI journey with machine learning (ML). Note: In this blog post, I’ll refer primarily to ML, but the same applies to deep learning (DL), a subset of ML.
Quite simply, the dictionary defines operationalize as “to put into use” or “to make operational”. In the context of AI / ML, it’s the process of moving from development and training to deployment and management of ML models at scale in production. Ultimately, only operational ML models deliver business value.
Today, most enterprises lack the right tools for the operationalization and large-scale implementation of ML models. In addition, there is a lack of standards around coding, sharing, and collaboration. Data scientists typically work in siloed development environments—often on their laptops—with limited ability to collaborate, share, and reproduce their models in a distributed environment.
Once the model is trained, it is handed off to DevOps or software engineering, and data scientists have little to no visibility into model performance. Monitoring is ad hoc and inconsistent, making the risk of inaccurate predictions very real (e.g., due to model drift and decay). It is this lack of tooling, collaboration, governance, and standardized processes that results in the failure to operationalize ML models.
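To make "model drift" concrete: one common (and simple) check is the Population Stability Index, which compares a feature's distribution at training time against live production traffic. The sketch below is purely illustrative and not tied to any specific product; the thresholds in the comment are rules of thumb, and the function and variable names are my own.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (training-time) sample and a live
    (production) sample of one numeric feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 drift."""
    # Bin edges come from the baseline so both samples share the same grid.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins at a tiny probability to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # feature distribution at training time
stable = rng.normal(0.0, 1.0, 10_000)    # production traffic, unchanged
drifted = rng.normal(1.0, 1.0, 10_000)   # production traffic after a mean shift
```

In a real pipeline, a check like this would run on a schedule against recent scoring traffic, and a PSI above the drift threshold would raise an alert or trigger retraining.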
Now, I know what most of you are thinking: Why not just use DevOps tools and practices? Data scientists have some practices and needs in common with software developers. And ultimately, a DevOps-like approach and CI/CD best practices can be applied to ML models.
But data science has several unique characteristics that don’t fit DevOps. For one thing, data science and machine learning are not just about code but also about the data and the models. And ML workflows are very iterative in nature, so force-fitting DevOps tools and methodologies would place significant limitations on ML development.
Operationalizing Machine Learning
The emerging field of machine learning operations (ML Ops) aims to bring a DevOps-like approach to standardize processes for ML workflows. There are many tools for building and developing ML / DL models; there are also some tools available to help with deployment and monitoring. And there are some services from public cloud vendors that address different aspects of the ML lifecycle.
But what’s needed is an end-to-end solution for the complete lifecycle to build, train, deploy, and monitor ML and DL models in an enterprise environment. And in the enterprise, the reality is that a cloud-only deployment is often neither a viable option nor a panacea. In most cases, enterprises use on-premises infrastructure for certain ML workloads and aspects of the ML lifecycle due to considerations around data gravity, security, and performance. They need a solution that can be deployed on-premises, in the public cloud, or in a hybrid cloud.
Key Requirements for Operational ML at Enterprise Scale
At Hewlett Packard Enterprise, we believe that an ML Ops solution for any enterprise should address the entire ML lifecycle—from data prep to model development and training to deployment and model management. In working with our customers, it became clear that they required an end-to-end solution that provided the ability to:
- Experiment with a wide variety of ML / DL tools, both open-source and commercial software
- Access distributed training clusters without burdening data scientists and/or IT with the overhead of managing the infrastructure for each training job or use case
- Seamlessly transition from model training to deployment without the need to refactor code
- Share code and models across data science teams, enabling collaboration as well as model reproducibility and governance
- Monitor model performance, determine model drift, and trigger retraining of models
- Ensure enterprise-grade security, with integration to authentication and access control systems
- Deploy and run ML / DL workloads on-premises, in the public cloud, or in a hybrid model
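The "training to deployment without refactoring" requirement above boils down to a simple idea: the serving layer should load the exact artifact that training produced, not a reimplementation of it. Here is a minimal, hypothetical sketch of that handoff; the `LinearModel` class is a stand-in for any fitted model, and in practice the serialized artifact would live in a shared model repository rather than an in-memory variable.

```python
import pickle
import numpy as np

class LinearModel:
    """Toy trained artifact; stands in for any fitted ML model."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return features @ self.weights + self.bias

# --- Training side: fit a model, then serialize one artifact ---
model = LinearModel(weights=np.array([2.0, -1.0]), bias=0.5)
artifact = pickle.dumps(model)  # in practice, written to a model repository

# --- Serving side: load the identical artifact; no code is rewritten ---
served = pickle.loads(artifact)
prediction = served.predict(np.array([1.0, 3.0]))
```

Because both sides share one serialized object, the model that was validated in training is byte-for-byte the model that answers production requests.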
To address these requirements and the growing need in the market, we are introducing our new HPE Machine Learning Ops (HPE ML Ops) solution. HPE ML Ops brings DevOps-like speed and agility to the entire ML lifecycle, leveraging and extending the capabilities of the container-based BlueData EPIC software platform to operationalize ML in the enterprise.
HPE Machine Learning Ops
HPE ML Ops is a secure and highly scalable software solution for the complete machine learning lifecycle—from sandbox experimentation and distributed model training all the way to model deployment in production.
HPE ML Ops provides data scientists with the ability to quickly spin up containerized environments with their choice of machine learning or deep learning tools, including both open source frameworks and commercial applications. The new solution provides collaboration features such as source control, model repository, and project repository to bring a standardized approach to coding and model management. And much more.
With HPE ML Ops, enterprises can accelerate their deployment of AI / ML on a solid foundation. And as they mature, they can scale their infrastructure to suit their business needs with the flexibility to deploy on-premises, in any public cloud, or in a hybrid model.
To learn more: