
Enabling a Cloud-Like Experience for On-Premises GPU Infrastructure

In most enterprises today, GPU infrastructure is in high demand for compute-intensive machine learning and deep learning applications. If your organization is embarking on an AI initiative, your data science teams likely want access to GPU resources for their environments.

However, enterprise IT teams face several challenges when managing and provisioning GPU-enabled infrastructure. These challenges include:

  • Lack of visibility into the utilization of GPUs, which in turn makes it hard to accurately predict demand and plan GPU infrastructure purchases.
  • Provisioning bare-metal GPU servers with all the required tools and configuration can be a time-consuming process, often taking days or even weeks.

Public cloud providers and managed services offer on-demand, virtualized GPU resources: GPU-as-a-Service. But many enterprises have workloads that must run on premises because of security, performance, and data gravity considerations.

GPU-as-a-Service (GPUaaS) for On-Prem Deployments

Now there is a GPU-as-a-Service (GPUaaS) solution that lets IT give its users a cloud-like experience for on-demand, elastic provisioning of GPU infrastructure running in their own data centers. Today we’re introducing this new GPUaaS solution, powered by HPE’s best-in-class GPU-enabled servers and the BlueData software platform, as outlined in the accompanying HPE blog post.

HPE is the first infrastructure provider to offer a GPUaaS solution for on-premises deployments. This solution builds on our recent announcement about HPE’s integration of BlueData, and it represents one of the new use cases for this powerful combination of industry-leading hardware and software. With this new solution, enterprise IT organizations can deliver an “as-a-service” experience for provisioning GPU resources (e.g., NVIDIA Tesla or Quadro GPUs) in their on-premises deployments.

In particular, BlueData has developed several innovative capabilities to address the challenges faced by IT admins when provisioning GPU resources. Some of that functionality is outlined below.

End-to-end visibility into GPU infrastructure: The BlueData platform offers an intuitive web-based dashboard that gives complete visibility into GPU utilization across multiple servers and user groups, including usage reports that enable showback or chargeback for these resources.
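To make the showback idea concrete, here is a minimal Python sketch of how per-group GPU usage reports might be computed. It is not BlueData’s implementation: the `UsageRecord` schema, the `showback_report` function, and the flat per-GPU-hour rate are hypothetical assumptions used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One environment's GPU usage over a reporting period (hypothetical schema)."""
    group: str             # user group or project team that owns the environment
    gpus: int              # number of GPUs attached to the environment
    hours: float           # wall-clock hours the GPUs stayed attached
    busy_gpu_hours: float  # GPU-hours in which the attached GPUs were actually busy


def showback_report(records, rate_per_gpu_hour):
    """Aggregate allocated GPU-hours, utilization, and charge per group."""
    allocated = defaultdict(float)
    busy = defaultdict(float)
    for r in records:
        allocated[r.group] += r.gpus * r.hours
        busy[r.group] += r.busy_gpu_hours

    report = {}
    for group in sorted(allocated):
        alloc = allocated[group]
        report[group] = {
            "allocated_gpu_hours": alloc,
            # Fraction of allocated GPU-hours that did useful work.
            "utilization": busy[group] / alloc if alloc else 0.0,
            # Charge is based on allocation, not busy time (an assumption).
            "charge": alloc * rate_per_gpu_hour,
        }
    return report


if __name__ == "__main__":
    records = [
        UsageRecord("vision", gpus=4, hours=10.0, busy_gpu_hours=30.0),
        UsageRecord("nlp", gpus=2, hours=24.0, busy_gpu_hours=12.0),
        UsageRecord("vision", gpus=1, hours=8.0, busy_gpu_hours=8.0),
    ]
    for group, row in showback_report(records, rate_per_gpu_hour=2.0).items():
        print(group, row)
```

Charging on allocated rather than busy GPU-hours is only one possible design choice; it gives teams an incentive to release GPUs they are not using, while the utilization figure flags environments that hold GPUs without keeping them busy.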

On-demand, elastic provisioning of GPU resources: With BlueData, new containerized environments with GPU resources can be provisioned on-demand and then deprovisioned (releasing the GPUs) when no longer needed.

Allocation from a shared pool of GPU resources: GPU resources from multiple GPU-enabled servers, including the HPE Apollo 6500 and HPE ProLiant DL380 and DL360, can be pooled and shared across multiple project teams. IT admins can provision right-sized environments to match each end user’s requirements, ensuring that each workload is allocated the number of GPUs it needs.
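A shared pool with right-sized allocation can be sketched as follows. The post does not describe BlueData’s placement policy, so this minimal Python example assumes a simple best-fit rule (place each request on the server with the fewest free GPUs that can still satisfy it) and assumes all of an environment’s GPUs come from a single server. `GpuPool` and the server names are hypothetical.

```python
class GpuPool:
    """Hypothetical pool of GPUs spread across several GPU-enabled servers."""

    def __init__(self, servers):
        # servers: mapping of server name -> number of GPUs installed
        self.free = dict(servers)
        self.allocations = {}  # environment name -> (server, GPU count)

    def allocate(self, env, count):
        """Reserve `count` GPUs for `env` on one server using best-fit."""
        if count < 1:
            raise ValueError("count must be at least 1")
        if env in self.allocations:
            raise ValueError(f"{env} already holds GPUs")
        candidates = [s for s, n in self.free.items() if n >= count]
        if not candidates:
            raise RuntimeError(f"no single server has {count} free GPUs")
        # Best-fit: choose the server with the fewest free GPUs that still fits,
        # keeping larger blocks of GPUs available for bigger requests.
        server = min(candidates, key=lambda s: (self.free[s], s))
        self.free[server] -= count
        self.allocations[env] = (server, count)
        return server

    def release(self, env):
        """Return all of an environment's GPUs to the pool."""
        server, count = self.allocations.pop(env)
        self.free[server] += count


if __name__ == "__main__":
    pool = GpuPool({"node-1": 8, "node-2": 2, "node-3": 1})
    print(pool.allocate("training", 4))    # node-1
    print(pool.allocate("notebook", 1))    # node-3
    print(pool.allocate("evaluation", 2))  # node-2
    pool.release("training")
    print(pool.free)
```

Best-fit is just one reasonable heuristic; it keeps the large GPU server free for big training jobs while small notebook-style requests land on the smaller servers.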

Reallocation of resources to optimize GPU utilization: Because of the nature of deep learning workloads (only about 5% of deep learning application code benefits from GPU parallelism), GPUs sit idle for roughly 40-50% of the time an application is running, on average.

Usage patterns also play a role. For example, a data scientist may kick off a training job that runs overnight and then analyze the results the following day, leaving the GPUs idle during the day. In that case, it makes sense to reallocate the GPUs to a different workload in the meantime.

With BlueData, IT administrators can temporarily stop an environment and release its attached GPUs while preserving the application’s current state. This lets IT admins monitor usage and reassign GPUs to other workloads when they are not in use. When the application needs GPUs again, they can be reattached so that execution continues from the preserved state.
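The stop, release, and reattach cycle can be illustrated with the following minimal Python sketch. It is a toy model rather than BlueData’s mechanism: the `Environment` and `Cluster` classes are hypothetical, and a plain dictionary stands in for the preserved application state that a real platform would keep with the stopped container.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical containerized environment that needs a fixed number of GPUs."""
    name: str
    gpus_needed: int
    gpus: list = field(default_factory=list)   # IDs of currently attached GPUs
    state: dict = field(default_factory=dict)  # stand-in for preserved app state
    running: bool = False


class Cluster:
    """Hypothetical cluster that hands out GPUs from a shared free list."""

    def __init__(self, gpu_ids):
        self.free_gpus = list(gpu_ids)

    def start(self, env):
        """Attach GPUs and run the environment; its saved state is untouched."""
        if env.running:
            raise ValueError(f"{env.name} is already running")
        if len(self.free_gpus) < env.gpus_needed:
            raise RuntimeError(f"not enough free GPUs for {env.name}")
        env.gpus = [self.free_gpus.pop() for _ in range(env.gpus_needed)]
        env.running = True

    def stop(self, env):
        """Stop the environment, keep its state, and release its GPUs."""
        self.free_gpus.extend(env.gpus)
        env.gpus = []
        env.running = False


if __name__ == "__main__":
    cluster = Cluster(["gpu0", "gpu1"])
    training = Environment("overnight-training", gpus_needed=2)
    cluster.start(training)
    training.state["epoch"] = 12  # progress made overnight
    cluster.stop(training)        # GPUs return to the pool for the day

    daytime = Environment("daytime-job", gpus_needed=2)
    cluster.start(daytime)        # another workload reuses the same GPUs
    cluster.stop(daytime)

    cluster.start(training)       # reattach GPUs; state was preserved
    print(training.state["epoch"], len(training.gpus))
```

In this toy model a stopped environment holds no GPUs at all, so the daytime gap in the overnight-training example becomes capacity that another team can use.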

As a result, the GPUaaS solution from HPE enables enterprise IT organizations to improve business agility, optimize GPU utilization, and increase ROI for their on-premises GPU infrastructure.

You can read the solution brief to learn more. And to see how it works, check out the demo video of the new GPUaaS solution, powered by HPE servers and BlueData software.