There is a lot of focus and attention on Big Data analytics today – and as I wrote in a recent blog post, it’s all about the applications. But there are (of course) many infrastructure considerations to make the analytics and applications work seamlessly for your data scientists, analysts, and other users.
One vital issue is how to ensure prioritization of different Big Data applications (and different user groups) sharing the same underlying infrastructure in a multi-tenant environment. These applications are often very resource-intensive and performance is essential for Big Data analytics use cases, so potential concerns about resource interference and resource contention are inevitable. And you need to take into account different types of users (e.g. heavy users, light users) sharing the same Hadoop clusters. This is especially important for real-time or near real-time analytical applications (e.g. fraud detection or personalized recommendation engines) that require particularly fast response-times.
So it’s imperative that important applications and user groups are prioritized appropriately. Time-sensitive and critical applications need a greater share of CPUs, memory, network, etc., but without starving your other applications. You need to ensure the right allocation of resources, appropriate response times, and system availability so that data scientists can quickly and easily analyze data on shared Hadoop clusters. This not only improves their user experience and ensures timely business insights, it also reduces operational cost by allowing for more efficient use of resources.
So how do you do this for Hadoop (or Spark)? In this post, I’ll dive into the details of how to do this prioritization using Docker containers.
What is QoS and Why Should I Care?
QoS refers to Quality of Service, and Quality of Service is a measure of the overall performance experienced by a specific user of a computer system. If per user QoS is implemented on a computer system, then each user of the system can experience a different level of performance. So in a multi-tenant environment, QoS refers to how you do the prioritization of applications and user groups that I described above.
Here at BlueData, we use Docker containers in our Big Data infrastructure platform. And we have been working with Linux cgroups (control groups) for a while now; cgroups is a Linux mechanism to control the allocation of resources (e.g. CPU, memory) to specific processes. With cgroups, you can use the various resource settings to implement differential priority/performance between containers. This means that you can make some containers run faster than others; in other words, you can implement per user Quality of Service.
Let’s break this down. Containers have many different resources: CPU, memory, blkio, network, etc. For the sake of simplicity, let’s limit this discussion to the CPU resource. The cgroup CPU resource refers to a container’s likelihood, at any given moment, to be selected by the Linux scheduler to run on a physical processor. Each container can be assigned a different number of cgroup CPU “shares”. CPU shares (aka cpu.shares) are an indication of the priority of the given container to be selected to run. The greater the number of shares, the more frequently the container will be scheduled on a processor, and the faster the application within the container will run.
QoS, Docker, and Hadoop: An Example
To see the impact of this, consider the following simple example. I used the BlueData EPIC software platform to create Hadoop clusters running within a collection of Linux containers. I installed BlueData EPIC 2.0 and instantiated two different CDH 5.4.3 (Cloudera) clusters. Each of the clusters consisted of two containers. One container of each cluster ran the Hadoop ResourceManager service and the other container of each cluster ran the NodeManager service.
For ease of reference, let’s refer to one of the clusters as “HighCluster” and the other as “LowCluster”. HighCluster consists of containers bluedata-32 (running ResourceManager) and bluedata-33 (running NodeManager). LowCluster also consists of two containers, one named bluedata-28 (running ResourceManager) and the other named bluedata-29 (running NodeManager).
The command line screenshot below shows the 4 containers running on the server, with their container IDs.
Let’s see the number of cpu.shares (i.e. CPU allocation) assigned to each container. In the screenshot below, we see that container bluedata-33 (2b*) in HighCluster is assigned 2048 shares while container bluedata-32 (b0*) is assigned 4096 cpu.shares. Similarly, container bluedata-29 (c2*) in HighCluster is assigned 2048 shares while container bluedata-28 (f8*) has 4096.
Next, I used the Hadoop example program “pi” as a sample of a CPU-intensive MapReduce job. The pi estimator uses a statistical (quasi-Monte Carlo) method to estimate the value of pi.
The actual invocation was: hadoop jar /opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.3.jar pi 250 250
I ran the same job in both clusters at the same time.
I ran a number of iterations giving increasing numbers of cpu.shares to containers bluedata-32 and bluedata-33 in HighCluster, while maintaining the same cpu.shares value for containers bluedata-28 and bluedata-29 of LowCluster.
The cpu.shares setting of a container is performed by writing to the /cgroup file as indicated in the following screenshot.
The cpu.shares configuration per iteration is given in the table below, showing the increased CPU allocation for the containers in HighCluster while the allocation for the LowCluster containers remains constant.
Note: The cpu.shares for the container running the ResourceManager in the HighCluster is equal to the cpu.shares for the container running the NodeManager + 2048. This gives the container running the ResourceManager a slight CPU priority over the NodeManager.
We can also look at this data another way in the graph below.
What we see in the graph is that as the number of cpu.shares given to HighCluster increases, the execution time (represented in seconds) for the pi job on that cluster decreases. Correspondingly, the execution time of the same pi job on LowCluster increases, due to the contention for CPU resources.
We also see that the incremental improvement in execution time for HighCluster decreases in subsequent configurations and eventually plateaus (i.e. from x6 to x7). This indicates that the pi job requires other system resources in addition to CPU; contention for those resources ultimately has more impact on the job run time than the CPU allocation.
Implementing QoS for Multi-Tenant Hadoop
So this little experiment showed that the number of cpu.shares on a container dramatically impacts how the application running in the container performs. And this isn’t just a theoretical experiment, it has significant real-world implications.
Different users of shared compute resources in a multi-tenant Hadoop or Spark environment for Big Data analytics will have different priorities. One user may be running a mission-critical job with important SLA requirements; that user needs high priority access to the compute resources. Whereas another user may be doing some application testing or QA, and can make due with a lower priority.
This is all well and good, but who wants to deal with these low-level Docker and Linux commands in order to map user/application priority onto containers and cgroup cpu.shares for Big Data analytics? And CPU is not the only resource. Memory consumption as well as network and storage bandwidth are also key components of QoS that need to be allocated and controlled.
What’s needed is a system that makes it easy to specify and implement Quality of Service priorities in a way that is readily understood by the end user – without having to understand the inner-workings of the infrastructure. It requires an infrastructure platform that exposes what is important for defining the application and user SLAs, while hiding the pesky implementation details.
As you might expect, I’ve been working on a solution that does this (and it’s epic). Stay tuned for more on QoS in a multi-tenant Big Data environment with Docker containers and BlueData’s EPIC software platform.