Scaling Kubernetes pods is a core task in managing cluster workloads: it is essential for handling increased traffic, optimizing resource usage, and ensuring high availability. Kubernetes offers a straightforward way to do this with the kubectl scale command, and this article explains how to use it effectively.
Understanding Pod Scaling in Kubernetes
In Kubernetes, a pod is the smallest deployable unit that can run one or more containers. Scaling pods means increasing or decreasing the number of pod replicas running in a cluster. This is typically done using a Deployment, ReplicaSet, or StatefulSet, which ensures the desired number of pod replicas are maintained.
When workloads are dynamic, scaling becomes critical. For example, at peak traffic you might want more pods to handle the additional load; conversely, during off-peak hours, you may want to reduce the number of pods to avoid unnecessary resource spending. Kubernetes makes this process simple and efficient.
The kubectl scale Command
The kubectl scale command changes the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet by letting you specify the desired replica count directly. The basic syntax of the command is:
kubectl scale --replicas=<desired-replica-count> <resource-type>/<resource-name>
For example, to scale a Deployment named web-app to five replicas, you would run:
kubectl scale --replicas=5 deployment/web-app
This command tells Kubernetes to ensure that five replicas of the web-app pods are running.
Scaling a Deployment
A Deployment is the most common way to manage pod replicas in Kubernetes. It ensures that a specified number of pod replicas are running at all times. Let’s look at an example of scaling a Deployment.
Assume you have a Deployment named api-service with three replicas. To scale this Deployment to five replicas, use the following command:
kubectl scale --replicas=5 deployment/api-service
After running this command, Kubernetes will create two additional pods to meet the desired replica count. You can verify the scaling by checking the status of the Deployment:
kubectl get deployment api-service
The output will show the updated number of replicas:
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
api-service   5/5     5            5           10m
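You can also watch the new pods come online as the Deployment converges. Note that the app=api-service label selector below is an assumption about how the Deployment labels its pods; adjust it to match your manifest:
kubectl get pods -l app=api-service --watch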
Scaling a StatefulSet
A StatefulSet is used for stateful applications where the order and uniqueness of pods matter. Scaling a StatefulSet is similar to scaling a Deployment, but it requires careful handling of stateful data.
For example, consider a StatefulSet named database with three replicas. To scale it to five replicas, use the following command:
kubectl scale --replicas=5 statefulset/database
Kubernetes will create the two additional pods one at a time, in ordinal order, waiting for each pod to be running and ready before starting the next (the default OrderedReady pod management policy). You can verify the scaling by checking the status of the StatefulSet:
kubectl get statefulset database
The output will show the updated number of replicas:
NAME       READY   AGE
database   5/5     15m
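Because StatefulSet pods are named by ordinal index, you can watch the new members (database-3 and database-4) appear one at a time. The app=database label selector below is an assumption; adjust it to match your manifest:
kubectl get pods -l app=database --watch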
Scaling Based on Resource Usage
Kubernetes has a built-in feature known as horizontal pod autoscaling (HPA), which automatically adjusts the number of pod replicas according to CPU or memory usage. You can also manually achieve similar results using kubectl scale by observing resource usage and changing the replica count as required.
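To observe current usage, you can use kubectl top, which requires the metrics-server add-on to be installed in the cluster:
kubectl top pods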
For example, if you notice that the CPU usage of a Deployment named worker is consistently high, you can scale it up to handle the load:
kubectl scale --replicas=10 deployment/worker
Conversely, if the CPU usage is low, you can scale it down to save resources:
kubectl scale --replicas=3 deployment/worker
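If you find yourself adjusting the replica count of worker by hand repeatedly, an HPA can automate the process. This one-liner targets 70% average CPU utilization, an arbitrary example value:
kubectl autoscale deployment worker --min=3 --max=10 --cpu-percent=70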
Scaling with Percentages
In some cases, you may want to scale pods by a percentage of the current replica count. While kubectl scale does not directly support percentages, you can achieve this by combining it with other commands. For example, to increase the number of replicas by 50%, you can use the following approach:
- Get the current number of replicas:
current_replicas=$(kubectl get deployment api-service -o jsonpath='{.spec.replicas}')
- Calculate the new number of replicas:
new_replicas=$((current_replicas + current_replicas / 2))
- Scale the Deployment:
kubectl scale --replicas=$new_replicas deployment/api-service
This approach allows you to scale pods dynamically based on the current state of the cluster.
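The three steps can also be wrapped into a small shell function for reuse. This is a sketch; the function name and the truncating integer arithmetic are my own choices:
# Scale a Deployment by a signed percentage, e.g. scale_by_percent api-service 50
scale_by_percent() {
  local deploy=$1 percent=$2
  local current new
  current=$(kubectl get deployment "$deploy" -o jsonpath='{.spec.replicas}')
  new=$((current + current * percent / 100))  # integer division truncates
  kubectl scale --replicas="$new" "deployment/$deploy"
}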
Rolling Back a Scale Operation
If you change a Deployment and encounter issues, you can roll back with the kubectl rollout command. For example, to undo the most recent rollout of a Deployment named web-app, you can run:
kubectl rollout undo deployment/web-app
This command reverts the Deployment's pod template to its previous revision. Note, however, that a replica-count change made with kubectl scale is not recorded as a rollout revision, so it cannot be undone this way; to reverse a scale operation, run kubectl scale again with the previous replica count.
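A simple way to make a scale operation reversible is to record the current count before changing it. This minimal sketch uses only standard kubectl flags:
# Record the current replica count before scaling
previous=$(kubectl get deployment web-app -o jsonpath='{.spec.replicas}')
kubectl scale --replicas=10 deployment/web-app
# If the new count causes problems, restore the old one
kubectl scale --replicas=$previous deployment/web-app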
Advanced Scaling Techniques
While kubectl scale is powerful, it is limited to manual changes to the replica count. The techniques below cover more advanced scaling scenarios.
1. Scaling with Custom Metrics
For more advanced scaling, Kubernetes supports custom metrics through the Horizontal Pod Autoscaler (HPA). You can configure HPA to scale pods based on custom metrics such as request latency, queue length, or application-specific metrics.
For example, to scale a Deployment named web-app based on a custom metric, you would first need to configure the HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: custom_metric
      target:
        type: AverageValue
        averageValue: "500"
This configuration scales the web-app Deployment between 3 and 10 replicas based on custom_metric, aiming to keep the metric's average value across pods at 500.
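Assuming the manifest above is saved as web-app-hpa.yaml (the filename is arbitrary), you can apply it and inspect its status with standard kubectl commands. Note that scaling on a custom metric also requires a metrics adapter (such as the Prometheus Adapter) to expose the metric through the custom metrics API:
kubectl apply -f web-app-hpa.yaml
kubectl get hpa web-app-hpa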
2. Scaling with Cluster Autoscaler
In addition to scaling pods, you may need to scale the underlying nodes in your Kubernetes cluster to handle increased workloads. The Cluster Autoscaler automatically adjusts the size of the cluster by adding or removing nodes based on resource demands.
To enable Cluster Autoscaler, you need to configure it in your cloud provider’s environment. For example, in Google Kubernetes Engine (GKE), you can enable Cluster Autoscaler with the following command:
gcloud container clusters update my-cluster --enable-autoscaling --min-nodes=1 --max-nodes=10
This command configures the cluster to automatically scale between 1 and 10 nodes based on resource usage.
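As the autoscaler adds or removes nodes, you can watch the cluster's membership change with standard kubectl:
kubectl get nodes --watch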
3. Scaling with Canary Deployments
Canary deployments are a strategy for gradually rolling out new versions of an application. By scaling a new version of a Deployment alongside the existing version, you can test the new version with a subset of users before fully rolling it out.
For example, to perform a canary deployment for a Deployment named web-app, you would first create a new Deployment with the updated version:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
      version: canary
  template:
    metadata:
      labels:
        app: web-app
        version: canary
    spec:
      containers:
      - name: web-app
        image: my-app:2.0
After creating the canary Deployment, you can gradually increase the number of replicas while monitoring the performance and stability of the new version. Once you are confident in the new version, you can scale down the old Deployment and scale up the new one.
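For example, assuming a Service whose selector matches app=web-app (so both Deployments receive traffic in proportion to their pod counts), you could shift roughly 20% of traffic to the canary like this:
kubectl scale --replicas=2 deployment/web-app-canary
kubectl scale --replicas=8 deployment/web-app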
Best Practices for Scaling Pods
Below are some best practices architects and developers should consider when scaling pods in large-scale application infrastructure.
- Monitor Resource Usage: Check CPU, memory, and other resource usage before scaling to determine how many replicas are needed.
- Use Horizontal Pod Autoscaling: For dynamic workloads, consider HPA to automatically adjust the number of replicas based on resource usage.
- Test Scaling Operations: Always test scaling operations in a staging environment before applying them in production.
- Set Resource Limits: Define resource requests and limits on your pods so that scaled-up workloads cannot overload nodes (a minimal example follows this list).
- Consider Stateful Application Scaling: When scaling stateful applications, make sure the storage and network configuration can handle the increased load.
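As a reference for the resource-limits practice above, here is a minimal container spec fragment; the values are illustrative assumptions, not recommendations:
# Part of a pod template's container definition; values are examples only
resources:
  requests:
    cpu: 250m       # the scheduler reserves this much CPU for the pod
    memory: 256Mi
  limits:
    cpu: 500m       # the container is throttled above this
    memory: 512Mi   # the container is OOM-killed if it exceeds this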
Conclusion
Scaling the number of pods in Kubernetes is straightforward with the kubectl scale command, which lets you scale pod replicas up or down for a Deployment, ReplicaSet, or StatefulSet. Combined with monitoring of resource usage, proper scaling keeps applications running efficiently and reliably on a Kubernetes cluster.
For advanced users, combining kubectl scale with other commands and tools enables dynamic and precise scaling operations, and should give you the confidence to manage pod scaling according to your workloads.