Reduce Kubernetes cluster costs with kube-downscaler


Introduction

Kube-downscaler is an open source tool that allows users to define when pod resources in Kubernetes are automatically scaled down. This helps reduce infrastructure costs by reducing resource usage during off-peak hours.

In this article, we will detail the functionality, installation, and configuration of kube-downscaler, as well as its use cases and future prospects.

Features of kube-downscaler

Kube-downscaler is a powerful schedule-based tool for scaling applications in a Kubernetes cluster up or down. In this section, we’ll explore some of the tool’s key features:

Compatibility with Kubernetes features or tools

Kube-downscaler also supports Horizontal Pod Autoscaling (HPA) and can be used in conjunction with HPA to ensure the required number of replicas are maintained for the application. This enables kube-downscaler to provide additional flexibility and fine-grained control for application scaling in Kubernetes.

Karpenter and kube-downscaler are two tools that can be used together to provide a complete and powerful resource management solution for Kubernetes clusters. By using Karpenter in conjunction with kube-downscaler, Kubernetes clusters can benefit from horizontal and vertical scaling. Downscaler allows reducing the number of pods, while Karpenter optimizes node utilization by consolidating pods onto fewer or different types of machines.


Automatically scale deployment replicas based on defined time periods

Kube-downscaler can automatically scale deployment replicas based on predefined time periods. This means we can set up a schedule to increase or decrease the number of replicas at specific times of the day, week, or month.

For example, if we know that our application experiences high traffic during certain times of the day, we can configure kube-downscaler to automatically scale up replicas during those times and then scale them down when traffic decreases.

This allows scaling in anticipation of peak loads, rather than waiting for peak loads to occur and be handled by the HPA. This can help optimize resource usage and ensure your application is always available and responsive.
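As a sketch of the idea, an uptime annotation can keep a deployment fully scaled up during business hours and let it shrink the rest of the time. The deployment name, image, and schedule below are illustrative assumptions, not values from the kube-downscaler documentation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend                # hypothetical name
  annotations:
    # full capacity during business hours, 1 replica otherwise
    downscaler/uptime: "Mon-Fri 08:00-19:00 Europe/Berlin"
    downscaler/downtime-replicas: "1"
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: web-image            # hypothetical image
```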

That said, kube-downscaler is mainly used to shrink replicas and optimize cluster costs; scaling for load is usually left to the HPA.

Installation and configuration of kube-downscaler

Installing kube-downscaler on a Kubernetes cluster

  1. Clone the kube-downscaler repository from GitHub:


git clone https://codeberg.org/hjacobs/kube-downscaler.git


  2. Enter the kube-downscaler directory:


cd kube-downscaler


  3. Edit the deploy/kube-downscaler.yaml file to customize the configuration to your specific needs. For example, you can adjust time zones, schedules, and scaling rules.

  4. Apply the configuration to your Kubernetes cluster:


kubectl apply -f deploy/


This command will deploy the kube-downscaler controller and create a kube-downscaler deployment.

You can verify that the kube-downscaler controller is running by checking the logs for your kube-downscaler deployment:


kubectl logs -f deployment/kube-downscaler


After the installation is complete, some configuration is required.

Configure kube-downscaler according to specific user needs

Kube-downscaler provides customization of scaling plans by using annotations on Kubernetes deployment objects.

The downscaler/downtime annotation on the deployment object can be used to specify the downtime period during which the deployment should be scaled down.

The downscaler/downtime-replicas annotation can be used to set the minimum number of replicas to keep during downtime.

These annotations allow you to create customized scaling plans based on specific business needs and resource utilization patterns.

By adjusting these fields, you can configure kube-downscaler to scale your deployment in a way that optimizes application availability and cost efficiency.

The following is a simple Deployment configuration using kube-downscaler annotations.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: random-deployment
  annotations:
    # kube-downscaler
    downscaler/downtime: "Mon-Fri 00:00-07:00 Europe/Berlin"
    downscaler/downtime-replicas: "1"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: random
  template:
    metadata:
      labels:
        app: random
    spec:
      containers:
      - name: random-container
        image: random-image


With this configuration, the number of replicas will be reduced to 1 from midnight to 07:00, Monday through Friday (Europe/Berlin time zone).

kube-downscaler will automatically start scaling down pods according to a defined schedule.

At this point, kube-downscaler is installed and running on the Kubernetes cluster.

Algorithm

Kube-downscaler will scale down a deployed replica if all of the following conditions are true:

  • The current time is not part of the “uptime” schedule, or is part of the “downtime” schedule. If true, the schedules are evaluated in the following order:

    • downscaler/downscale-period or downscaler/downtime annotation on the workload definition

    • downscaler/upscale-period or downscaler/uptime annotation on the workload definition

    • downscaler/downscale-period or downscaler/downtime annotation on the workload’s namespace

    • downscaler/upscale-period or downscaler/uptime annotation on the workload’s namespace

    • --upscale-period or --default-uptime CLI parameters

    • --downscale-period or --default-downtime CLI parameters

    • UPSCALE_PERIOD or DEFAULT_UPTIME environment variable

    • DOWNSCALE_PERIOD or DEFAULT_DOWNTIME environment variable

  • The workload’s namespace is not part of the exclusion list:

    • If an exclusion list is provided, it replaces the default (which contains only kube-system).

  • The workload’s labels do not match the configured label list.

  • The name of the workload is not part of the exclusion list

  • The workload is not marked for exclusion (annotation downscaler/exclude: "true" or downscaler/exclude-until: "2024-04-05")

  • No active Pod is forcing the entire cluster into uptime (annotation downscaler/force-uptime: "true")

Minimum number of replicas

By default, deployments are scaled down to zero replicas. This can be configured via the downscaler/downtime-replicas annotation on the deployment or its namespace, or via the CLI using --downtime-replicas.

Ex: downscaler/downtime-replicas: "1"

Specific workloads

For HorizontalPodAutoscalers, minReplicas normally cannot be set to zero, so downscaler/downtime-replicas should be set to at least 1. CronJobs are handled as you would expect: they are suspended by setting suspend: true.
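As an illustration, a CronJob that kube-downscaler has scaled down during its downtime window ends up with the suspend flag set. This is a sketch; the names, image, and schedule are made up:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report              # hypothetical name
  annotations:
    downscaler/downtime: "Sat-Sun 00:00-24:00 Europe/Berlin"
spec:
  suspend: true                     # set by kube-downscaler during the downtime window
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: report-image     # hypothetical image
```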

Points to note

Note that the default grace period of 15 minutes applies to newly created deployments, e.g. for a new nginx deployment:

  • If the current time is not within the uptime schedule Mon-Fri 09:00-17:00 (America/Buenos_Aires), the deployment will not be scaled down immediately, but only after 15 minutes have passed. The downscaler will eventually log the following:


INFO: Scaling down Deployment default/nginx from 1 to 0 replicas (uptime: Mon-Fri 09:00-17:00 America/Buenos_Aires, downtime: never)


Note that if HorizontalPodAutoscaler (HPA) is used with deployments, consider the following:

  • If scaling down to 0 replicas is required, the annotation should be applied on the Deployment. This is a special case, because minReplicas is not allowed to be 0 on an HPA. Setting the deployment’s replicas to 0 essentially disables the HPA. In this case, the HPA will emit events such as failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API, since there are no Pods to retrieve metrics from.

  • If scaling down to more than 0 replicas is required, the annotation should be applied on the HPA. This allows pods to still scale dynamically based on external traffic during downtime, while keeping the replica count at the configured minimum when traffic is low. Annotating the Deployment instead of the HPA would cause a race condition: kube-downscaler scales the Deployment down, and the HPA scales it back up whenever the deployment drops below its minReplicas.

To enable the downscaler on the HPA with --downtime-replicas=1, make sure to add the following annotations to the Deployment and the HPA:

$ kubectl annotate deploy nginx 'downscaler/exclude=true'
$ kubectl annotate hpa nginx 'downscaler/downtime-replicas=1'
$ kubectl annotate hpa nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 America/Buenos_Aires'

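The same annotations can be set declaratively in the HPA manifest instead of with kubectl annotate. This is a sketch mirroring the nginx commands above; the scaling target and metric values are illustrative assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
  annotations:
    downscaler/downtime-replicas: "1"
    downscaler/uptime: "Mon-Fri 09:00-17:00 America/Buenos_Aires"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2                    # example value
  maxReplicas: 10                   # example value
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```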

Detailed configuration

Uptime/downtime spec

The downscaler is configured via command line arguments, environment variables, or Kubernetes annotations.

Time definitions (such as DEFAULT_UPTIME) accept a comma-separated list of specifications; for example, the following configuration will scale down all deployments outside working hours:

DEFAULT_UPTIME="Mon-Fri 07:30-20:30 Europe/Berlin"


To scale down only on weekends and on Fridays after 20:00:

DEFAULT_DOWNTIME="Sat-Sun 00:00-24:00 CET, Fri-Fri 20:00-24:00 CET"


Each time specification can be in one of two formats:

  • Recurring specification: <WEEKDAY-FROM>-<WEEKDAY-TO> <HH>:<MM>-<HH>:<MM> <TIMEZONE>. The timezone value can be any time zone name, e.g. “US/Eastern”, “PST” or “UTC”.

  • Absolute specification: <TIME_FROM>-<TIME_TO>, where each time is in the ISO 8601 date-and-time format <YYYY>-<MM>-<DD>T<HH>:<MM>:<SS>[+-]<TT>:<TT>.
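For instance, an absolute specification can pin downtime to a one-off maintenance window. This is a sketch assuming the ISO 8601 format above; the dates are made up:

```yaml
DEFAULT_DOWNTIME="2024-12-24T00:00:00+01:00-2024-12-26T00:00:00+01:00"
```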

Alternative logic based on periods

Instead of strict uptime or downtime windows, you can choose a period during which to scale up or down. The time definitions are the same. In this case, scaling up or down only happens during the given period; the rest of the time the workload is left untouched.

If an upscale or downscale period is configured, uptime and downtime are ignored. This means some options are mutually exclusive: for example, you can use --downscale-period or --default-downtime, but not both at the same time.

The following definition will scale down the cluster between 19:00 and 20:00. If you scale the cluster up manually, it will not be scaled down again until 19:00-20:00 the next day.

DOWNSCALE_PERIOD="Mon-Sun 19:00-20:00 Europe/Berlin"


Command line options

Available command line options:

  • --dry-run

    Dry run mode: doesn’t change anything, just prints what would be done

  • --debug

    Debug mode: print more information

  • --once

    Run the loop only once and exit

  • --interval

    Loop interval (default: 30 seconds)

  • --namespace

    Restrict the downscaler to a single namespace (default: all namespaces). This is mainly useful for deployment scenarios where the operator of kube-downscaler only has access to a given namespace (and not cluster-wide access). If used simultaneously with --exclude-namespaces, nothing is applied.

  • --include-resources

    Downscale resources of these kinds, given as a comma-separated list.

  • --grace-period

    The grace period in seconds before a new deployment is scaled down (default: 15 minutes). The grace period starts when the deployment is created, i.e., updated deployments will be scaled down immediately regardless of the grace period.

  • --upscale-period

    Alternative logic to scale up only during a given period (default: never), also configurable via the environment variable UPSCALE_PERIOD or via the downscaler/upscale-period annotation on each deployment

  • --downscale-period

    Alternative logic to scale down only during a given period (default: never), also configurable via the environment variable DOWNSCALE_PERIOD or via the downscaler/downscale-period annotation on each deployment

  • --default-uptime

    Default time frame to scale up (default: always), also configurable via the environment variable DEFAULT_UPTIME or via the downscaler/uptime annotation on each deployment

  • --default-downtime

    Default time range to scale down (default: never), also configurable via the environment variable DEFAULT_DOWNTIME or via the downscaler/downtime annotation on each deployment

  • --exclude-namespaces

    Exclude namespaces from downscaling (list of regular expression patterns, default: kube-system), also configurable via the environment variable EXCLUDE_NAMESPACES. If used together with --namespace, nothing is applied.

  • --exclude-deployments

    Exclude specific deployments/statefulsets/cronjobs from downscaling (default: kube-downscaler, downscaler), also configurable via the environment variable EXCLUDE_DEPLOYMENTS. Despite the name, this option will match the name of any included resource type (Deployment, StatefulSet, CronJob, etc.).

  • --downtime-replicas

    Default value for replicas to downscale to, the annotation downscaler/downtime-replicas takes precedence over this value.

  • --deployment-time-annotation

    Optional: the name of an annotation to use instead of the resource’s creation timestamp. Use this option if you want your resources to stay scaled up through the grace period (--grace-period) after every deployment, not just after creation. The annotation’s timestamp value must be in the same format as Kubernetes’ creationTimestamp: %Y-%m-%dT%H:%M:%SZ. Recommendation: have your deployment tooling set this annotation automatically.

  • --matching-labels

    Optional: a list of workload labels covered by the kube-downscaler scope. All workloads whose labels do not match any entry in the list will be ignored. For backward compatibility, if this parameter is not specified, kube-downscaler applies to all resources.
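Putting a few of these flags together, the kube-downscaler Deployment’s container spec might look like the following. This is a sketch: the image name/tag and flag values are assumptions for illustration, not recommendations:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-downscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-downscaler
  template:
    metadata:
      labels:
        app: kube-downscaler
    spec:
      containers:
      - name: kube-downscaler
        image: hjacobs/kube-downscaler:latest   # assumed image name/tag
        args:
        - --interval=60
        - --include-resources=deployments,cronjobs
        - --default-uptime=Mon-Fri 07:30-20:30 Europe/Berlin
        - --exclude-namespaces=kube-system,monitoring
```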

Namespace defaults

The DEFAULT_UPTIME, DEFAULT_DOWNTIME, and FORCE_UPTIME settings, as well as exclusions, can also be configured using namespace annotations. Where configured, these values take precedence over the global defaults.

apiVersion: v1
kind: Namespace
metadata:
    name: foo
    labels:
        name: foo
    annotations:
        downscaler/uptime: Mon-Sun 07:30-18:00 CET

The following annotations are supported at the namespace level:

  • downscaler/upscale-period

  • downscaler/downscale-period

  • downscaler/uptime : Sets the “uptime” for all resources in this namespace

  • downscaler/downtime : Sets “downtime” for all resources in this namespace

  • downscaler/force-downtime : Force downscaling of all resources in this namespace – can be true / false

  • downscaler/force-uptime : Force upscaling of all resources in this namespace – can be true / false

  • downscaler/exclude : Set to true to exclude all resources in the namespace

  • downscaler/exclude-until : Temporarily excludes all resources in the namespace until the given timestamp

  • downscaler/downtime-replicas : Overrides the default target replicas to scale down to (default: zero)
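For example, a namespace can be temporarily excluded from downscaling with the exclude-until annotation. This is a sketch; the namespace name and date are made up:

```yaml
apiVersion: v1
kind: Namespace
metadata:
    name: staging                                # hypothetical namespace
    annotations:
        downscaler/exclude-until: "2024-04-05"   # no downscaling until this date
```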

Use cases

The primary use case for this tool is to reduce costs by optimizing the utilization of Kubernetes cluster resources. However, it can also be used to warm the cluster up ahead of expected load and to avoid over-reliance on the HPA.

While this is not its primary purpose, this combination provides an alternative solution that ensures high availability of applications while minimizing infrastructure costs.

Reduce costs

The main way kube-downscaler reduces costs is by scaling workloads down during off-peak hours, such as nights and weekends. With fewer replicas running, the cluster consumes fewer resources, and cluster autoscaling tools can release the now-idle nodes, directly lowering the infrastructure bill.

Service interruption prevention

Another use case for kube-downscaler is to prevent service outages during peak usage. By defining a plan for scaling resources during periods of high demand, kube-downscaler can help scale deployments pre-emptively and avoid HPA delays to ensure applications remain available and responsive even during peak usage.

Suggestions

Kube-downscaler scales based on predefined schedules, which may not suit all use cases. Additionally, it does not support metric-driven autoscaling on its own, which means users must manually adjust their scaling plans as needs change.

Another solution to consider is Keda. Keda is an open source project that provides dynamic autoscaling capabilities for Kubernetes applications. Using Keda, users can set custom scaling rules based on various metrics such as queue length, CPU usage, or custom metrics.

This allows for more granular control over resource usage and ensures that the application always scales correctly to meet demand.

Additionally, Keda is compatible with a wide range of Kubernetes applications, including stateful and stateless applications, and supports multiple event sources such as Azure Event Hubs, Kafka, and RabbitMQ.

Conclusion

Kube-downscaler is a powerful tool for managing resource usage in Kubernetes clusters. By defining scaling plans, users can optimize resource usage in the cluster and reduce costs while ensuring that applications remain available and responsive even during peak usage.

While kube-downscaler is a valuable tool for managing resource usage in a Kubernetes cluster, it may have some limitations. If you need more granular control over resource scaling or need automatic scaling capabilities, it may be worth considering an alternative solution like Keda.
