Use Kubecost to quantify the cost of using Kubernetes

In recent years, Kubernetes has become a widely used container orchestration platform, and different operating models have emerged along with it. Some enterprises prefer a hard multi-tenancy model, where each cluster serves one tenant, while others prefer a soft multi-tenancy model, where one cluster is shared by multiple tenants. Many enterprises have adopted the soft multi-tenancy model because it reduces operational work considerably. Under this model, however, visibility into how costs are allocated to each tenant is important so that costs can be billed to the right part of the organization.

Requirements

We run a soft multi-tenant Amazon EKS cluster and use Kubernetes namespaces to separate tenants. AWS Cost Explorer can report the cost of nodes, EBS volumes, and networking, but it cannot split the cost of shared or pooled resources between tenants. We therefore chose the open source Kubecost to create tenant-based reports, so that we can understand the cost per tenant and budget accordingly. In this article, we detail how to use Kubecost in a multi-tenant EKS cluster for better cost visibility.

Kubecost

Kubecost helps you monitor and manage cost and capacity in your Kubernetes environment. – Kubecost documentation (https://docs.kubecost.com/)

Kubecost is available as both an open source product and a commercial product. The commercial product adds a few features such as user authentication, saved reports, enterprise support, and longer retention periods for metrics.

Kubecost installation

Kubecost can be installed in several ways (https://docs.kubecost.com/install). We use Helm to install Kubecost in the cluster. The following commands install Kubecost with Helm 3 in the default configuration. You will need a unique token, which can be obtained from the Kubecost install page (https://kubecost.com/install).

kubectl create namespace kubecost

helm repo add kubecost https://kubecost.github.io/cost-analyzer/

helm install kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="YWp1bmVqYUB0YXZpc2NhLmNvbQ==xm343yadf98"

Kubecost exposes a number of configuration options at installation time. Here are the ones we used and why.

Network Costing

networkCosts.enabled=true

This is a very important flag if you want to capture network costs and break them down by namespace or tenant in the cluster. Enabling it deploys a DaemonSet that maps the traffic passing through each node to a cost model, which is then used in cost reporting. Network costs are calculated using cloud provider rates. Details on how this works can be found here (http://docs.kubecost.com/network-allocation). The collected network data is divided into categories such as internet egress, cross-region egress, and cross-cluster egress. You can override these network classifications and set your own.
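As a minimal sketch, assuming Kubecost is already installed under the release name kubecost in the kubecost namespace (as in the install commands earlier), the flag could be enabled on the existing release as follows. The --reuse-values option keeps the values from the original install, and the final DaemonSet listing is only one suggested way to check that the change took effect.

```shell
# Enable network costing on the existing release, keeping previously set values.
helm upgrade kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --reuse-values \
  --set networkCosts.enabled=true

# List DaemonSets in the namespace to confirm that a network costs DaemonSet was created.
kubectl get daemonsets --namespace kubecost
```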

Reuse node exporter and kube-state-metrics

prometheus.kubeStateMetrics.enabled=false
prometheus.nodeExporter.enabled=false
prometheus.serviceAccounts.nodeExporter.create=false

By default, the Kubecost installation also bundles Prometheus, Grafana, node exporter, and kube-state-metrics. It is possible to reuse your existing Prometheus and Grafana setup, but the process is tedious (https://docs.kubecost.com/getting-started#custom-prom). It requires modifications to the Prometheus scrape configuration, relabeling, recording rules, and so on. The approach we recommend is to reuse the node exporter and kube-state-metrics already running in the cluster (if available) and to let Kubecost set up its own separate Prometheus and Grafana.
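Put together, an install that reuses the existing exporters might look like the sketch below. It only combines the three flags above with the install command shown earlier; the KUBECOST_TOKEN environment variable is an assumption introduced here to avoid hard-coding the token.

```shell
# Assumption: KUBECOST_TOKEN holds the token obtained from https://kubecost.com/install.
# Install Kubecost with its bundled Prometheus and Grafana, but without
# deploying a second kube-state-metrics or node exporter.
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --set kubecostToken="${KUBECOST_TOKEN}" \
  --set prometheus.kubeStateMetrics.enabled=false \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
```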

Metric retention period

prometheus.server.retention=15d
prometheus.server.persistentVolume.size=32Gi

By default, Prometheus retains metrics for only 15 days on a 32Gi persistent volume. If you change the retention period, you can use the following formula to estimate the storage required:

needed_disk_space = retention_time_minutes * ingested_samples_per_minutes * bytes_per_sample

You can learn more about this in the Metric Storage Configuration section of the Getting Started documentation page (https://docs.kubecost.com/getting-started#storage-config).
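To make the formula concrete, here is a small worked example in shell arithmetic. The ingestion rate (100,000 samples per minute) and the sample size (2 bytes) are illustrative assumptions, not measurements from our cluster; substitute the values observed in your own Prometheus.

```shell
# needed_disk_space = retention_time_minutes * ingested_samples_per_minutes * bytes_per_sample

# Compute the storage requirement in bytes.
# Arguments: retention in days, ingested samples per minute, bytes per sample.
needed_disk_space_bytes() {
  local retention_days=$1
  local samples_per_minute=$2
  local bytes_per_sample=$3
  local retention_minutes=$(( retention_days * 24 * 60 ))
  echo $(( retention_minutes * samples_per_minute * bytes_per_sample ))
}

# Assumed inputs: 15 days of retention (the default),
# 100,000 samples per minute and 2 bytes per sample (both hypothetical).
bytes=$(needed_disk_space_bytes 15 100000 2)
echo "needed_disk_space = ${bytes} bytes"

# Convert to whole GiB, rounding up, to compare with prometheus.server.persistentVolume.size.
gib=$(( (bytes + 1073741823) / 1073741824 ))
echo "round up to ${gib}Gi"
```

With these assumed inputs the result is 4,320,000,000 bytes, well inside the default 32Gi volume. Doubling the retention period doubles the requirement, so the same function can be used to check whether a longer prometheus.server.retention still fits the configured volume size.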

Kubecost features

Cost Allocation

If you are optimizing Kubernetes costs, or want a clear picture of the cost of a specific tenant or service, the Kubecost cost allocation view is where you will spend most of your time. You can filter costs by Kubernetes objects such as Deployments and StatefulSets. If you use namespace-based soft multi-tenancy, you can filter this view by namespace and see the cost allocation across all tenants.

The cost allocation view provides detailed insights into major Kubernetes cost components such as compute, network, storage, and more. For compute, you’ll get a cost allocation for memory, CPU, and GPU. Likewise, if you are using StatefulSets, you can also get persistent volume costs calculated using cloud provider storage rates. If network costing is enabled using the flag above, then you will also get the network cost associated with the object.
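The same per-namespace breakdown can also be pulled from the command line. The sketch below assumes the release name kubecost from the install commands (which gives the deployment name kubecost-cost-analyzer), local port 9090, and Kubecost's allocation endpoint at /model/allocation with window and aggregate parameters; check these against the Kubecost documentation for your version before relying on them.

```shell
# Forward the Kubecost deployment to localhost in the background.
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090 &
PORT_FORWARD_PID=$!
sleep 5  # give the port-forward a moment to establish

# Request the last 7 days of allocation data, aggregated by namespace (one entry per tenant).
curl --silent --get "http://localhost:9090/model/allocation" \
  --data-urlencode "window=7d" \
  --data-urlencode "aggregate=namespace"

# Stop the port-forward when done.
kill "${PORT_FORWARD_PID}"
```

The response is JSON, so it can be piped into a tool of your choice for further filtering per tenant.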

Savings recommendations

This is a very useful feature of Kubecost because its advice can save you a lot of money. The recommendations are not always accurate, but they give a reasonable summary and help you identify possible savings in specific parts of your cluster.

Savings recommendations cover the following areas:

Node and container resizing

Kubecost generates recommendations for compute node sizes and reports Pods whose resource requests are over-provisioned. These recommendations help you right-size nodes and tune pod requests so that you make better use of cluster capacity.

Underutilized Nodes Report

This report lists nodes that are currently underutilized and whose workloads could be moved to other nodes. It is a very important report. The cluster autoscaler can reduce the size of the cluster, but it only removes a node when certain checks pass. The report tells you why a node cannot be scaled down even though it is underutilized.

Underutilized storage

This provides details about unclaimed persistent volumes and any local storage attached to the node that is currently underutilized.

Expense report

Kubecost reports contain detailed information about cluster cost allocation. As in the cost allocation view, you can generate these reports by namespace, Kubernetes object, or label. You can add filters and create reports for a particular tenant or team. An export feature lets you share the reports with the team on a regular basis, which increases visibility. Make sure to set the retention period to match your reporting requirements, using the retention flag mentioned in the installation section above.

Pricing with Savings Plans and Reserved Instances

We use an AWS Savings Plan for compute resources, so the actual rates for our cluster nodes differ from On-Demand prices. A default Kubecost installation shows On-Demand rates for nodes, which makes the cluster look more expensive than it really is. Kubecost can integrate with AWS Cost and Usage Reports (CUR), which provide the detailed cost of AWS resources, including price adjustments from any Savings Plans or Reserved Instances you have purchased. The integration process is non-trivial, and there are different ways to set it up depending on how your AWS accounts are structured. It is slightly easier if your billing account is the same account that runs Kubecost. In our case, the AWS accounts were set up with AWS Organizations, and Kubecost was running in one of the member accounts. Since the billing account is the master account, we created the AWS CUR in the master account and performed the remaining steps in the member account. The diagram below illustrates how this setup is implemented. More details about the integration can be found here (https://docs.kubecost.com/aws-out-of-cluster.html).

Out-of-cluster costs

This feature extends the Kubecost integration with cloud provider cost reporting. The setup is very similar to the one in the diagram above; the only difference is that you need to associate resources using tags. We often use cloud provider managed services such as RDS and MSK alongside Kubernetes. The out-of-cluster cost feature adds the cost of these services to Kubernetes cost reporting, so that Kubecost can serve as a full-stack cost reporting solution. The association is based entirely on resource tags. For example, you can tag your cloud resources with the namespace they belong to. We are not using this feature because our cloud resources are shared by tenants.

More features

Kubecost has many other features that we do not use yet, and I don't think one article can cover them all. Some are worth looking into, such as notifications through Slack and email: you can set cost thresholds for namespaces and get alerted if any namespace goes over budget. If you are using Spot Instances, you can integrate Spot Profiles to get the correct pricing details. Kubecost can be integrated with AWS and GCP. It also supports custom pricing if you want to avoid integrating with billing accounts.

Conclusion

Kubecost covered almost all of our needs, although setting it up correctly involves some operational overhead compared to many paid solutions on the market. Still, I feel the value it provides far outweighs the cost of configuring it properly. Kubecost is also updated frequently, and the team is responsive when you need help. If you are looking for an open source tool that gives you Kubernetes cluster cost insights along with cloud provider cost details, Kubecost is worth a try.