Understanding Cardinality in Prometheus Monitoring

Arun Kumar Peddapalli
Published in Fournine Cloud
Jul 24, 2023

Prometheus is a powerful open-source monitoring system used to collect, store, and query time-series data. It is widely used in modern DevOps practices and is known for its simple and efficient data model. In this post, we will explore one of the most important concepts in Prometheus: cardinality.

What is Cardinality?

In Prometheus, cardinality refers to the number of unique time series that are stored in the database. Each time series is identified by its metric name and a set of labels, and the cardinality of a metric is the number of distinct label sets that are present in its data. High cardinality means that there are many distinct label sets, while low cardinality means that there are few.

Imagine that you have a Prometheus instance that is monitoring numerous servers, each with multiple services running on them. Let’s say that you have a metric called http_requests_total that tracks the number of HTTP requests received by each service. You decide to include labels for the server and service name in this metric, so that you can distinguish between requests made to different services on different servers.

Here is an example of what the metric might look like:

http_requests_total{server="server1", service="api"} 100
http_requests_total{server="server1", service="frontend"} 500
http_requests_total{server="server2", service="api"} 200
http_requests_total{server="server2", service="frontend"} 300

In this example, there are four distinct time series, each identified by a unique combination of labels. The cardinality of this metric is four, because there are four distinct label sets present in the data.
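To make the counting concrete, here is a minimal Python sketch that parses sample lines like the ones above and counts the distinct series. The helper names (series_key, cardinality) are made up for this illustration, and the parser is deliberately simplified: it assumes every line has a label block and ignores escaping and timestamps.

```python
import re

# Sample lines copied from the example above (Prometheus text format).
SAMPLES = """\
http_requests_total{server="server1", service="api"} 100
http_requests_total{server="server1", service="frontend"} 500
http_requests_total{server="server2", service="api"} 200
http_requests_total{server="server2", service="frontend"} 300
"""

# Simplified patterns: metric name, label block, then a value.
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+\S+$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def series_key(line):
    """Return (metric name, frozenset of label pairs) identifying one series."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unparseable sample line: {line!r}")
    name, raw_labels = match.groups()
    return name, frozenset(LABEL_RE.findall(raw_labels))


def cardinality(text):
    """Count distinct series (metric name plus label set) in the text."""
    return len({series_key(line) for line in text.splitlines() if line.strip()})


print(cardinality(SAMPLES))  # 4
```

The sample value (100, 500, and so on) plays no part in the count: only the metric name and the label set identify a series.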

Now imagine that you decide to add a label for the HTTP method used for each request, so that you can track the number of GET and POST requests separately. Here is an example of what the metric might look like now:

http_requests_total{server="server1", service="api", method="GET"} 50
http_requests_total{server="server1", service="api", method="POST"} 50
http_requests_total{server="server1", service="frontend", method="GET"} 200
http_requests_total{server="server1", service="frontend", method="POST"} 300
http_requests_total{server="server2", service="api", method="GET"} 100
http_requests_total{server="server2", service="api", method="POST"} 100
http_requests_total{server="server2", service="frontend", method="GET"} 150
http_requests_total{server="server2", service="frontend", method="POST"} 150

In this example, there are now eight distinct time series, each identified by a unique combination of labels. The cardinality of this metric has increased to eight, because there are eight distinct label sets present in the data.

As you can see, adding labels to metrics can quickly increase the cardinality of your data set: each new label can multiply the number of time series by the number of distinct values it takes. This can impact the performance and resource utilization of your Prometheus deployment, so it’s important to carefully consider which labels are necessary for analysis, and to regularly monitor and optimize the system to ensure that it continues to perform well as the data set grows.
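The growth is multiplicative: in the worst case, the number of series for a metric is the product of the number of values each label can take. The sketch below works this out for the labels in the example. The user_id label and its 10,000 values are hypothetical, added only to show how a single unbounded label dominates the total.

```python
from itertools import product
from math import prod

# Label values taken from the http_requests_total example.
label_values = {
    "server": ["server1", "server2"],
    "service": ["api", "frontend"],
    "method": ["GET", "POST"],
}


def max_series(values_by_label):
    """Upper bound on series count: product of per-label value counts."""
    return prod(len(values) for values in values_by_label.values())


def enumerate_label_sets(values_by_label):
    """List every possible label combination as a dict."""
    names = list(values_by_label)
    return [dict(zip(names, combo))
            for combo in product(*values_by_label.values())]


print(max_series(label_values))                 # 8
print(len(enumerate_label_sets(label_values)))  # 8

# Hypothetical: a user_id label with 10,000 distinct values.
with_user_id = {**label_values, "user_id": [f"u{i}" for i in range(10_000)]}
print(max_series(with_user_id))                 # 80000
```

This is an upper bound. Real data often contains fewer combinations, because not every service runs on every server.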

Why is Cardinality Important?

Cardinality is an important consideration when designing and scaling a Prometheus deployment. High cardinality can result in increased memory usage and longer query latencies, while low cardinality can result in less flexible queries and less granularity in the data.

For example, imagine that you have a Prometheus instance that is monitoring numerous servers, each with multiple services running on them. If you create a unique time series for each combination of server and service, you could quickly end up with a very high cardinality. This could result in slow query performance and increased resource usage.

The Impact of High Cardinality

A. Increased Resource Utilization

High-cardinality metrics significantly increase the resources Prometheus needs. Each unique combination of labels creates a new time series, and every series has to be indexed and stored, with recently active series also held in memory. A large number of series therefore increases both the memory and the storage required by Prometheus, leading to higher resource utilization and increased costs.

B. Slower Query Times

High cardinality can also slow down queries in Prometheus. To answer a query, Prometheus has to find every time series that matches the query's label selectors and then read the samples for each of them. The more series a selector matches, the more work this takes. As a result, queries over high-cardinality metrics take longer to complete, and users may experience slower response times.
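As a rough illustration, the toy model below counts how many series a simple label selector has to touch as the number of servers grows. It is not how Prometheus is implemented internally. The make_series and select helpers, and the 500-server figure, are assumptions made up for this sketch.

```python
def make_series(server_count, services, methods):
    """Build one label set per (server, service, method) combination."""
    return [
        {"server": f"server{i}", "service": service, "method": method}
        for i in range(1, server_count + 1)
        for service in services
        for method in methods
    ]


def select(series, **matchers):
    """Return the label sets whose labels equal every given matcher."""
    return [
        labels for labels in series
        if all(labels.get(name) == value for name, value in matchers.items())
    ]


small = make_series(2, ["api", "frontend"], ["GET", "POST"])
large = make_series(500, ["api", "frontend"], ["GET", "POST"])

# The same selector touches far more series in the larger deployment.
print(len(select(small, service="api")))  # 4
print(len(select(large, service="api")))  # 1000
```

An aggregation over service="api" has to read samples from every matched series, so the cost of the same query grows with cardinality.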

C. Performance Issues

In addition to increased resource utilization and slower query times, high cardinality can also impact the performance of Prometheus. When Prometheus is unable to handle the high number of time-series, it may lead to performance issues such as slow ingestion of new data or even system crashes. This can have a significant impact on your monitoring system and cause downtime for your applications.

Managing Cardinality:

To manage and reduce cardinality, several tools and techniques are available in and around Prometheus, including:

  1. Label drop: Removing unnecessary labels from metrics before they are stored in the database, for example with the labeldrop relabelling action. This can help reduce the number of distinct label sets and improve query performance, provided the remaining labels still identify each series uniquely.
  2. Label relabelling: Modifying or merging labels to reduce the number of distinct label sets. This can be useful when dealing with metrics that have multiple labels that are not all needed for analysis.
  3. Downsampling: Reducing the resolution of data over time. This lowers the number of samples stored per series rather than the number of series, so it reduces the amount of data stored and speeds up long-range queries, but it does not reduce cardinality by itself. Prometheus does not downsample stored data on its own, so this is usually handled by a long-term storage layer on top of it.
  4. Federation: Splitting the data across multiple Prometheus servers to reduce the cardinality of each individual server, with a higher-level server pulling only selected or aggregated series from the others. This can be useful when dealing with large amounts of data that cannot be efficiently stored on a single server.
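The effect of removing a label can be sketched in a few lines of Python. The example below takes the eight method-level series from earlier, removes the method label, and sums the values of series that collapse into the same label set. The function name drop_label_and_sum is made up for this illustration. The summing step corresponds to what an aggregation would do. It is not what a plain label drop at ingestion does, since that only removes the label and relies on the remaining labels being unique.

```python
from collections import defaultdict

# The eight http_requests_total series from the method example.
samples = [
    ({"server": "server1", "service": "api", "method": "GET"}, 50),
    ({"server": "server1", "service": "api", "method": "POST"}, 50),
    ({"server": "server1", "service": "frontend", "method": "GET"}, 200),
    ({"server": "server1", "service": "frontend", "method": "POST"}, 300),
    ({"server": "server2", "service": "api", "method": "GET"}, 100),
    ({"server": "server2", "service": "api", "method": "POST"}, 100),
    ({"server": "server2", "service": "frontend", "method": "GET"}, 150),
    ({"server": "server2", "service": "frontend", "method": "POST"}, 150),
]


def drop_label_and_sum(samples, label):
    """Remove one label and sum the values of series that become identical."""
    totals = defaultdict(int)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[kept] += value
    return dict(totals)


merged = drop_label_and_sum(samples, "method")
print(len(samples), "->", len(merged))  # 8 -> 4
for label_set, value in merged.items():
    print(dict(label_set), value)
```

Dropping method brings the metric back to the four series of the first example, with the per-method counts folded into one total per server and service.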

Best Practices:

Here are some best practices to follow when dealing with cardinality in Prometheus:

  1. Limit the number of labels: Only include labels that are necessary for analysis. Avoid adding labels that are not needed, as they can quickly increase cardinality.
  2. Use label drop and relabelling: Remove unnecessary labels from metrics before storing them in the database. Modify or merge labels to reduce the number of distinct label sets.
  3. Monitor and optimize: Regularly monitor and optimize the system to ensure that it continues to perform well as the data set grows.
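For the monitoring step, a useful first question is which metrics, and which labels within them, contribute the most series. The sketch below answers it for an in-memory list of series. The process_up metric is hypothetical, and the helper names are made up for this example. On a running Prometheus server, a query such as topk(10, count by (__name__)({__name__=~".+"})) is commonly used to get a similar per-metric view.

```python
from collections import Counter
from itertools import product

# Series from the http_requests_total example.
http_series = [
    ("http_requests_total", {"server": s, "service": svc, "method": m})
    for s, svc, m in product(
        ["server1", "server2"], ["api", "frontend"], ["GET", "POST"]
    )
]
# Hypothetical second metric, added only for comparison.
up_series = [("process_up", {"server": s}) for s in ["server1", "server2"]]
all_series = http_series + up_series


def series_per_metric(series):
    """Series count per metric name, largest first."""
    return Counter(name for name, _ in series).most_common()


def values_per_label(series, metric):
    """Number of distinct values each label takes for one metric."""
    values = {}
    for name, labels in series:
        if name != metric:
            continue
        for label, value in labels.items():
            values.setdefault(label, set()).add(value)
    return {label: len(seen) for label, seen in values.items()}


print(series_per_metric(all_series))
# [('http_requests_total', 8), ('process_up', 2)]
print(values_per_label(all_series, "http_requests_total"))
# {'server': 2, 'service': 2, 'method': 2}
```

A label whose value count is far larger than the others is usually the first candidate for dropping or relabelling.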

Conclusion:

Cardinality is a key factor in the health and happiness of your Prometheus system. By keeping it under control, you can avoid performance problems and resource headaches. So, show your system some love and manage that cardinality!
