Monitoring Spring Boot Applications with Prometheus – Part 2

In my previous post, I describe how to use Prometheus and its JVM client library in a Spring Boot application to gather common JVM metrics. In this blog, I will demonstrate how to write your own application specific metrics using the client library. Prometheus client libraries support 4 core metric types: Counter, Gauge, Histogram and Summary. Its documentation has a brief but clear explanation of each metric types.

I will create 2 metrics to gather statistics about incoming requests to a Spring web application. In particular, a counter to count how many requests the web application handles and a summary for measuring the processing time of the incoming requests.

Application Set Up

To demonstrate the metrics we are going to implement, I have the following controller implementing two endpoints /endpointA and /endpointB. They do nothing except wasting a random amount of time between 0 and 100 ms.

@RestController
public class HomeController {
     private Logger logger = LoggerFactory.getLogger(getClass());
 
     @RequestMapping("/endpointA")
     public void handlerA() throws InterruptedException {
          logger.info("/endpointA");
          Thread.sleep(RandomUtils.nextLong(0, 100));
     }
 
     @RequestMapping("/endpointB")
     public void handlerB() throws InterruptedException {
          logger.info("/endpointB");
          Thread.sleep(RandomUtils.nextLong(0, 100));
     }
}

Counter Metric

A counter is a metric of numeric value that never goes down. Here we add a request handler to keep track of the cumulative number of requests the web application receives:

import io.prometheus.client.Counter;
..
public class RequestCounterInterceptor extends HandlerInterceptorAdapter {

     // @formatter:off
     // Note (1)
     private static final Counter requestTotal = Counter.build()
          .name("http_requests_total")
          .labelNames("method", "handler", "status")
          .help("Http Request Total").register();
     // @formatter:on

     @Override
     public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception e)
 throws Exception {
          // Update counters
          String handlerLabel = handler.toString();
          // get short form of handler method name
          if (handler instanceof HandlerMethod) {
               Method method = ((HandlerMethod) handler).getMethod();
               handlerLabel = method.getDeclaringClass().getSimpleName() + "." + method.getName();
          }
          // Note (2)
          requestTotal.labels(request.getMethod(), handlerLabel, Integer.toString(response.getStatus())).inc();
     }
}

Note:

  1. We implement a counter using io.prometheus.client.Counter class to keep track of number of incoming requests handled by this handler. The counter is named http_requests_total and consists of a number of labels (method, handler and status). A label is an attribute of a metric which can be used in query to filter and aggregate metrics.
  2. The counter is incremented using the Counter’s inc() method. The values of the metric labels method, handler and status are populated with the request http method (get/post), the spring mvc controller and method, and response http status respectively.

Summary Metric

A summary is a complex metric which track a number of observations as well as their counts. See here for a comprehensive explanation of summary and histogram metrics. Similar to previous section, we implement the metric within a request handler class:

import io.prometheus.client.Summary;
...
public class RequestTimingInterceptor extends HandlerInterceptorAdapter {

      private static final String REQ_PARAM_TIMING = "timing";

      // @formatter:off
      // Note (1)
      private static final Summary responseTimeInMs = Summary
           .build()
           .name("http_response_time_milliseconds")
           .labelNames("method", "handler", "status")
           .help("Request completed time in milliseconds")
           .register();
      // @formatter:on

      @Override
      public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
           // Note (2)
           request.setAttribute(REQ_PARAM_TIMING, System.currentTimeMillis());
           return true;
      }

      @Override
      public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex)
 throws Exception {
           Long timingAttr = (Long) request.getAttribute(REQ_PARAM_TIMING);
           long completedTime = System.currentTimeMillis() - timingAttr;
           String handlerLabel = handler.toString();
           // get short form of handler method name
           if (handler instanceof HandlerMethod) {
                Method method = ((HandlerMethod) handler).getMethod();
                handlerLabel = method.getDeclaringClass().getSimpleName() + "." + method.getName();
           }
         // Note (3)
         responseTimeInMs.labels(request.getMethod(), handlerLabel, Integer.toString(response.getStatus())).observe(
 completedTime);
      }
}

Note:

  1. We implement a metric with class io.prometheus.client.Summary. The same set of labels as used in the request counter is used here.
  2. The response time is measured here as time lapsed between the calls to preHandle and afterCompletion methods. This is just for illustration only so I can demonstrate the use of Prometheus client library.
  3. The summary metric is updated by calling the observe() method with the response time value.

Collecting Metrics

Now that we implement the request handler interceptors, we can register them to the test endpoints and start up the Spring Boot web application. Hit the two endpoints using the browser and then go to the Prometheus url, e.g. http://localhost:8080/prometheus, should return something similar to the following:

...
# HELP http_response_time_milliseconds Request completed time in milliseconds
# TYPE http_response_time_milliseconds summary
http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerA",status="200",} 3.0
http_response_time_milliseconds_sum{method="GET",handler="HomeController.handlerA",status="200",} 169.0
http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerB",status="200",} 1.0
http_response_time_milliseconds_sum{method="GET",handler="HomeController.handlerB",status="200",} 59.0
# HELP http_requests_total Http Request Total
# TYPE http_requests_total counter
http_requests_total{method="GET",handler="HomeController.handlerA",status="200",} 3.0
http_requests_total{method="GET",handler="HomeController.handlerB",status="200",} 1.0

Note the summary metric http_response_time_milliseconds actually collects two time series data: _count and _sum. In practice, we don’t require a separate counter metric. It is more to demonstrate how to implement a simple counter with the Prometheus client library.

Also, data for each metric are collected separately for each distinct set of label values. In this example, we have metrics for each individual endpoints given the label values for the label handler is different. Since we also include the response status as a label, we will potentially get other metrics with non 200 status value, for example

http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerA",status="500",} 3.0

Queries in Prometheus

One of the useful features of Prometheus is its powerful query language for manipulating the time series data collected.

For example, the cumulative value of total number of http requests of the counter we implement is not of much use. Typically we would want to get the number of requests a server received per second to get an idea of the system load at various time of the day. This can be achieved by query like the following:

rate(http_requests_total{job="blog", handler="HomeController.handlerA"}[5m])

The above query uses the built-in rate() function to return per-second rate of the http requests measured over the last 5 minutes ([5m]) for the endpoint /endpointA, as specified by the handler filters handler=”HomeController.handlerA”.  The query can be modified to only gather requests that are not successful (not 2xx) by including the label status in the query:

rate(http_requests_total{job="blog", handler="HomeController.handlerA", status!~"^2..$"}[5m])

Prometheus include 2 labels names job and instance for the target as defined in the prometheus.yml configuration file. It is recommended to always include the job name in the query.

Similarly, we can create query to get the average response time over last 5 minutes with the following query

rate(http_response_time_milliseconds_sum{job="blog"}[5m])/rate(http_response_time_milliseconds_count{job="blog"}[5m])

For more details on operations and functions Prometheus support, see their documentation here

This blog post introduces with examples how to implement application specific metrics using Prometheus JVM client library as well as how to use query functions provided by Prometheus to filter and query the data collected. The ability for developers to implement their own metrics together with support with powerful query language by Prometheus is particularly useful when one needs to implement, monitor and analysis specific application functions beyond typical JVM and application metrics.

Advertisements