Monitoring Spring Boot Applications with Prometheus – Part 2

In my previous post, I described how to use Prometheus and its JVM client library in a Spring Boot application to gather common JVM metrics. In this post, I will demonstrate how to write your own application-specific metrics using the client library. Prometheus client libraries support four core metric types: Counter, Gauge, Histogram and Summary. The Prometheus documentation has a brief but clear explanation of each metric type.

I will create two metrics to gather statistics about incoming requests to a Spring web application: a counter to count how many requests the web application handles, and a summary to measure the processing time of the incoming requests.

Application Set Up

To demonstrate the metrics we are going to implement, I have the following controller implementing two endpoints, /endpointA and /endpointB. They do nothing except waste a random amount of time between 0 and 100 ms.

import org.apache.commons.lang3.RandomUtils; // assuming Apache Commons Lang for the random sleep
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HomeController {
     private Logger logger = LoggerFactory.getLogger(getClass());
 
     @RequestMapping("/endpointA")
     public void handlerA() throws InterruptedException {
          logger.info("/endpointA");
          // simulate some work by sleeping for 0-100 ms
          Thread.sleep(RandomUtils.nextLong(0, 100));
     }
 
     @RequestMapping("/endpointB")
     public void handlerB() throws InterruptedException {
          logger.info("/endpointB");
          Thread.sleep(RandomUtils.nextLong(0, 100));
     }
}

Counter Metric

A counter is a metric whose numeric value only ever goes up; it never decreases. Here we add a handler interceptor to keep track of the cumulative number of requests the web application receives:

import io.prometheus.client.Counter;
...
public class RequestCounterInterceptor extends HandlerInterceptorAdapter {

     // @formatter:off
     // Note (1)
     private static final Counter requestTotal = Counter.build()
          .name("http_requests_total")
          .labelNames("method", "handler", "status")
          .help("Http Request Total").register();
     // @formatter:on

     @Override
     public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception e) throws Exception {
          // Update counters
          String handlerLabel = handler.toString();
          // get short form of handler method name
          if (handler instanceof HandlerMethod) {
               Method method = ((HandlerMethod) handler).getMethod();
               handlerLabel = method.getDeclaringClass().getSimpleName() + "." + method.getName();
          }
          // Note (2)
          requestTotal.labels(request.getMethod(), handlerLabel, Integer.toString(response.getStatus())).inc();
     }
}

Note:

  1. We implement a counter using the io.prometheus.client.Counter class to keep track of the number of incoming requests handled by this interceptor. The counter is named http_requests_total and has a number of labels (method, handler and status). A label is an attribute of a metric which can be used in queries to filter and aggregate metrics.
  2. The counter is incremented using the Counter's inc() method. The values of the labels method, handler and status are populated with the request HTTP method (GET/POST), the Spring MVC controller and method, and the response HTTP status respectively.

Summary Metric

A summary is a complex metric that tracks both the number of observations and the sum of the observed values. See here for a comprehensive explanation of summary and histogram metrics. Similar to the previous section, we implement the metric within a handler interceptor class:

import io.prometheus.client.Summary;
...
public class RequestTimingInterceptor extends HandlerInterceptorAdapter {

      private static final String REQ_PARAM_TIMING = "timing";

      // @formatter:off
      // Note (1)
      private static final Summary responseTimeInMs = Summary
           .build()
           .name("http_response_time_milliseconds")
           .labelNames("method", "handler", "status")
           .help("Request completed time in milliseconds")
           .register();
      // @formatter:on

      @Override
      public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
           // Note (2)
           request.setAttribute(REQ_PARAM_TIMING, System.currentTimeMillis());
           return true;
      }

      @Override
      public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
           Long timingAttr = (Long) request.getAttribute(REQ_PARAM_TIMING);
           long completedTime = System.currentTimeMillis() - timingAttr;
           String handlerLabel = handler.toString();
           // get short form of handler method name
           if (handler instanceof HandlerMethod) {
                Method method = ((HandlerMethod) handler).getMethod();
                handlerLabel = method.getDeclaringClass().getSimpleName() + "." + method.getName();
           }
         // Note (3)
          responseTimeInMs.labels(request.getMethod(), handlerLabel, Integer.toString(response.getStatus())).observe(completedTime);
      }
}

Note:

  1. We implement the metric with the io.prometheus.client.Summary class. The same set of labels as used in the request counter is used here.
  2. The response time is measured here as the time elapsed between the calls to the preHandle and afterCompletion methods. This is for illustration only, so I can demonstrate the use of the Prometheus client library.
  3. The summary metric is updated by calling the observe() method with the response time value.
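
As an aside, a Summary configured this way exposes only a running count and sum of observations. Newer versions of the Java client also let you configure quantiles on the builder; a hypothetical sketch (not used in this post) would look like:

      private static final Summary responseTimeInMs = Summary
           .build()
           .name("http_response_time_milliseconds")
           .labelNames("method", "handler", "status")
           // hypothetical: track the 95th percentile with a 1% tolerated error
           .quantile(0.95, 0.01)
           .help("Request completed time in milliseconds")
           .register();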

Collecting Metrics

Now that we have implemented the handler interceptors, we need to register them with Spring MVC so that they apply to the test endpoints.
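
The following is a minimal registration sketch; the configuration class name WebMvcConfiguration is my own choice, and any WebMvcConfigurerAdapter-based configuration would do:

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

@Configuration
public class WebMvcConfiguration extends WebMvcConfigurerAdapter {

     @Override
     public void addInterceptors(InterceptorRegistry registry) {
          // register both metric interceptors for all request mappings
          registry.addInterceptor(new RequestCounterInterceptor());
          registry.addInterceptor(new RequestTimingInterceptor());
     }
}

With the interceptors registered, start up the Spring Boot web application, hit the two endpoints in the browser and then go to the Prometheus URL, e.g. http://localhost:8080/prometheus. This should return something similar to the following: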

...
# HELP http_response_time_milliseconds Request completed time in milliseconds
# TYPE http_response_time_milliseconds summary
http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerA",status="200",} 3.0
http_response_time_milliseconds_sum{method="GET",handler="HomeController.handlerA",status="200",} 169.0
http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerB",status="200",} 1.0
http_response_time_milliseconds_sum{method="GET",handler="HomeController.handlerB",status="200",} 59.0
# HELP http_requests_total Http Request Total
# TYPE http_requests_total counter
http_requests_total{method="GET",handler="HomeController.handlerA",status="200",} 3.0
http_requests_total{method="GET",handler="HomeController.handlerB",status="200",} 1.0

Note the summary metric http_response_time_milliseconds actually collects two time series per label set: _count and _sum. In practice, we therefore don't need a separate counter metric; it is included mainly to demonstrate how to implement a simple counter with the Prometheus client library.
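
For example, the per-second request rate could be derived directly from the summary's _count series (this uses the job label blog that appears in the queries later in this post):

rate(http_response_time_milliseconds_count{job="blog", handler="HomeController.handlerA"}[5m])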

Also, data for each metric are collected separately for each distinct set of label values. In this example, we get separate series for each endpoint because the value of the handler label differs. Since we also include the response status as a label, we will potentially get further series with non-200 status values, for example:

http_response_time_milliseconds_count{method="GET",handler="HomeController.handlerA",status="500",} 3.0

Queries in Prometheus

One of the useful features of Prometheus is its powerful query language for manipulating the time series data collected.

For example, the cumulative total number of HTTP requests tracked by the counter we implemented is not of much use on its own. Typically we would want the number of requests the server receives per second, to get an idea of the system load at various times of the day. This can be achieved with a query like the following:

rate(http_requests_total{job="blog", handler="HomeController.handlerA"}[5m])

The above query uses the built-in rate() function to return the per-second rate of HTTP requests measured over the last 5 minutes ([5m]) for the endpoint /endpointA, as specified by the handler filter handler="HomeController.handlerA". The query can be modified to gather only requests that are not successful (i.e. not 2xx) by including the status label in the query:

rate(http_requests_total{job="blog", handler="HomeController.handlerA", status!~"^2..$"}[5m])

Prometheus attaches two labels, job and instance, to every scraped series, based on the target defined in the prometheus.yml configuration file. It is recommended to always include the job name in the query.

Similarly, we can get the average response time over the last 5 minutes with the following query:

rate(http_response_time_milliseconds_sum{job="blog"}[5m])/rate(http_response_time_milliseconds_count{job="blog"}[5m])
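
Labels can also be aggregated away in a query. As a further illustrative example, the following sums the per-second request rate over all methods and status codes, grouped by handler:

sum by (handler) (rate(http_requests_total{job="blog"}[5m]))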

For more details on the operations and functions Prometheus supports, see the documentation here.

This blog post introduces, with examples, how to implement application-specific metrics using the Prometheus JVM client library, as well as how to use the query functions provided by Prometheus to filter and query the collected data. The ability for developers to implement their own metrics, combined with Prometheus's powerful query language, is particularly useful when one needs to implement, monitor and analyse specific application functions beyond the typical JVM and application metrics.

Monitoring Spring Boot Applications with Prometheus – Part 1

This blog post will demonstrate how to use Prometheus to monitor a Spring Boot web application. Prometheus is an open source tool for monitoring systems; it collects metrics from target systems as time series data. It supports multiple approaches for instrumenting application code. I am going to show how to do this using the Prometheus JVM client library.

Instrumenting with Prometheus JVM client

POM setup

I set up a Spring Boot project in Maven and include the following dependencies for the Prometheus JVM client (version 0.0.16):

 <!-- Hotspot JVM metrics -->
 <dependency>
      <groupId>io.prometheus</groupId>
      <artifactId>simpleclient_hotspot</artifactId>
      <version>${prometheus.version}</version>
 </dependency>
 <!-- Exposition servlet -->
 <dependency>
      <groupId>io.prometheus</groupId>
      <artifactId>simpleclient_servlet</artifactId>
      <version>${prometheus.version}</version>
 </dependency>
 <!-- The client -->
 <dependency>
      <groupId>io.prometheus</groupId>
      <artifactId>simpleclient</artifactId>
      <version>${prometheus.version}</version>
 </dependency>
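
The ${prometheus.version} placeholder above assumes a Maven property defined in the POM, for example:

 <properties>
      <prometheus.version>0.0.16</prometheus.version>
 </properties>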

Configure and implement Metric endpoint

The main way for Prometheus to collect metrics is by scraping an endpoint exposed by the target application at regular intervals. To do that, include a Java configuration class as follows:

@Configuration
@ConditionalOnClass(CollectorRegistry.class)
public class PrometheusConfiguration {

     @Bean
     @ConditionalOnMissingBean
     CollectorRegistry metricRegistry() {
         return CollectorRegistry.defaultRegistry;
     }

     @Bean
     ServletRegistrationBean registerPrometheusExporterServlet(CollectorRegistry metricRegistry) {
           return new ServletRegistrationBean(new MetricsServlet(metricRegistry), "/prometheus");
     }

...
}

The above code adds the endpoint (/prometheus) to the Spring Boot application. Now we are ready to add some metrics to it. The Prometheus JVM client includes a number of standard exporters to collect common JVM metrics such as memory and CPU usage. Let's add them to our new prometheus endpoint.

First, we create an exporter register class:

import java.util.List;

import io.prometheus.client.Collector;

/**
 * Metric exporter register bean to register a list of exporters to the default
 * registry
 */
public class ExporterRegister {

     private List<Collector> collectors; 

     public ExporterRegister(List<Collector> collectors) {
          for (Collector collector : collectors) {
              collector.register();
          }
          this.collectors = collectors;
     }

     public List<Collector> getCollectors() {
          return collectors;
     }

}

The above class is just a utility class to register a collection of metric collectors with the default registry. Now add the standard exporters from the Prometheus JVM client:

import io.prometheus.client.hotspot.MemoryPoolsExports;
import io.prometheus.client.hotspot.StandardExports;
...  
     @Bean
     ExporterRegister exporterRegister() {
           List<Collector> collectors = new ArrayList<>();
           collectors.add(new StandardExports());
           collectors.add(new MemoryPoolsExports());
           ExporterRegister register = new ExporterRegister(collectors);
           return register;
      }

We just added two exporters: (1) StandardExports, which provides CPU usage metrics, and (2) MemoryPoolsExports, which adds JVM memory usage metrics. To see what metrics are now available, go to the following URL in the browser:

http://localhost:8080/prometheus

The browser should display something like the output below (truncated as it is too long to list):

# HELP jvm_up_time_seconds System uptime in seconds.
# TYPE jvm_up_time_seconds gauge
jvm_up_time_seconds 15.0
# HELP jvm_cpu_load_percentage JVM CPU Usage %.
# TYPE jvm_cpu_load_percentage gauge
jvm_cpu_load_percentage 37.18078068931383
# HELP os_cpu_load_percentage System CPU Usage %.
# TYPE os_cpu_load_percentage gauge
.
.
.
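
As an aside, more recent versions of the Prometheus JVM client's hotspot module also provide a convenience class, DefaultExports, which registers the full set of standard hotspot collectors in one call. If it is available in the client version you are using, it can replace the manual registration above:

import io.prometheus.client.hotspot.DefaultExports;
...
     // registers the standard hotspot collectors (CPU, memory, GC, threads, etc.) with the default registry
     DefaultExports.initialize();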

Install and Setup Prometheus

Now that we have implemented the metric endpoint for the Spring Boot application, we are ready to install and configure Prometheus. Follow the instructions here to install Prometheus and start up the server. You should then be able to access the server in your browser, e.g. http://localhost:9090/targets

[Screenshot blog_prom_1: the Prometheus targets page]

By default, Prometheus is configured to monitor itself, which is handy. Now let's update the configuration to scrape our Spring Boot app. Open the file prometheus.yml in the Prometheus folder and add the following lines under the scrape_configs section:

  - job_name: 'blog'
    scrape_interval: 5s
    metrics_path: '/prometheus'
    static_configs:
      - targets: ['localhost:8080']

Restart Prometheus and refresh your browser to show the following:

[Screenshot blog_prom_2: the Prometheus targets page, now showing the blog job]

Prometheus provides a rather basic graphing function. I will show how to integrate Prometheus with other graphing software in a later post. For now, let's display the memory usage of the Spring Boot application. Go to the Graph tab, select the metric jvm_memory_bytes_used from the dropdown and click the Execute button. This should end up like the following in the browser. Note the metric has two different sets of time series, heap and nonheap usage.

[Screenshot blog_prom_3: graph of jvm_memory_bytes_used showing heap and nonheap series]
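
To graph only the heap usage, for example, a label filter can be added to the query (this assumes the area label exported by the memory pool collector):

jvm_memory_bytes_used{area="heap"}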

Summary and What Next

In this blog post, I described how to add Prometheus monitoring to a Spring Boot application via its JVM client, generate some JVM metrics using the provided exporter classes, and configure Prometheus to scrape the data. I ended by demonstrating how to use the metrics with the graphing features in Prometheus.

In a future blog post, I will show how to implement custom metrics in the Spring Boot application using the Prometheus JVM client, as well as how to use its expression language to query time series data and return metrics relevant for monitoring purposes. I will also demonstrate how to create dashboards in Grafana using data from Prometheus.

Using Netflix servo to monitor Java applications

Netflix servo is a lightweight API for exposing and publishing application metrics. This blog will demonstrate step by step how to use servo in a Java application.

Project set up

I use Maven so the first step is to include the following in the pom file:

 <dependency>
      <groupId>com.netflix.servo</groupId>
      <artifactId>servo-core</artifactId>
      <version>0.8.0</version>
 </dependency>

Adding Metrics to JMX

Adding metrics with Servo is simple. For example, the following code adds metrics to a REST controller with a single endpoint (/sayhi):

@RestController("testController")
public class TestController {

 @Monitor(name = "requestCounter", type = DataSourceType.COUNTER, description = "Total number of requests", level = DataSourceLevel.INFO)
 private final AtomicInteger requestCounter = new AtomicInteger(0);
 
 @Monitor(name = "aGauge", type = DataSourceType.GAUGE, description = "A random gauge", level = DataSourceLevel.CRITICAL)
 private final AtomicInteger aGauge = new AtomicInteger(0);
 
 @MonitorTags
 private final TagList tags = BasicTagList.of("id", "testController", "class", "au.com.dac.controller.TestController");
 
 @PostConstruct
 public void init() {
      Monitors.registerObject("testController", this);
 }
 
 @RequestMapping(value = "/sayhi", method = RequestMethod.GET )
 public String sayHi(@RequestParam String to) {
      requestCounter.incrementAndGet(); // increment counter
      aGauge.set(RandomUtils.nextInt(0, 100)); // set some random value
      return "hi " + to;
 }
 
}

Two metrics are defined in this class using the @Monitor annotation. There are three types of monitors: counter, gauge and informational. Note also the use of the @MonitorTags annotation. This is used to add a set of tags as key-value pairs to all the annotated fields in the class. In the example above, two key-value pairs are added, with keys "id" and "class" and values "testController" and "au.com.dac.controller.TestController" respectively.

The object needs to be registered with a monitor registry for it to be monitored. This is done with the following line:

Monitors.registerObject("testController", this);

The registerObject method of the Monitors class uses the default monitor registry class DefaultMonitorRegistry to register an object. By default, a JmxMonitorRegistry is used.

To update the monitors, use their corresponding API methods:

 requestCounter.incrementAndGet(); // increment counter
 aGauge.set(RandomUtils.nextInt(0, 100)); // set some random value

Now the metrics are available in JMX and can be viewed via JConsole or VisualVM.

Publishing to other sources

Servo provides a simple yet powerful API for collecting and publishing metrics to other sources. This makes it easy to add monitors to a Java application and expose them to external monitoring systems such as Stackdriver and AWS CloudWatch. The following method sets up a metrics poller to collect all metrics regularly (once a minute) and publish them to the included metric observers.

 private void initMetricsPublishing() {
      PollScheduler scheduler = PollScheduler.getInstance();
      scheduler.start();
      MetricObserver logObserver = new LoggerMetricObserver("logger-observer"); 
      MetricObserver logObserverRate = new LoggerMetricObserver("logger-observer-rate");
      MetricObserver transform = new CounterToRateMetricTransform(logObserverRate, 1, TimeUnit.MINUTES);
      PollRunnable task = new PollRunnable(
           new MonitorRegistryMetricPoller(),
           BasicMetricFilter.MATCH_ALL,
           logObserver, transform);
      scheduler.addPoller(task, 1, TimeUnit.MINUTES);
 }

Note the poller is associated with two MetricObserver instances, logObserver and logObserverRate. The latter is wrapped by a CounterToRateMetricTransform, which is used to transform any counter values in the observer into rates.

MetricObserver

This is the interface Servo uses to publish metrics to. Servo provides a few example implementations, including one for AWS CloudWatch. The following implements a metric observer that logs some metric information:

import java.util.List;

import org.joda.time.LocalDateTime; // assuming Joda-Time, given the LocalDateTime(long) constructor
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Preconditions;
import com.google.common.base.Throwables;
import com.netflix.servo.Metric;
import com.netflix.servo.publish.BaseMetricObserver;

public class LoggerMetricObserver extends BaseMetricObserver {

 private Logger logger = LoggerFactory.getLogger(getClass());

 public LoggerMetricObserver(String name) {
      super(name);
 }

 @Override
 public void updateImpl(List<Metric> metrics) {
      Preconditions.checkNotNull(metrics, "metrics");
      try {
           for (Metric m : metrics) {
                // log timestamp, metric name, tags and value for each polled metric
                logger.info("{}: name[{}] tags[{}] value[{}]", new LocalDateTime(m.getTimestamp()),
                     m.getConfig().getName(), m.getConfig().getTags(), m.getValue());
           }
      } catch (Throwable t) {
           incrementFailedCount();
           throw Throwables.propagate(t);
      }
 }
}

Running the Application

The complete source code can be found here. Running the web application (embedded Tomcat) and sending requests to http://localhost:8080/sayhi?to=Raymond in the browser should display something like the following on the console every minute:

2015-08-04 23:48:10.119 INFO 952 --- [PollScheduler-3] a.c.m.b.servo.LoggerMetricObserver : 2015-08-04T23:48:10.119: name[requestCounter] tags[class=TestController,COUNTER,id=testController,INFO] value[3]
2015-08-04 23:48:10.119 INFO 952 --- [PollScheduler-3] a.c.m.b.servo.LoggerMetricObserver : 2015-08-04T23:48:10.119: name[aGauge] tags[class=TestController,GAUGE,id=testController,CRITICAL] value[51]
2015-08-04 23:48:10.120 INFO 952 --- [PollScheduler-3] a.c.m.b.servo.LoggerMetricObserver : 2015-08-04T23:48:10.119: name[requestCounter] tags[class=TestController,RATE,id=testController,INFO] value[0.04987697014032054]
2015-08-04 23:48:10.120 INFO 952 --- [PollScheduler-3] a.c.m.b.servo.LoggerMetricObserver : 2015-08-04T23:48:10.119: name[aGauge] tags[class=TestController,GAUGE,id=testController,CRITICAL] value[51]

Note the log messages for the two metric observers; the requestCounter appears both as a raw count and, via the transform, as a per-second rate.