How to plan and implement observability of a Java application with Elasticsearch — a step-by-step guide

The concept of observability has been around for decades, but it’s a relative newcomer to the world of IT infrastructure. So what is observability in this context? It’s the state of having all of the information about the internals of a system, so that when an issue occurs you can pinpoint the problem and take the right action to resolve it. Notice that I said state. Observability is not a tool or a set of tools — it’s a property of the system that we are managing.

In this article, I will walk through how to plan and implement an observable deployment, including API testing and the collection of logs, metrics, and application performance monitoring (APM) data. I’ll also point you to a number of free, self-paced training courses that will help you develop the skills needed to achieve observable systems with the Elastic Stack.

Three steps to observability

These are the three steps toward observability presented in this article:

1. Plan for success
   - Collect requirements
   - Identify data sources and integrations
2. Deploy Elasticsearch and Kibana
3. Collect data from systems and your services
   - Logs
   - Metrics
   - Application performance monitoring (APM)
   - API synthetic testing

Plan for success

I have been doing fault and performance management for the past twenty years. In my experience, to reliably reach a state of observability, you have to do your homework before getting started. Here’s a condensed list of a few steps I take to set up my deployments for success.

Goals: Talk to everyone and write the goals down

Talk to your stakeholders and identify the goals: “We will know if the user is having a good or bad experience using our service;” “The solution will improve root cause analysis by providing distributed traces;” “When you page me in the middle of the night you will give me the info I need to find the problem;” etc.
Data: Make a list of what data you need and who has it

Make a list of the information (data and metadata) needed to support the goals. Think beyond IT information — include whatever data you need to understand what is happening. For example, if Ops is checking the Weather Channel during their workflow, then consider adding weather data to your list of required information. Snoop around the best problem solver’s desk and find out what they’re looking at during an outage (and how they like their coffee). If your organization does postmortems, take a look at the data that people bring into the room; if it’s valuable for determining the root cause at a finger-pointing session, then it’s so much more valuable in Ops before an outage.

Fix: Think about the solution and information that can speed it up

If Ops needs a hostname, a runbook, some asset info, and a process name to fix the problem, then have that data available in your observability solution and send it over when you page them. Add the required bits of information to the list you started in the previous step.

A good starting point

At this point, you have a list of the data that you need so that when an issue occurs you can pinpoint the problem and take the right action to resolve it.
That list might look something like this:

- Service data
  - User experience data for my service
  - Response time of the application per transaction, and of the components that make up the application (e.g., the front end and the database)
  - Proper API functionality, verified via synthetic testing
- Performance data for my infrastructure
  - Operating system metrics
  - Database metrics
  - Logs from servers and apps
- Inbound integrations
  - History of past incidents
  - Runbooks
  - Asset info
  - Weather or other “non-IT” data
- Outbound integrations
  - Incident management integration for alerting

Elastic Observability

The Elastic Stack — Elasticsearch, Kibana, Beats, and Logstash; formerly known as the ELK Stack — is a set of powerful open source tools for searching, analyzing, and visualizing data in real time. The Elastic Stack is widely used to centralize logs from operational systems. Over time, Elastic has added products for metrics, APM, and uptime monitoring — together these make up the Elastic Observability solution. The value of Elastic Observability is that it brings together all of the types of data you need to make the right operational decisions and achieve a state of observability. Let’s jump into a scenario to demonstrate how to put Elastic Observability into action.

Scenario

I have a simple application to manage. It consists of a Spring Boot application running on a Linux VM in Google Cloud Platform. The application exposes two API endpoints and has a MariaDB back end. You can find the application in the Spring Guides. I have created an Elasticsearch Service deployment in Elastic Cloud, and I will follow the agent install tutorials right in Kibana, the analysis and management UI of the Elastic Stack. The open source agents that will be used are:

- Filebeat for logs
- Metricbeat for metrics
- Heartbeat for API testing and response time monitoring
- Elastic APM Java Agent for distributed tracing of the application

Note: This guide is written for a specific application based on Spring Boot and MySQL.
If you have something else that you want to collect logs, metrics, and APM traces from, you should be able to adapt these instructions to fit. When you open Kibana, you will be greeted with a long list of out-of-the-box observability integrations.

Implementation

In this article I will go over the steps to get the basics done; in future articles I’ll dive into best practices and some of the integrations. Let’s walk through a simple deployment.

Hosted Elasticsearch Service

To follow along with this guide, create a deployment in Elasticsearch Service on Elastic Cloud (a trial account is free). Once you sign up, watch and follow the steps in the “Deploy Elasticsearch in 3 minutes or less” video. A few minutes later you will have a cluster that you can use for the rest of this article. Download the password that is presented to you; you will use it to log in to Kibana and to configure the Beats. The screenshots here are from version 7.8 of the Elastic Stack — your UI may look slightly different depending on your version.

Kibana

Kibana is the visualization and management tool of the Elastic Stack. It will guide us through installing and configuring the Beats and the Elastic APM Java Agent. Launch Kibana from the deployment details and log in with the elastic username and password. If you forget the password, reset it and then open Kibana.

Find your way home

The instructions for everything that you need to install can be found right in your Kibana instance. Over the next few pages I will often direct you to Kibana Home; you can get there by clicking the Elastic icon in the top left of any Kibana page.
Navigation

If you want to dock the navigation menu while you learn your way around Kibana, click the three-line icon at the top left and then the lock icon at the bottom of the menu.

Add integrations

This is the list of what will be collected:

- Logs from the infrastructure and MariaDB
- Metrics from the infrastructure and MariaDB
- API test results and response time measurements
- Distributed tracing of the application, including the database

Kibana guides you through adding logs, metrics, and APM. This video shows how to add MySQL metrics; once you know how to do that, you can follow the same process to add log and APM data.

Logs from the infrastructure and MariaDB

Both MariaDB and MySQL provide logs. I am interested in the error log and the slow log. By default the slow log is not produced. To configure these logs, have a look in the MariaDB docs. For my deployment the configuration file is /etc/mysql/mariadb.conf.d/50-server.cnf. Here are the relevant parts:

```
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
slow_query_log
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file = /var/log/mysql/mysql.log
#general_log     = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Enable the slow query log to see queries with especially long duration
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 0.5
log_slow_rate_limit = 1
log_slow_verbosity = query_plan
#log-queries-not-using-indexes
```

To enable the slow query log, uncomment the lines in the slow query section, as shown above, and adjust the long query time as desired (the default is 10 seconds).
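As a quick sanity check that your edits took effect, you can grep the config file for the now-uncommented directives. The sketch below writes a sample of the fragment above to a temporary file so it is self-contained; on a real server, point cnf at /etc/mysql/mariadb.conf.d/50-server.cnf instead.

```shell
# Self-contained stand-in for /etc/mysql/mariadb.conf.d/50-server.cnf:
# write a sample of the config fragment to a temp file.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[mariadb]
slow_query_log
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 0.5
#log-queries-not-using-indexes
EOF

# Print only the active (uncommented) slow-log directives;
# lines starting with '#' are still disabled.
grep -E '^(slow_query_log|long_query_time)' "$cnf"
```

If slow_query_log and long_query_time show up without a leading #, the slow log will be produced after a server restart.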
A quick test of the configuration is to force a slow query with SELECT SLEEP():

```
$ sudo -- sh -c 'echo "select sleep(2);" | mysql'
sleep(2)
0
```

This results in a record being added to the slow log:

```
# Time: 200427 15:19:59
# User@Host: root[root] @ localhost []
# Thread_id: 13  Schema:   QC_hit: No
# Query_time: 2.000173  Lock_time: 0.000000  Rows_sent: 1  Rows_examined: 0
# Rows_affected: 0
SET timestamp=1588000799;
select sleep(2);
```

Install Filebeat

Follow the directions in Kibana Home > Add log data > MySQL logs. When you are instructed to enable and configure the mysql module, refer to these details for additional information.

The filebeat modules enable command takes a list of modules, so save some steps and add system and auditd to the list:

```
sudo filebeat modules enable mysql system auditd
```

When you are instructed to modify the settings in the modules.d/mysql.yml file, note that the slow log I added is not in the default location, so edit the file and specify its location as an entry in the var.paths array:

```
- module: mysql
  # Error logs
  error:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    #var.paths:

  # Slow logs
  slowlog:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths:
      - /var/log/mysql/mariadb-slow.log
```

Run the setup command and start Filebeat as directed in Kibana Home > Add log data > MySQL logs. At the bottom of that page is a link to the MySQL dashboard. You should also look at the [Filebeat System] Syslog dashboard ECS and [Filebeat System] Sudo commands ECS dashboards; you can search for these in the dashboard list.

Metrics from the infrastructure and MariaDB

The operating system and MariaDB both expose metrics. No configuration is needed for the OS to expose its metrics.
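A side note on the slow-log record captured earlier: the SET timestamp=... line is the Unix-epoch form of the human-readable "# Time:" header, which is handy when you need to line a slow-log entry up with other data sources. With GNU date you can convert one to the other:

```shell
# Convert the epoch from "SET timestamp=1588000799;" back to the slow log's
# "# Time:" format (YYMMDD HH:MM:SS, in UTC). Requires GNU date (-d @epoch).
date -u -d @1588000799 +'%y%m%d %H:%M:%S'
# Prints: 200427 15:19:59 — matching the "# Time:" header of the same entry
```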
MariaDB serves its metrics over the regular database connection on port 3306 by default, and the connection is password protected once you add a password to MariaDB.

Install Metricbeat

Follow the directions in Kibana Home > Add metric data > MySQL metrics. When you are instructed to enable and configure the mysql module, refer to these details for additional information.

The metricbeat modules enable command takes a list of modules, so save a step and add system to the list:

```
sudo metricbeat modules enable mysql system
```

When you are instructed to modify the settings in the modules.d/mysql.yml file, note that the Metricbeat module for MySQL needs to be configured with the proper host or IP address, port, username, and password. Here is my /etc/metricbeat/modules.d/mysql.yml:

```
- module: mysql
  #metricsets:
  #  - status
  #  - galera_status
  period: 10s

  # Host DSN should be defined as "user:pass@tcp(127.0.0.1:3306)/"
  # I copied the username and password used in the Spring Boot guide;
  # my hostname is roscigno-obs
  hosts: ["springuser:ThePassword@tcp(roscigno-obs:3306)/"]
```

Run the setup command and start Metricbeat as directed in Kibana Home > Add metric data > MySQL metrics. At the bottom of that page is a link to the MySQL dashboard. You should also look at the Metricbeat system dashboards; you can search for these in the dashboard list.

API test results and response time measurements

To verify proper functionality of the API endpoints, we need to POST some URL-encoded data, read the response, and verify it. This is often done manually using curl or the Postman API client. By automating the testing with Heartbeat, the response time and test results become available alongside the logs, APM traces, and other metrics for the service. Heartbeat monitors the availability of services by testing API endpoints for proper responses, checking websites for content and response codes, verifying ICMP pings, and more.

Install Heartbeat

Follow the instructions in Kibana Home > Add metric data > Uptime monitors.
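Before automating the check, it helps to run it once by hand. A minimal sketch: build the form body (the @ in the email must be percent-encoded as %40) and POST it with curl. The curl call is left commented out because it needs the demo application running; the hostname is the one used in this article.

```shell
# Build the URL-encoded form body; '@' must become %40 in a form body.
# (sed is enough for this fixed input; use a proper URL-encoder for
# arbitrary user input.)
email='someemail@someemailprovider.com'
body="name=first&email=$(printf '%s' "$email" | sed 's/@/%40/g')"
echo "$body"

# With the demo app running, POST the body by hand, the same way the
# Heartbeat monitor below will on its schedule:
#   curl -s -X POST 'http://roscigno-obs:8080/demo/add' \
#     -H 'Content-Type: application/x-www-form-urlencoded' \
#     --data "$body"
# A healthy endpoint returns HTTP 200 with "Saved" in the body.
```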
When you are instructed to edit the heartbeat.monitors setting in the heartbeat.yml file, replace the existing monitor with this API test:

```
# Configure monitors inline
heartbeat.monitors:
- type: http
  name: SpringToDoApp
  schedule: '@every 5s'
  urls: ["http://roscigno-obs:8080/demo/add"]
  check.request:
    method: POST
    headers:
      'Content-Type': 'application/x-www-form-urlencoded'
    body: "name=first&email=someemail%40someemailprovider.com"
  check.response:
    status: 200
    body:
      - Saved
      - saved
  response.include_body: 'always'
```

Run the setup command and start Heartbeat as directed in Kibana Home > Add metric data > Uptime monitors. At the bottom of that page is a link to the Uptime app.

Distributed tracing of the application including the database

Elastic APM instruments your applications to ship performance metrics to Elasticsearch for visualization in Kibana with the APM app. By adding the APM agent jar file to the command used to launch the application, I get distributed tracing, so I can see where my app is spending its time (whether in the Java code or in the calls to MariaDB). The process is described in Kibana Home > Add APM > Java and consists of downloading the agent jar file and using the Java instrumentation mechanism (-javaagent) to start the agent.
Open the Kibana Home > Add data > APM page, choose Java, and download the agent jar file. Then load the APM Kibana objects.

I prefer to use environment variables for the javaagent properties, so I take the details provided and set the environment variables:

```
$ cat environment
export ELASTIC_APM_SERVER_URL=https://1530f7c8afdf402eb281750f0b127bc4.apm.us-central1.gcp.cloud.es.io:443
export ELASTIC_APM_SECRET_TOKEN=WjyW67R0eSWDhILWDD
export ELASTIC_APM_SERVICE_NAME=winter-mysql
export ELASTIC_APM_APP_PACKAGES=com.example
```

I am launching the app via ./mvnw spring-boot:run and sourcing the environment variables in the Maven Wrapper, so the exec line in mvnw becomes:

```
exec "$JAVACMD" \
  -javaagent:./elastic-apm-agent.jar \
  -Delastic.apm.service_name=${ELASTIC_APM_SERVICE_NAME:-demo-app} \
  -Delastic.apm.server_url=${ELASTIC_APM_SERVER_URL} \
  -Delastic.apm.secret_token=${ELASTIC_APM_SECRET_TOKEN} \
  -Delastic.apm.application_packages=${ELASTIC_APM_APP_PACKAGES:-com.example} \
  $MAVEN_OPTS \
  -classpath "$MAVEN_PROJECTBASEDIR/.mvn/wrapper/maven-wrapper.jar" \
  "-Dmaven.home=${M2_HOME}" \
  "-Dmaven.multiModuleProjectDirectory=${MAVEN_PROJECTBASEDIR}" \
  ${WRAPPER_LAUNCHER} "$@"
```

As soon as the application is started, the API tests set up earlier with Heartbeat will produce traces in Elasticsearch. Launch the APM app and navigate to your service.

Out of the box, and without any code changes, you will see detailed performance information: response times for incoming requests, database queries, calls to caches, external HTTP requests, JVM metrics, errors, and more. This makes it easy to pinpoint and fix performance problems quickly. Looking into the AddNewUser transactions (which is where Heartbeat is performing its synthetic API testing), I can see the breakdown of a transaction, and digging into the “INSERT INTO user” span I can see the SQL used, the transaction ID, and so on.

There is so much more to do with APM; in this article we are only looking at the out-of-the-box traces with zero code changes. To add custom spans and more, see the documentation.
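A housekeeping note on the environment file used in this section. Writing the settings to one file and sourcing it keeps them consistent across launches. The sketch below fills in only the non-secret values from this article; add your own ELASTIC_APM_SERVER_URL and ELASTIC_APM_SECRET_TOKEN lines to the same file.

```shell
# Write the agent settings once (service name and packages are the values
# used in this article; the server URL and secret token lines are omitted
# here and must be added for a real deployment)...
cat > environment <<'EOF'
export ELASTIC_APM_SERVICE_NAME=winter-mysql
export ELASTIC_APM_APP_PACKAGES=com.example
EOF

# ...then source the file in any shell that launches the app, so the
# -Delastic.apm.* properties in the exec line pick the values up.
. ./environment
echo "service_name=$ELASTIC_APM_SERVICE_NAME packages=$ELASTIC_APM_APP_PACKAGES"
```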
Elastic Common Schema

The Elastic Common Schema (ECS) is a powerful way to correlate information from multiple sources. It specifies a common set of fields for logs and metrics that is used by all of the Elastic Beats, and it is extensible by you. By adopting the Elastic Common Schema in the Spring Boot app, the logs become clear and consistent with the other data sources we are ingesting, which helps correlate a record from the application log with a database metric and an APM trace. You can read about logging best practices for Java apps and watch a video showing a Spring Boot application getting a logging overhaul.

Put it all together

In this article I covered several ways of collecting data from and about your application and infrastructure. You can do the same and see logs, metrics, synthetic testing of your APIs, APM traces, and more. Earlier I said that you should not limit yourself in the type and amount of data that you bring in; decide what will help you prevent downtime and expand your list of sources accordingly. In future articles I will go into topics like alerting, machine learning, and bringing in custom data sources. If you want to discuss what I have shown, head to discuss.elastic.co and start a conversation; mention @DanRoscigno in the post and I will try to help.

More things to consider

I skipped over “When you page me in the middle of the night you will give me the info I need to find the problem” from the goals, along with a few other things that should be part of every project. In a future article I will write about building service-level dashboards, best practices for ingest with Beats, alerting, and real user monitoring. In the meantime, see the links. If you’re ready to try it out yourself, the easiest way to get started is to take advantage of the free 14-day trial of the Elasticsearch Service on Elastic Cloud — the official hosted Elasticsearch offering from Elastic, which includes Kibana.
If you prefer, you can also download Elasticsearch and Kibana to run on your laptop or deploy in a data center. Free hands-on training to build your skills is available at elastic.co/training/free. There is a group of observability-related courses listed there, and I would also suggest the Kibana course on that page, as it walks you through creating your own visualizations.

Dan Roscigno is a customer success manager focused on bringing decades of operations experience to customers in webinars, videos, tutorials, and documentation. He has been an EMT, sysadmin, developer, tech lead in operations, consultant, learning content developer, and product marketing manager. He loves to make ops less stressful. Contact him on Twitter or discuss.elastic.co: @DanRoscigno.

—

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.