Dec 21st, 2020: [EN] When neither logging nor code tracing is enough: Log Correlation with APM

Application Performance Monitoring (APM) and logging both provide critical insight into your ecosystem. When paired, they can provide vital clues for resolving problems with your applications. This post assumes you’re familiar with what an APM (also known as “code tracing”) system does, what log monitoring is, and the benefits of both. Elastic offers free solutions for both as part of Elastic Observability.

What is Log Correlation?

Log correlation ties log messages to a specific transaction captured by the APM system. It helps when you need to add more context than the APM dashboards can show, or when you simply want to filter thousands of log lines down to only those relevant to the transaction you’re investigating. This is what the end result looks like:

From the APM system showing two services talking to each other - an example of distributed tracing - we can easily jump over to ...

... Trace Logs where we can see more details about this transaction being logged by each of the two services.

Under the hood, the Python application is logging:

{"x": 3, "y": 42, "transaction.id": "8c43b66920cc6b23", "trace.id": "dee3f7bfa66b20474f31547a7a3499dc", "message": "About to call another internal service, requesting info on surplus of goods 3, 42", "timestamp": "2020-12-15 05:52:25", "level": "info"}

{"transaction.id": "8c43b66920cc6b23", "trace.id": "dee3f7bfa66b20474f31547a7a3499dc", "message": "Surplus information obtained: {"3":"low stock","42":"20% surplus"}", "timestamp": "2020-12-15 05:52:30", "level": "info"}

And the Ruby application is logging:

{"timestamp":"2020-12-15 05:52:30 +0000\n","severity":"INFO\n","message":"Processing by StockAvailabilityController#index as /","trace.id":"dee3f7bfa66b20474f31547a7a3499dc","transaction.id":"6d3cf5ad34669451","span.id":null}

{"timestamp":"2020-12-15 05:52:30 +0000\n","severity":"INFO\n","message":" Parameters: {"x"=\u003e"3", "y"=\u003e"42"}","trace.id":"dee3f7bfa66b20474f31547a7a3499dc","transaction.id":"6d3cf5ad34669451","span.id":null}

[... etc ...]

The “magic” that allows filtering of the messages in the UI lies in the trace.id field. Notice how trace.id is the same for the Python and Ruby apps, but transaction.id differs. A “trace” is the entire system’s response to a request it has received. Each service within that system responds to the request as it flows through - that’s one “transaction”. If a request flows through multiple system components, the trace will be made up of multiple transactions. So 8c43b66920cc6b23 is Python’s response and 6d3cf5ad34669451 is Ruby’s response, but both belong to the same attempt, dee3f7bfa66b20474f31547a7a3499dc, by the system to answer a request. As long as all services within a system (including JS front-ends) are instrumented with Elastic APM, whichever service first receives a request that doesn’t carry a trace ID header will stamp it with one before passing it on to the next service.
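
To make the mechanics concrete, here is a conceptual sketch in Python - not the Elastic APM agent’s actual code, and with a simplified header name - of how a trace ID stays stable across services while each service mints its own transaction ID:

import os

def handle_request(incoming_headers):
    # Whichever service first sees a request without a trace ID stamps one on it...
    trace_id = incoming_headers.get("trace-id") or os.urandom(16).hex()
    # ...but every service records its own work under a fresh transaction ID.
    transaction_id = os.urandom(8).hex()
    # Only the trace ID is propagated downstream; transaction IDs stay local.
    outgoing_headers = {"trace-id": trace_id}
    return trace_id, transaction_id, outgoing_headers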

If you need a bit less flexibility than full-blown logs but still want to enrich your APM data, you might find transaction labels very helpful.
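
For example, with the Python agent you can attach a label to the current transaction in a single line (the key and value below are made up for illustration):

import elasticapm

elasticapm.label(goods_id=3)  # appears on the transaction in the APM UI as a filterable label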

Next, we’ll dig into why these logs look the way they do and how to set up your own monitoring to do the same.

Architecture of example setup

We have a Python service and a Ruby service, both instrumented with Elastic APM. Their logs are collected by Filebeat and sent directly to Elasticsearch. No parsing step is needed because, in line with best logging practice, the apps use structured logging: each log message is a JSON object which can be sent to Elasticsearch as-is.

You might find the related code and config repository very helpful. The example services are stripped back to the bare bones to help focus on the setup.

Python service

Our Python service is a simple single-file Flask app with one route. When a user hits the homepage, it calls the Ruby service. We’ll need to instrument it with Elastic APM and make it produce structured JSON-format logs for Filebeat to send to Elasticsearch.
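
For orientation, a rough sketch of such a route is below - the Ruby service’s URL, path and parameters are guesses for illustration; the real code lives in the example repository:

import requests
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Ask the Ruby service for surplus information on two goods.
    requests.get("http://localhost:3000/stock_availability", params={"x": 3, "y": 42})
    return "Single-file Supermarket Management System"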

Elastic APM

We add one dependency to requirements.txt:

elastic-apm[flask]==5.9.0

Then in the main controller file (in our case the only file: app.py) we import it and wrap it around our app:

from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
apm = ElasticAPM(app)

You’ll need to make a copy of config.py.example and rename it to config.py. That will hold your Elastic Cloud account secrets - you don’t want to ever publish this file, so it’s in .gitignore! SERVER_URL and SECRET_TOKEN refer to the APM server URL and secret token, which you can get from your Elastic Cloud account.
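
For reference, config.py holds just those two values (placeholders shown - never commit the real ones), and one way to hand them to the agent, assuming you wire them through Flask’s config as the Flask integration expects, is:

# config.py
SERVER_URL = "https://<your-deployment>.apm.<region>.cloud.es.io:443"
SECRET_TOKEN = "<your-secret-token>"

# app.py - hypothetical wiring; "supermarket" is a made-up service name
app.config["ELASTIC_APM"] = {
    "SERVICE_NAME": "supermarket",
    "SERVER_URL": SERVER_URL,
    "SECRET_TOKEN": SECRET_TOKEN,
}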

Structured logging

After some necessary imports...

import structlog
from structlog import PrintLogger, wrap_logger
from structlog.processors import JSONRenderer
from elasticapm.handlers.structlog import structlog_processor as apm_structlog_processor

... we stumble across the first interesting part. structlog calls the field containing the raw text message "event". Unfortunately, in the Elastic Common Schema (ECS) that name already means something quite different. ECS is an extremely useful set of naming conventions that you may well want to use with your app, so what we’re going to do is rename the event field to message.

def rename_event_key(logger, method_name, event_dict):
    event_dict["message"] = event_dict.pop("event")
    return event_dict

f = open("supermarket.log", "a")  # the log file Filebeat will tail (path assumed from filebeat.yml below)
wrapped_logger = PrintLogger(file=f)

The magic of JSON events coming out ready-made for Elasticsearch happens here:

logger = wrap_logger(wrapped_logger, processors=[
    apm_structlog_processor,
    rename_event_key,
    structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S"),
    structlog.stdlib.add_log_level,
    structlog.processors.format_exc_info,
    JSONRenderer()
])

log = logger.new()

apm_structlog_processor adds the trace.id and transaction.id fields we discussed at the start of this post. The others perform a few basic administrative functions like timestamping and rendering the output as JSON. You can write and add your own log processor in a few lines, as we’ve done above with rename_event_key.
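
With that logger in hand, the first Python log line shown at the top of this post comes from a call along these lines - the keyword arguments become JSON fields, and the APM processor fills in trace.id and transaction.id whenever a transaction is active:

log.info("About to call another internal service, requesting info on surplus of goods 3, 42", x=3, y=42)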

That’s it! The Python service is now instrumented with Elastic APM and it can output JSON logs.

To run it, install Python on your machine and create a virtualenv:

cd <path to this blog’s example repository>/logs-traces/supermarket
virtualenv pyenv
. pyenv/bin/activate
pip install -r requirements.txt
flask run

Ruby service

To start, we need to add

gem 'elastic-apm'

to the Gemfile.

We’ll need to take care of our Elastic Cloud credentials. Go to config/ and make a copy of elastic_apm.yml.example. Call it elastic_apm.yml and fill in the server_url and secret_token from Elastic Cloud, in the same way we did for Python.
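
At minimum, the filled-in file contains those two keys (placeholder values shown):

# config/elastic_apm.yml
server_url: https://<your-deployment>.apm.<region>.cloud.es.io:443
secret_token: <your-secret-token>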

We can configure our logger in config/application.rb:

config.logger = ActiveSupport::Logger.new(File.join(Rails.root, 'log', "supermarketwarehouse.log"))
config.logger.datetime_format = "%Y-%m-%d %H:%M:%S"

class MyAppFormatter < Logger::Formatter
  def call(severity, datetime, progname, msg)
    ElasticAPM.log_ids do |transaction_id, span_id, trace_id|
      {
        timestamp: "#{datetime}\n",
        severity: "#{severity}\n",
        message: "#{msg}",
        :'trace.id' => trace_id,
        :'transaction.id' => transaction_id,
        :'span.id' => span_id
      }.to_json + "\r\n"
    end
  end
end

config.logger.formatter = MyAppFormatter.new

This is slightly more verbose than Python - we have to ask ElasticAPM explicitly for the transaction.id and trace.id - but the difference is negligible.

There are quite a few Rails logging libraries and just as many ways to achieve the same output. Many of them support the basic Rails logger, and you can use this snippet without any external libraries; the only requirement is Rails itself.

That’s it! Elastic APM instrumentation is a few lines and the structured logs are easily produced in Rails.

To run it, install rbenv, then:

cd <path to this blog’s example repository>/logs-traces/rbapp
# this should tell you you’re missing the required version of Ruby - 2.7.1 - and prompt you to install it. Do that and then:
cd supermarketwarehouse/
gem install bundler
bundle install
rails s

Filebeat

Now, we have to send those neat structured logs up to Elasticsearch so they will show up when we follow the “Trace Logs” link on the APM UI.

This is the filebeat.yml config you’ll need; here are the interesting parts. You’ll need to edit the paths to point to wherever you cloned this blog’s example repository:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - <path to example repo>/logs-traces/supermarket/supermarket.log
    - <path to example repo>/logs-traces/rbapp/supermarketwarehouse/log/supermarketwarehouse.log

  json:
    keys_under_root: true
    overwrite_keys: true
    message_key: 'message'

keys_under_root prevents all our data from being nested under a json key, so we can access fields in Kibana as trace.id rather than json.trace.id. overwrite_keys controls whether keys that already exist in the event get overwritten by keys from the decoded JSON object; since we’re only sending JSON objects and they’re our definitive source of truth for logging events, we set it to true. Finally, message_key tells Filebeat which field contains the raw event string.
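
To illustrate with an abridged, hypothetical event: without keys_under_root the fields end up nested under json, where the Trace Logs view can’t match on trace.id; with it, they land at the top level:

# json.keys_under_root: false - fields only reachable as json.trace.id etc.
{"json": {"trace.id": "dee3f7bf...", "transaction.id": "8c43b669...", "message": "..."}}

# json.keys_under_root: true - fields sit at the top level of the document
{"trace.id": "dee3f7bf...", "transaction.id": "8c43b669...", "message": "..."}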

You’ll also want to set:

cloud.id: "<taken from Elastic Cloud, the home page of your deployment>"
cloud.auth: "elastic:<your Cloud password - the one it only shows you once when creating the cluster, but which you can reset anytime>"

To try this quickly on your local machine, download the appropriate archive version of Filebeat for your operating system (the .tar.gz rather than a full service package like .deb) and run it as root:

sudo chown root filebeat.yml
sudo chown root modules.d/system.yml
sudo ./filebeat -e

This is just for testing it out - for production, follow the appropriate instructions for your OS and run it as a service! There is also guidance on running it in a Kubernetes environment (under “Kubernetes Logs”).

Once you have Filebeat running, everything should be set up! Access http://localhost:5000. It will take a few seconds to call the Ruby app, and you should see the response “Single-file Supermarket Management System”. Within 20 seconds you should be able to see the transaction in the APM UI in Kibana, and to click Actions -> Trace Logs (as in the screenshots at the start of this post) to see the correlated messages appear!
