Dec 4th 2022: [EN] Ingesting JSON logs with Elastic-Agent (and/or Filebeat)

This article is also available in Portuguese.

Structured logging has almost become a standard in the industry, allowing logs to be understood and parsed easily. While the Elastic-Agent provides integrations for a number of applications, making it easy to ingest and parse their logs, when it comes to our own applications a bit of work needs to be done.

Elastic-Agent vs Filebeat

  • Elastic-Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host.
  • Filebeat is a lightweight shipper for forwarding and centralising log data.

Which one should I use? The Elastic-Agent provides the best experience: everything can be configured via Kibana, and you can see the Elastic-Agent's logs as well as its health status there.

If for some reason you cannot run the Elastic-Agent, Filebeat is still an option; we will cover its configuration at the end of the article.

0. Elastic-Agent basics

This article assumes you are familiar with the Elastic-Agent, already have one deployed, and understand concepts like integrations, policies, etc.

1. Log as JSON

That's it. If your logs are JSON objects, the Elastic-Agent can already parse them; it then becomes a matter of fine-tuning a few things to make sure the timestamp is ingested correctly and the fields have the correct types.

2. Some example logs

Let's use these logs as an example:

{"level":"info","time":"2022-11-28T12:00:00+01:00","message":"Starting Advent Calendar demo", "status_code": 200}
{"level":"info","time":"2022-11-28T12:00:32+01:00","message":"First line", "status_code": 300}
{"level":"debug","time":"2022-11-28T12:08:32+01:00","message":"Second line", "status_code": 400}
{"level":"error","time":"2022-11-28T12:09:32+01:00","message":"Third line", "status_code": 500}
{"level":"info","time":"2022-11-28T12:10:32+01:00","message":"Forth line", "status_code": 100}
{"level":"warn","time":"2022-11-28T12:11:32+01:00","message":"Fith line", "status_code": 200}

We have four fields there:

  • level: the log level; we want to filter on it as a keyword (e.g. info, error, debug, etc.).
  • time: the time when the log line was written; we need to tell the Elastic-Agent to use it as the time for the log entry instead of the time of ingestion.
  • message: the message itself; we will treat it as a free-text field.
  • status_code: a numeric field simulating an HTTP status code.

3. Set up the integration

We will set up the Custom Logs integration. Aside from adding the paths to the files we want to harvest, we need to add two optional configurations:

  • Processors: as the name suggests, they can enrich or modify our events.
  • Custom configurations: well, they are custom configurations for our input.

Under the hood (at the time of writing) the Elastic-Agent runs a Filebeat instance, so all the documentation for those optional configurations is Filebeat's documentation. The Custom Logs integration uses the Log input under the hood, so the documentation we are interested in is the one for Filebeat's processors and for the Log input.

Here is what we need for each of them.

Processors

We will need two processors:

  • timestamp: it will parse our timestamp and correctly set it on the final event.
  • drop_fields: this one is optional, but there is no need to keep the time field around if we have already set @timestamp correctly in the event.

- timestamp:
    field: time
    layouts:
      - '2006-01-02T15:04:05Z07:00'
    test:
      - '2022-08-31T12:07:32+02:00'
- drop_fields:
    fields:
      - time

The only caveat here is that the timestamp processor is still in beta; however, it is stable enough to be used, just keep that in mind. The layouts follow Go's reference time format (which is why the layout looks like a real date: 2006-01-02T15:04:05Z07:00), and the values under test are sample timestamps that must parse successfully with the given layouts when the processor is loaded.

Custom configuration

The custom configuration tells the Log input that we want it to parse the data as JSON, put the parsed keys at the root of the event, overwrite any keys that already exist in the event, and, if there are parsing errors, add an error key to the final event so we can see what is happening.

json:
  keys_under_root: true
  add_error_key: true
  message_key: message
  overwrite_keys: true

That is how everything will look in Kibana.

4. Test it

Save the integration, wait for the policy update to propagate to your Elastic-Agent, append some lines to your log file (the examples from step 2 will do), then go see the harvested logs in Kibana.

5. Mappings

Now that we have some data, let's make sure Elasticsearch understands it correctly; for that we need to set the mappings for our fields.

Head to Fleet > Agent Policies, click on the policy name, then on the integration name. On the next screen go to Change defaults > Advanced options; at the very bottom there is the Mappings section.
[Screenshot: the Mappings section]

Click on the "edit" button (the little pencil) and add the following mappings:
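In essence, each custom field gets an explicit type. A minimal sketch of the equivalent JSON, assuming the three custom fields from our example logs (level as keyword, message as text and status_code as long):

{
  "properties": {
    "level": { "type": "keyword" },
    "message": { "type": "text" },
    "status_code": { "type": "long" }
  }
}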

Then click "Next" until the "Review" step, then click "Save component template".

6. Test it (again)

Head to Discover, search for some data, expand one of the documents and you will see the fields correctly mapped.
[Screenshot: expanded document with the fields correctly mapped]

Now you can do queries like status_code >= 400.
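Here are a couple more KQL queries you could try in the Discover search bar; they are just illustrative sketches using the fields from our example logs:

level : "error" and status_code >= 500
level : ("warn" or "error")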

7. How should I define my log keys?

The best way to define your log keys is to use the Elastic Common Schema (ECS). ECS is an open source specification, developed with support from the Elastic user community. ECS defines a common set of fields to be used when storing event data in Elasticsearch, such as logs and metrics.
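For instance, here is a sketch of what one of our example log lines could look like with its keys renamed to existing ECS fields (log.level, @timestamp, message and http.response.status_code); treat it as an illustration, not a requirement. As a bonus, if your application logs @timestamp directly, you may not even need the timestamp processor.

{"log.level":"info","@timestamp":"2022-11-28T12:00:00+01:00","message":"Starting Advent Calendar demo","http.response.status_code":200}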

8. What about a standalone Filebeat?

Well, it's pretty much the same idea, but instead of configuring an Elastic-Agent integration we will configure the input directly in filebeat.yml (Filebeat's configuration file).

When running a standalone Filebeat, it is better to use the filestream input. The concepts are all the same, but the configuration keys are slightly different. For the sake of brevity, here is just the input section of filebeat.yml:

filebeat.inputs:
- type: filestream
  id: advent-calendar-2022
  enabled: true
  paths:
    - /tmp/flog.log

  parsers:
    - ndjson:
        target: ""
        add_error_key: true

  processors:
    - timestamp:
        field: time
        layouts:
          - '2006-01-02T15:04:05Z07:00'
        test:
          - '2022-08-31T12:07:32+02:00'
    - drop_fields:
        fields: [time]
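The snippet above only covers the input; a standalone Filebeat also needs an output in the same filebeat.yml. A minimal sketch, where the host and credentials are placeholders you would replace with your own deployment's values:

output.elasticsearch:
  # Placeholders: point this at your own Elasticsearch and use your own credentials
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "changeme"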

What about mappings?

They can also be edited in Kibana. Go to Stack Management > Index Management > Index Templates and search for filebeat. By default Filebeat creates a data stream named filebeat-<version>, so at the time of writing that is filebeat-8.5.2. Click on it, then on "Manage", and "Edit" in the menu that appears. Click "Next" until section 4, Mappings, and set the mappings as in step 5.

The last step is to create a data view: go to Stack Management > Data Views, click on "Create data view", give it a name and, for the "Index pattern", add filebeat-8.5.2*. If you do not want to use the timestamp processor, you can change the timestamp field here. Click on "Save data view to Kibana".

Here is how it looks:

Head back to "Discover", select the new data view (filebeat-8.5.2 in our case) and you will be able to see all your data.


Thanks for the explanation. I got a question which might help me decide what agent I should use.
We're collecting different logs from different servers, so e.g.:

server1: /var/log/abc/*.log
server2: /var/log/def/*.log

Is it possible in the Elastic Agent to narrow down the log path on certain servers or like deploy different configurations to a group of hosts?

Hi @Ossenfeld, I'm glad it was helpful to you!

Yes, with Elastic-Agent you can have different policies and assign any number of hosts to the same policy. Each policy will have its own integrations with their unique configuration. The Elastic-Agent will definitely fit your use case.
