Parse Elasticsearch json logs in filebeat

Hello. I want to properly collect Elasticsearch logs. I have the following architecture.
On a Linux node, I have Docker installed. I configured the journald logging driver following the official documentation (Journald logging driver | Docker Documentation). On this node, I'm running Elasticsearch in Docker (version 7.17.3). With the journald driver, the container sends its logs to journald instead of to files. I also have Filebeat running in a container with the journald input to collect these logs:

filebeat.inputs:
- type: journald

These logs are then shipped to Kafka:

output.kafka:
  enabled: true
  hosts: "<omitted>"
  topic: "<omitted>"
  # and other Kafka parameters

From Kafka, the logs are parsed by Logstash:

input {
  kafka {
      bootstrap_servers => "<omitted>"
      topics_pattern => "<omitted>"
      codec => "json"
      # and other Kafka parameters
    }
}

The final scheme: Docker container -> journald -> filebeat with journald input -> Kafka -> Logstash -> Elasticsearch index.

So, I just want to properly parse logs from Elasticsearch. The problem is that Elasticsearch writes the stacktrace as a JSON array spread over several lines.
For instance, this message is parsed and ingested into ES as a single document:

{"type":"server","timestamp":"2023-01-05T09:45:51,995Z","level":"INFO","component":"c.f.s.e.WatchRunner","cluster.name":"<omitted>","node.name":"<omitted>","message":"<omitted>","cluster.uuid":"<omitted>","node.id":"<omitted>"}

And this message is parsed and ingested into ES as four non-JSON documents:

{"type": "server", "timestamp": "2023-01-05T09:45:51,996Z", "level": "WARN", "component": "c.f.s.i.InternalAuthTokenProvider", "cluster.name": "<omitted>", "node.name": "<omitted>", "message": "<omitted>", "cluster.uuid": "<omitted>", "node.id": "<omitted>" ,
"stacktrace": ["<omitted>",
"at org.elasticsearch.<omitted>",
"at org.elasticsearch.<omitted>"] }

So Filebeat with the journald input reads these as separate messages.
I want Filebeat to combine them and send a single JSON message to Kafka, so that it later ends up in ES as a single document.

I tried configuring multiline, but had no luck with it:

filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  paths: ["/var/log/journal"]
  multiline.type: pattern
  multiline.pattern: '(\"stacktrace\"|^\"(.*)\"(]| |})*)'
  multiline.negate: false
  multiline.match: after

The question is: how to properly parse multiline JSON logs in my case?

Try using the decode_json_fields processor with your journald input. I have configured it in my use case and the logs are coming in perfectly.

Thanks for the help.

I made a few more tests and it turned out to be a bug in Filebeat's journald input. More details, if you are interested, are on GitHub: [Filebeat] multiline doesn't work with journald input · Issue #34200 · elastic/beats · GitHub

@qwinkler TBH, I didn't face any such issue in my case and the events were being processed and ingested properly. Can you please share how you configured decode_json_fields for the journald input?

I don't need the decode_json_fields processor, and here's why. As I understand it, this processor (Decode JSON fields | Filebeat Reference [8.5] | Elastic) is useful for extracting a JSON string from a field of an event. For instance, you have an event:

{
  "hello": "world",
  "message": "{\"level\": \"INFO\",\"msg\": \"hello\"}"
}

And with the following configuration:

processors:
  - decode_json_fields:
      fields: ["message"]
      target: "log"

You'll get the following event in ES:

{
  "hello": "world",
  "message": "{\"level\": \"INFO\",\"msg\": \"hello\"}",
  "log": {
    "level": "INFO",
    "msg": "hello"
  }
}

In my use case, the event is not "full": it's split across several messages. So I don't need to decode JSON fields, I just need to "combine" several messages into one event.

Back to my scenario. I have six distinct messages in the journal:

{"hello": "person"}
{"hello": "world",
"x": ["1",
"2",
"3"]}
{"hello": "2nd person"}

They will be sent to ES as six events (1 line = 1 event). So I'll receive two JSON events (lines 1 and 6) and four string-like events (lines 2-5), because each of those lines is invalid JSON on its own. I want to send only three events, like this:

{"hello": "person"}
{"hello": "world", "x": ["1","2","3"]}
{"hello": "2nd person"}

Therefore, I have to use multiline (Manage multiline messages | Filebeat Reference [8.5] | Elastic) to combine multiple messages into one event. Unfortunately, it doesn't work with the journald input (which I'm using), while it works with the log input.
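For comparison, a configuration along these lines does combine the messages when reading the same JSON logs from files with the log input (the path and pattern here are illustrative, assuming each log record starts with `{` at the beginning of a line):

```yaml
filebeat.inputs:
- type: log
  # Illustrative path; adjust to wherever the JSON log files live.
  paths: ["/var/log/elasticsearch/*.json"]
  multiline.type: pattern
  # Any line that does NOT start a new JSON object is appended
  # to the previous event, so the stacktrace lines are folded in.
  multiline.pattern: '^\{'
  multiline.negate: true
  multiline.match: after
```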

Hi @qwinkler,

The journald input uses parsers in the same way as the filestream input, you can define them following the documentation here.

We even have a test ensuring the multiline parser works with the journald input.

I also added this information in the GH issue.

There is also an example in the reference documentation: beats/filebeat.reference.yml at main · elastic/beats · GitHub
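Putting this together, a minimal sketch of the journald input using the multiline parser might look like the following (the pattern is illustrative, assuming each JSON record starts with `{` at the beginning of a line):

```yaml
filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  # Parsers are applied to journald messages the same way as
  # with the filestream input.
  parsers:
    - multiline:
        type: pattern
        # Illustrative pattern: lines that don't open a new JSON
        # object are joined to the preceding event.
        pattern: '^\{'
        negate: true
        match: after
```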