Parse Elasticsearch json logs in filebeat

Hello. I want to properly collect Elasticsearch logs. I have the following architecture.
On a Linux node, I have Docker installed. I configured the journald logging driver following the official documentation (Journald logging driver | Docker Documentation). On this node, I'm running Elasticsearch in Docker (version 7.17.3). With the journald driver, the container sends its logs to journald instead of to files. I also have Filebeat running in a container with the journald input to collect these logs:

filebeat.inputs:
- type: journald

These logs are then shipped to Kafka:

output.kafka:
  enabled: true
  hosts: "<omitted>"
  topic: "<omitted>"
  # and other Kafka parameters

From Kafka, the logs are parsed by Logstash:

input {
  kafka {
      bootstrap_servers => "<omitted>"
      topics_pattern => "<omitted>"
      codec => "json"
      # and other Kafka parameters
    }
}

The final scheme: Docker container -> journald -> filebeat with journald input -> Kafka -> Logstash -> Elasticsearch index.

So, I just want to properly parse logs from Elasticsearch. The problem is that Elasticsearch writes the stacktrace as a JSON array spread over several lines.
For instance, this message is parsed and ingested into ES as a single document:

{"type":"server","timestamp":"2023-01-05T09:45:51,995Z","level":"INFO","component":"c.f.s.e.WatchRunner","cluster.name":"<omitted>","node.name":"<omitted>","message":"<omitted>","cluster.uuid":"<omitted>","node.id":"<omitted>"}

And this message is parsed and ingested into ES as four non-JSON documents:

{"type": "server", "timestamp": "2023-01-05T09:45:51,996Z", "level": "WARN", "component": "c.f.s.i.InternalAuthTokenProvider", "cluster.name": "<omitted>", "node.name": "<omitted>", "message": "<omitted>", "cluster.uuid": "<omitted>", "node.id": "<omitted>" ,
"stacktrace": ["<omitted>",
"at org.elasticsearch.<omitted>",
"at org.elasticsearch.<omitted>"] }

So Filebeat with the journald input reads these as separate messages.
I want Filebeat to combine them and send a single JSON message to Kafka, so that it later ends up in ES as a single document.

I tried configuring multiline, but had no luck with it:

filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  paths: ["/var/log/journal"]
  multiline.type: pattern
  multiline.pattern: '(\"stacktrace\"|^\"(.*)\"(]| |})*)'
  multiline.negate: false
  multiline.match: after

The question is: how to properly parse multiline JSON logs in my case?

Try using the decode_json_fields processor with your journald input. I have configured it in my use case and the logs are coming in perfectly.

Thanks for the help.

I made a few more tests and it turned out to be a bug in Filebeat's journald input. More details, if you are interested, are on GitHub: [Filebeat] multiline doesn't work with journald input · Issue #34200 · elastic/beats · GitHub

@qwinkler TBH, I didn't face any such issue in my case and the events were being processed and ingested properly. Can you please share how you configured decode_json_fields for the journald input?

I don't need the decode_json_fields processor, and here's why. As I understand it, this processor (Decode JSON fields | Filebeat Reference [8.5] | Elastic) is useful for extracting a JSON string from a field of an event. For instance, you have an event:

{
  "hello": "world",
  "message": "{\"level\": \"INFO\",\"msg\": \"hello\"}"
}

And with the following configuration:

processors:
  - decode_json_fields:
      fields: ["message"]
      target: "log"

You'll get the following event in ES:

{
  "hello": "world",
  "message": "{\"level\": \"INFO\",\"msg\": \"hello\"}",
  "log": {
    "level": "INFO",
    "msg": "hello"
  }
}

In my use case, the event is not "full": it's split across several messages. So I don't need to decode JSON fields, I just need to "combine" several messages into one event.

Back to my scenario. I have six distinct messages in the journal:

{"hello": "person"}
{"hello": "world",
"x": ["1",
"2",
"3"]}
{"hello": "2nd person"}

They will be sent to ES as six events (1 line = 1 event). So I'll receive two JSON events (lines 1 and 6) and four string-like events (lines 2-5), because each of those lines is invalid JSON on its own. I want to send only three events, like this:

{"hello": "person"}
{"hello": "world", "x": ["1","2","3"]}
{"hello": "2nd person"}

Therefore, I have to use multiline (Manage multiline messages | Filebeat Reference [8.5] | Elastic) to combine multiple messages into one event. Unfortunately, it doesn't work with the journald input (which I'm using), while it works with the log input.
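For comparison, a configuration along these lines does combine the messages when reading the same JSON logs from files with the log input (the path and pattern here are illustrative, assuming each log record starts with `{` at the beginning of a line):

```yaml
filebeat.inputs:
- type: log
  # Illustrative path; adjust to wherever the JSON log files live.
  paths: ["/var/log/elasticsearch/*.json"]
  multiline.type: pattern
  # Any line that does NOT start a new JSON object is appended
  # to the previous event, so the stacktrace lines are folded in.
  multiline.pattern: '^\{'
  multiline.negate: true
  multiline.match: after
```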

Hi @qwinkler,

The journald input uses parsers in the same way as the filestream input, you can define them following the documentation here.

We even have a test ensuring the multiline parser works with the journald input.

I also added this information in the GH issue.

There is also an example in the reference documentation: beats/filebeat.reference.yml at main · elastic/beats · GitHub
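Putting this together, a minimal sketch of the journald input using the multiline parser might look like the following (the pattern is illustrative, assuming each JSON record starts with `{` at the beginning of a line):

```yaml
filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  # Parsers are applied to journald messages the same way as
  # with the filestream input.
  parsers:
    - multiline:
        type: pattern
        # Illustrative pattern: lines that don't open a new JSON
        # object are joined to the preceding event.
        pattern: '^\{'
        negate: true
        match: after
```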