I am having an issue collecting docker containers logs.
The issue is that, for example, when a Java Stacktrace gets logged on multiple lines on stderr stream, it could be that other logs of the stdout stream are logged in between, resulting in this kind of docker log files:
{"log":"1st line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302079084Z"}
{"log":"A Simple oneline log here","stream":"stdout","time":"2020-05-20T11:04:44.302098452Z"}
{"log":"A Simple oneline log here","stream":"stdout","time":"2020-05-20T11:04:44.302102025Z"}
{"log":"A Simple oneline log here","stream":"stdout","time":"2020-05-20T11:04:44.302299019Z"}
{"log":"A Simple oneline log here","stream":"stdout","time":"2020-05-20T11:04:44.302304733Z"}
{"log":"A Simple oneline log here","stream":"stdout","time":"2020-05-20T11:04:44.302309811Z"}
{"log":"2nd line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.30225182Z"}
{"log":"3rd line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302319095Z"}
{"log":"4th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302321649Z"}
{"log":"5th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302323951Z"}
{"log":"6th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302326262Z"}
{"log":"7th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302333692Z"}
{"log":"8th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.30233636Z"}
{"log":"9th line of multiline log","stream":"stderr","time":"2020-05-20T11:04:44.302338783Z"}
As you can see, there are some stdout logs mixed in between of a multiline stderr log.
Now, configuring the input as follows:
- type: container
paths:
- '/var/lib/docker/containers/*/*.log*'
multiline:
pattern: '^\*{5} ERROR \*{5}$|^\*{5} WARNING \*{5}$|^\*{17}$|^\[?[0-9]{4}[-.][0-9]{2}[-.][0-9]{2}[ T][0-9]{2}:[0-9]{2}:[0-9]{2}|^\[?[0-9]{2}[-.][0-9]{2}[-.][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
negate: true
match: after
max_lines: 1000
timeout: 7s
stream: all
max_bytes: 100000
close_inactive: 2m
ignore_older: 24h
fields_under_root: true
fields:
source_system: filebeat_docker
The result is the separation of the 1st line of multiline log and the other lines in two separate documents in Elasticsearch.
First attempt of resolution
To solve the problem I defined two inputs on the same file, one that collects stdout logs, and the other one that collects stderr logs:
- type: container
paths:
- '/var/lib/docker/containers/*/*.log*'
multiline:
pattern: '^\*{5} ERROR \*{5}$|^\*{5} WARNING \*{5}$|^\*{17}$|^\[?[0-9]{4}[-.][0-9]{2}[-.][0-9]{2}[ T][0-9]{2}:[0-9]{2}:[0-9]{2}|^\[?[0-9]{2}[-.][0-9]{2}[-.][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
negate: true
match: after
max_lines: 1000
timeout: 7s
stream: stdout
max_bytes: 100000
close_inactive: 2m
ignore_older: 24h
fields_under_root: true
fields:
source_system: filebeat_docker
- type: container
paths:
- '/var/lib/docker/containers/*/*.log*'
multiline:
pattern: '^\*{5} ERROR \*{5}$|^\*{5} WARNING \*{5}$|^\*{17}$|^\[?[0-9]{4}[-.][0-9]{2}[-.][0-9]{2}[ T][0-9]{2}:[0-9]{2}:[0-9]{2}|^\[?[0-9]{2}[-.][0-9]{2}[-.][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
negate: true
match: after
max_lines: 1000
timeout: 7s
stream: stderr
max_bytes: 100000
close_inactive: 2m
ignore_older: 24h
fields_under_root: true
fields:
source_system: filebeat_docker
Unfortunately, as mentioned in the Filebeat documentation, defining two inputs on the same file is not recommended, and could cause unpredictable behaviors.
However, it works, the multiline log is created as a single document and the stdout logs too.
...BUT the CPU resources that Filebeat uses increased from 1% average to 30% average.
Do you have any other suggestions on how to solve this problem?
Thank you!