Hello. I want to properly collect the logs that Elasticsearch itself writes. My architecture is as follows.
On a Linux node I have Docker installed, with the journald logging driver configured per the official documentation (Journald logging driver | Docker Documentation). On this node I run Elasticsearch 7.17.3 in a Docker container; because of the journald logging driver, the container sends its logs to journald instead of to files. I also have Filebeat running in a container with the journald input to pick up these logs:
filebeat.inputs:
- type: journald
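For reference, the Docker side of this is just the daemon-level driver switch from the linked documentation; my /etc/docker/daemon.json boils down to:
{
  "log-driver": "journald"
}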
These logs are then shipped to Kafka:
output.kafka:
  enabled: true
  hosts: "<omitted>"
  topic: "<omitted>"
  # and other Kafka parameters
From Kafka, the logs are consumed and parsed by Logstash:
input {
  kafka {
    bootstrap_servers => "<omitted>"
    topics_pattern => "<omitted>"
    codec => "json"
    # and other Kafka parameters
  }
}
The full pipeline: Docker container -> journald -> Filebeat with journald input -> Kafka -> Logstash -> Elasticsearch index.
So, I just want to parse the Elasticsearch logs properly. The problem is that Elasticsearch writes the stacktrace field as a JSON array whose elements span separate lines.
For instance, this message will be parsed and ingested into ES as a single document:
{"type":"server","timestamp":"2023-01-05T09:45:51,995Z","level":"INFO","component":"c.f.s.e.WatchRunner","cluster.name":"<omitted>","node.name":"<omitted>","message":"<omitted>","cluster.uuid":"<omitted>","node.id":"<omitted>"}
And this message will be parsed and ingested into ES as four separate non-JSON documents:
{"type": "server", "timestamp": "2023-01-05T09:45:51,996Z", "level": "WARN", "component": "c.f.s.i.InternalAuthTokenProvider", "cluster.name": "<omitted>", "node.name": "<omitted>", "message": "<omitted>", "cluster.uuid": "<omitted>", "node.id": "<omitted>" ,
"stacktrace": ["<omitted>",
"at org.elasticsearch.<omitted>",
"at org.elasticsearch.<omitted>"] }
So Filebeat's journald input reads this as four separate messages, because each line of the event is stored as its own journal entry.
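You can see the split directly on the host with journalctl; a sketch (the container name here is hypothetical, and -o cat prints only the MESSAGE field of each entry):
$ journalctl -o cat CONTAINER_NAME=elasticsearch
{"type": "server", "timestamp": "2023-01-05T09:45:51,996Z", ... ,
"stacktrace": ["<omitted>",
"at org.elasticsearch.<omitted>",
"at org.elasticsearch.<omitted>"] }
Each of those four lines is a separate journal entry, and hence becomes a separate Filebeat event.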
I just want Filebeat to reassemble the event and send it to Kafka as a single JSON message, so that it ends up in ES as a single document.
I tried to configure multiline, but had no luck with it:
filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  paths: ["/var/log/journal"]
  multiline.type: pattern
  multiline.pattern: '(\"stacktrace\"|^\"(.*)\"(]| |})*)'
  multiline.negate: false
  multiline.match: after
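The journald input documentation also mentions a parsers option; if multiline settings belong there rather than at the top level of the input, I would guess something like this (untested; the pattern assumes every new log event begins with '{', so any other line is treated as a continuation):
filebeat.inputs:
- type: journald
  id: everything
  seek: cursor
  paths: ["/var/log/journal"]
  parsers:
    - multiline:
        type: pattern
        # assumption: a new event always starts with '{'; anything else continues the previous line
        pattern: '^\{'
        negate: true
        match: after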
The question is: how do I properly parse these multiline JSON logs in my setup?