Hi everyone!
I'm using the ELK stack with Kafka to collect and analyze logs from my K8s environment.
We have different log formats, mainly plaintext and JSON.
How can I process them in Logstash properly?
Right now, plaintext logs are being broken up incorrectly: a single plaintext message ends up split into two or more events.
My simplified config is:
For example, imagine an httpd log sample entry like this:
127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
I am receiving the entry above as two separate entries, like:
There are no issues with the producer, since I am receiving all the logs.
Filebeat in k8s is sending them to Kafka.
I experience no issues when a log is in JSON format.
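To make the goal concrete, this is roughly the per-record behaviour I'm after (a Python sketch only; the helper name and regex are illustrative, not my actual pipeline):

```python
import json
import re

# Illustrative pattern for the Apache common log format shown above.
COMMONLOG = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>\S+)" '
    r'(?P<response>\d+) (?P<bytes>\d+)'
)

def to_event(record: str) -> dict:
    """One Kafka record in -> one event out.

    JSON records are parsed into fields; plaintext records are matched
    against the common log pattern, or kept whole if they don't match.
    """
    try:
        return json.loads(record)
    except json.JSONDecodeError:
        m = COMMONLOG.match(record)
        return m.groupdict() if m else {"message": record}
```

Each record should map to exactly one event, never two.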
Logstash does not split anything into multiple events unless explicitly configured to do so.
If you have one event with the message 127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] and another event with the message "GET /apache_pb.gif HTTP/1.0" 200 2326, that means each one arrived as its own Kafka message.
Logstash just consumes the messages as they are stored in Kafka.
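To illustrate, a minimal kafka input looks something like this (broker address and topic name are placeholders; the json codec assumes Filebeat's default JSON envelope):

```conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topics => ["filebeat-logs"]         # placeholder topic name
    codec => "json"                     # decode Filebeat's JSON envelope
  }
}
```

With an input like this, each Kafka record becomes one Logstash event; nothing here splits a record in two.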
If you have a message like this in Kafka:
127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Then you will have an event in logstash with this entire message.
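If the goal is then to parse that plaintext line into fields, a filter along these lines is a common approach (a sketch; COMMONAPACHELOG is one of Logstash's built-in grok patterns, and the leading-brace check is just one simple way to route JSON vs. plaintext lines):

```conf
filter {
  if [message] =~ /^\{/ {
    # JSON logs: parse the message body into fields
    json { source => "message" }
  } else {
    # plaintext httpd logs: parse with the built-in Apache common log pattern
    grok { match => { "message" => "%{COMMONAPACHELOG}" } }
  }
}
```

But that parsing step is separate from the splitting you describe, which happens before Logstash sees the data.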
Please share how your Filebeat is configured and what the resulting documents look like in Elasticsearch, along with an example of this message being split in two.