Hi everyone!
I'm using the ELK stack with Kafka to collect and analyze logs from my K8s environment.
We have different log formats, mainly plaintext and JSON.
How can I process them in Logstash properly?
Right now, plaintext logs are being broken up incorrectly: a single plaintext message ends up split into two or more events.
My simplified config is:
For example, imagine an httpd log sample entry like this:
127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
I am receiving the entry above as two separate entries, like:
There are no issues with the producer, since I am receiving all the logs.
Filebeat in k8s is sending them to Kafka.
I experience no issues when a log is in JSON format.
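To make the goal concrete, this is roughly the per-record behaviour I'm after (a Python sketch only; the helper name and regex are illustrative, not my actual pipeline):

```python
import json
import re

# Illustrative pattern for the Apache common log format shown above.
COMMONLOG = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>\S+)" '
    r'(?P<response>\d+) (?P<bytes>\d+)'
)

def to_event(record: str) -> dict:
    """One Kafka record in -> one event out.

    JSON records are parsed into fields; plaintext records are matched
    against the common log pattern, or kept whole if they don't match.
    """
    try:
        return json.loads(record)
    except json.JSONDecodeError:
        m = COMMONLOG.match(record)
        return m.groupdict() if m else {"message": record}
```

Each record should map to exactly one event, never two.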
Logstash does not split anything into multiple events unless explicitly configured to do so.
If you have one event with the message 127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] and another event with the message "GET /apache_pb.gif HTTP/1.0" 200 2326, that means each one arrived as its own Kafka message.
Logstash just consumes the messages as they are stored in Kafka.
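To illustrate, a minimal kafka input looks something like this (broker address and topic name are placeholders; the json codec assumes Filebeat's default JSON envelope):

```conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topics => ["filebeat-logs"]         # placeholder topic name
    codec => "json"                     # decode Filebeat's JSON envelope
  }
}
```

With an input like this, each Kafka record becomes one Logstash event; nothing here splits a record in two.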
If you have a message like this in Kafka:
127.0.0.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Then you will have an event in logstash with this entire message.
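If the goal is then to parse that plaintext line into fields, a filter along these lines is a common approach (a sketch; COMMONAPACHELOG is one of Logstash's built-in grok patterns, and the leading-brace check is just one simple way to route JSON vs. plaintext lines):

```conf
filter {
  if [message] =~ /^\{/ {
    # JSON logs: parse the message body into fields
    json { source => "message" }
  } else {
    # plaintext httpd logs: parse with the built-in Apache common log pattern
    grok { match => { "message" => "%{COMMONAPACHELOG}" } }
  }
}
```

But that parsing step is separate from the splitting you describe, which happens before Logstash sees the data.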
Please share how your Filebeat is configured and what the resulting documents look like in Elasticsearch, along with an example of this message being split in two.