Understanding differences in similar logstash-pipelines - metrics/logs to look for?

Hi all,
I'm trying to get a better understanding of Logstash behaviour and improve my debugging skills.
Currently I'm running two almost identical Logstash processes (logstash-oss 7.5.1).
Both consume from the same Kafka topic as input, mutate the data slightly and write it to S3. Provided resources and environment are identical.

When looking at consumer lag and message consumption, Logstash A consumes slightly less than Logstash B, and I don't know why. I'm a bit overwhelmed about where to start, given the amount of metrics data, config items and such. Hence this question.

See the screenshot, bottom-right panel, for the consumption over the past 12 hours. Both start at the same rate, then blue suddenly drops below orange, although both consumers still follow the same pattern. Sadly, blue builds up constant lag on the topic (top-right panel).
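In case it helps, one place I can pull numbers from is the standard Logstash monitoring API, which exposes per-pipeline event counters and thread information (the host and port below are just the defaults; adjust for your setup):

```shell
# Per-pipeline event counts (in/filtered/out) and queue stats
# (9600 is the default Logstash API port)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'

# Hot-threads output shows where worker threads spend their time,
# which can hint at whether a pipeline is blocked on input or output
curl -s 'http://localhost:9600/_node/hot_threads?threads=10&pretty'
```

Comparing `events.in` vs `events.out` rates between the two instances over the same window would at least narrow down whether the slow consumer is input-bound or output-bound.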

Oddly enough, the orange consumer is doing more in its pipeline.
Here is the pipeline config for the blue consumer:

    input {
      kafka {
        topics_pattern => "topictorulethemall"
        bootstrap_servers => "brokerurl:9092"
        group_id => "sauron-s3"
        consumer_threads => 3
        decorate_events => false
        codec => "json"
      }
    }

    output {
      s3 {
        bucket => "bucketname"
        prefix => "sauron/%{[publisherName]}/%{[eventType]}/day=%{+yyyyMMdd}/hour=%{+HH}/"
        region => "us-east-1"
        size_file => 33554432
        time_file => 5
        upload_workers_count => 12
        codec => "json_lines"
        encoding => "gzip"
      }
    } 

Here is the pipeline config for the orange consumer:

    input {
      kafka {
        topics_pattern => "topictorulethemall"
        bootstrap_servers => "brokerurl:9092"
        group_id => "zoufahl-delayed-s3"
        consumer_threads => 3
        decorate_events => false
        codec => "json"
      }
    }

    filter {
      mutate {
        remove_field => [ "[message][Auction][bidders][bids][adHtml]" ]
      }

      date {
        match => [ "trackerTimestamp", "ISO8601" ]
      }
    }

    output {
      s3 {
        bucket => "bucketname"
        prefix => "trktsp/%{[publisherName]}/datetime=%{+yyyyMMddHH}/minute=%{+mm}/"
        region => "us-east-1"
        size_file => 33554432
        time_file => 5
        upload_workers_count => 12
        codec => "json_lines"
        encoding => "gzip"
      }
    }

Yet orange (zoufahl-delayed-s3) consumes better than blue (sauron-s3).
The logs show no warnings or errors that would indicate why this is happening, and enabling debug logging would flood them, so I'm wondering where to start checking for differences that would explain such behaviour.
Kafka metrics suggest the brokers are fine, so I hope it's not a Kafka issue, especially since I used to run more consumers in the past and things were fine.
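For reference, this is how I've been comparing lag on the broker side, using the standard Kafka CLI tool with the group IDs from the configs above (the broker address is the same placeholder as in the configs):

```shell
# Describe both consumer groups to compare current offset,
# log-end offset and per-partition lag side by side
kafka-consumer-groups.sh --bootstrap-server brokerurl:9092 \
  --describe --group sauron-s3
kafka-consumer-groups.sh --bootstrap-server brokerurl:9092 \
  --describe --group zoufahl-delayed-s3
```

If the lag were concentrated on a few partitions rather than spread evenly, that would point at partition assignment rather than overall pipeline throughput, but so far it looks evenly distributed.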

So in short: any ideas how to proceed from here? Which log levels for which components would give interesting information?
As it stands, I don't know why they behave differently, or whether the lagging Logstash will ever catch up. Let me know what information you need.
