Understanding differences in similar logstash-pipelines - metrics/logs to look for?

Hi all,
I'm trying to get a better understanding of Logstash behaviour and improve my debugging skills.
Currently I'm running two almost identical Logstash processes (logstash-oss 7.5.1).
Both consume from the same Kafka topic as input, mutate the data slightly and write it to S3. Provided resources and environment are identical.

When looking at consumer lag and message consumption, Logstash A consumes slightly less than Logstash B, and I don't know why. I'm a bit overwhelmed about where to start, given the amount of metrics data, config items and such. Hence this question.

See the screenshot, bottom-right panel, for the consumption over the past 12 hours. Both start at the same rate, then blue suddenly drops below orange, although both consumers still follow the same pattern. Sadly, blue builds up constant lag on the topic (top-right panel).
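In case it helps, one place I can pull numbers from is the standard Logstash monitoring API, which exposes per-pipeline event counters and thread information (the host and port below are just the defaults; adjust for your setup):

```shell
# Per-pipeline event counts (in/filtered/out) and queue stats
# (9600 is the default Logstash API port)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'

# Hot-threads output shows where worker threads spend their time,
# which can hint at whether a pipeline is blocked on input or output
curl -s 'http://localhost:9600/_node/hot_threads?threads=10&pretty'
```

Comparing `events.in` vs `events.out` rates between the two instances over the same window would at least narrow down whether the slow consumer is input-bound or output-bound.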

Oddly enough, the orange consumer is doing more in its pipeline.
Here is the pipeline config for the blue consumer:

    input {
      kafka {
        topics_pattern => "topictorulethemall"
        bootstrap_servers => "brokerurl:9092"
        group_id => "sauron-s3"
        consumer_threads => 3
        decorate_events => false
        codec => "json"
      }
    }

    output {
      s3 {
        bucket => "bucketname"
        prefix => "sauron/%{[publisherName]}/%{[eventType]}/day=%{+yyyyMMdd}/hour=%{+HH}/"
        region => "us-east-1"
        size_file => 33554432
        time_file => 5
        upload_workers_count => 12
        codec => "json_lines"
        encoding => "gzip"
      }
    } 

Here is the pipeline config for the orange consumer:

    input {
      kafka {
        topics_pattern => "topictorulethemall"
        bootstrap_servers => "brokerurl:9092"
        group_id => "zoufahl-delayed-s3"
        consumer_threads => 3
        decorate_events => false
        codec => "json"
      }
    }

    filter {
      mutate {
        remove_field => [ "[message][Auction][bidders][bids][adHtml]" ]
      }

      date {
        match => [ "trackerTimestamp", "ISO8601" ]
      }
    }

    output {
      s3 {
        bucket => "bucketname"
        prefix => "trktsp/%{[publisherName]}/datetime=%{+yyyyMMddHH}/minute=%{+mm}/"
        region => "us-east-1"
        size_file => 33554432
        time_file => 5
        upload_workers_count => 12
        codec => "json_lines"
        encoding => "gzip"
      }
    }

Yet orange (zoufahl-delayed-s3) consumes better than blue (sauron-s3).
The logs show no warnings or errors that would indicate why this is happening, and enabling debug logging would flood them, so I'm wondering where to start checking for differences that would explain such behaviour.
Kafka metrics suggest the brokers are fine, so I hope it's not a Kafka issue, especially since I used to run more consumers in the past and things were fine.
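For reference, this is how I've been comparing lag on the broker side, using the standard Kafka CLI tool with the group IDs from the configs above (the broker address is the same placeholder as in the configs):

```shell
# Describe both consumer groups to compare current offset,
# log-end offset and per-partition lag side by side
kafka-consumer-groups.sh --bootstrap-server brokerurl:9092 \
  --describe --group sauron-s3
kafka-consumer-groups.sh --bootstrap-server brokerurl:9092 \
  --describe --group zoufahl-delayed-s3
```

If the lag were concentrated on a few partitions rather than spread evenly, that would point at partition assignment rather than overall pipeline throughput, but so far it looks evenly distributed.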

So in short: any ideas how to proceed from here? Which log levels for which components would give interesting information?
As it stands, I don't know why they behave differently, or whether the lagging Logstash will ever catch up. Let me know what information you need.
