Configuring "consumer_threads" for a Kafka input with multiple Logstash instances

I'm analysing the way a logging stack has been configured and am facing a particular issue.

We have 2 Logstash instances, each running a pipeline that consumes from Kafka. See the input section of the pipeline below:

    input {
      kafka {
        bootstrap_servers => "broker1:9092,broker2:9092,broker3:9092"  # comma-separated string of brokers
        topics => ["topic-name"]
        security_protocol => "SSL"
        ssl_truststore_location => "/path/to/truststore/truststore.jks"
        ssl_truststore_password => "XYZ"
        client_id => "client-id"
        consumer_threads => 10
      }
    }

The above pipeline configuration is the same for both Logstash instances.
Regarding Kafka - this particular topic has a total of 8 partitions.

Issue #1 - According to this thread, we have a total of 20 consumer threads across both Logstash instances, but since the topic only has 8 partitions, 12 of those consumer threads sit idle at all times across the two Logstash servers. For this scenario, the correct configuration would be 4 consumer_threads on each Logstash instance. Is this conclusion correct?

Issue #2 - Checking the number of workers on Logstash for this particular pipeline, I can see the following output when running curl -XGET -k http://127.0.0.1:9600/_node/pipelines?pretty:

    "pipeline-name" : {
      "ephemeral_id" : "553bc152-e29f-4c45-88f7-f5b77ff5a5b2",
      "hash" : "751c0c4bd2060bb70793328544279246418c135f3354596212ebb40247b3d5ff",
      "workers" : 16,
      "batch_size" : 125,
      "batch_delay" : 50,
      "config_reload_automatic" : false,
      "config_reload_interval" : 3000000000,
      "dead_letter_queue_enabled" : false
    }

Considering we have 16 workers on each Logstash instance, does that directly correlate with the consumer_threads set in the pipeline config, meaning each instance would have 16 workers * 10 consumer threads = 160 consumer threads?

For context: the end goal here is to reduce the consumer lag, which is currently extremely high and leads to a 3-5 hour delay before logs show up in OpenSearch, where Logstash currently sends the logs. Our idea was to increase the number of partitions for the Kafka topics with a higher volume of messages so that we can also directly increase the Logstash consumer_threads consuming those topics (again, according to the thread shared above, these consumer threads have a 1:1 ratio with partitions on Kafka) and hopefully decrease the lag.

Thank you


No, it does not. You have 10 threads running the kafka input, and 16 worker threads running the filters and outputs; the two settings are independent of each other.
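
The 16 workers come from the pipeline-level pipeline.workers setting, which defaults to the number of CPU cores on the host. As a minimal pipelines.yml sketch (the pipeline id matches your output above, but the config path is hypothetical):

    - pipeline.id: pipeline-name
      path.config: "/etc/logstash/conf.d/pipeline-name.conf"  # hypothetical path
      pipeline.workers: 16  # filter/output threads; independent of the kafka input's consumer_threads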

So this means we can take the workers out of the equation when considering the number of consumer_threads per Logstash instance, per Kafka topic.

Thank you!

You should have, in total across all instances, as many consumer_threads as the number of partitions in your topic; if you add another Logstash instance, you need to recalculate this so the total still matches the partition count.

For example, with 8 partitions and 2 instances, you would have 4 consumer threads on each, as you mentioned in the Issue #1 part.
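
A minimal sketch of that adjustment (the explicit group_id is an assumption; the kafka input defaults to a group_id of "logstash", and both instances must share the same group_id so the 8 partitions are split between them):

    input {
      kafka {
        bootstrap_servers => "broker1:9092,broker2:9092,broker3:9092"
        topics => ["topic-name"]
        group_id => "logstash"   # must be identical on both instances
        consumer_threads => 4    # 4 threads x 2 instances = 8 partitions
        # ... SSL and client_id settings as in the original config ...
      }
    }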

Another thing you can do to improve the input side is to change the default value of max_poll_records, from 500 to 1000 for example.
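
As a sketch, on top of the input above (1000 is just a starting point to test, not a recommendation):

    input {
      kafka {
        # ... other settings as above ...
        max_poll_records => 1000  # max records returned per poll; the default is 500
      }
    }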

Your output will also have an impact on the lag: if the destination cannot process the data fast enough, it applies back pressure and Logstash backs off. So you could also increase the value of pipeline.batch_size from the default of 125; start by doubling it, for example.
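
For example, in pipelines.yml (this could also go in logstash.yml to apply to every pipeline), reusing the same pipeline id as before:

    - pipeline.id: pipeline-name
      pipeline.batch_size: 250  # double the default of 125; measure and tune from there

Keep in mind that a larger batch size increases memory usage per worker, so raise it gradually and watch heap usage.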