I'm analysing the way a logging stack has been configured and am facing a particular issue.
We have two Logstash instances, each running the same pipeline that consumes from Kafka. See the input section of the pipeline below:
input {
  kafka {
    bootstrap_servers => "broker1:9092,broker2:9092,broker3:9092"
    topics => ["topic-name"]
    security_protocol => "SSL"
    ssl_truststore_location => "/path/to/truststore/truststore.jks"
    ssl_truststore_password => "XYZ"
    client_id => "client-id"
    consumer_threads => 10
  }
}
The above pipeline configuration is the same for both Logstash instances.
Regarding Kafka - this particular topic has a total of 8 partitions.
Issue #1 - According to this thread, we have a total of 20 consumer threads across both Logstash instances, but since only 8 partitions exist for this consumer group, 12 of those consumer threads are idle at all times across the two Logstash servers. For this scenario, the correct configuration would be 4 consumer_threads on each Logstash instance. Is this conclusion correct?
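For what it's worth, here is the arithmetic behind that conclusion as a quick sanity check. The min() rule reflects my understanding that Kafka assigns each partition to at most one consumer in a group, so surplus threads sit idle:

```python
# Sanity check for Issue #1: Kafka assigns each partition to at most one
# consumer in a consumer group, so threads beyond the partition count idle.
partitions = 8
instances = 2
consumer_threads_per_instance = 10  # current pipeline setting

total_threads = instances * consumer_threads_per_instance  # 20
active_threads = min(total_threads, partitions)            # 8
idle_threads = total_threads - active_threads              # 12

# A balanced configuration would spread the partitions evenly:
balanced_threads_per_instance = partitions // instances    # 4

print(active_threads, idle_threads, balanced_threads_per_instance)  # 8 12 4
```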
Issue #2 - Checking the number of workers on Logstash for this particular pipeline, I can see the following output when running curl -XGET -k 'http://127.0.0.1:9600/_node/pipelines?pretty':
"pipeline-name" : {
  "ephemeral_id" : "553bc152-e29f-4c45-88f7-f5b77ff5a5b2",
  "hash" : "751c0c4bd2060bb70793328544279246418c135f3354596212ebb40247b3d5ff",
  "workers" : 16,
  "batch_size" : 125,
  "batch_delay" : 50,
  "config_reload_automatic" : false,
  "config_reload_interval" : 3000000000,
  "dead_letter_queue_enabled" : false
}
Considering we have 16 workers on each Logstash instance, does that directly correlate with the consumer_threads set in the pipeline .conf, meaning each instance would have 16 workers * 10 consumer threads = 160 consumer threads?
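In case it helps frame the question: my current (possibly wrong) understanding of the Logstash docs is that pipeline workers drive the filter/output stages while consumer_threads only controls the Kafka input plugin's threads, so the two settings would not multiply. Under that assumption the two readings compare as follows - please correct me if the multiplication reading is the right one:

```python
# Issue #2 arithmetic, assuming pipeline workers and consumer_threads are
# independent settings (workers run filter/output batches; consumer_threads
# are the Kafka input's consumer threads).
workers = 16           # from the /_node/pipelines API output
consumer_threads = 10  # from the pipeline config
instances = 2

# Reading A (multiplication) - what the question asks about:
threads_if_multiplied = workers * consumer_threads           # 160 per instance

# Reading B (independent settings) - my current understanding:
threads_if_independent = consumer_threads                    # 10 per instance
total_consumer_threads = instances * threads_if_independent  # 20 overall

print(threads_if_multiplied, threads_if_independent, total_consumer_threads)
```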
For context: the end goal here is to reduce the consumer lag, which is currently extremely high and is leading to a 3-5 hour delay before logs show up in OpenSearch, where Logstash currently sends them. Our idea was to increase the number of partitions for the Kafka topics with a higher number of messages so that we can also directly increase the Logstash consumer_threads consuming those topics (again, according to the thread shared above, these consumer threads have a 1:1 ratio with Kafka partitions) and hopefully decrease the lag.
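If the 1:1 thread-to-partition reasoning holds, the repartitioning step would presumably look something like the following (broker address, target partition count, and the client-ssl.properties filename are hypothetical; with SSL on the brokers a --command-config file pointing at the truststore would be needed):

```shell
# Raise the partition count to match the total consumer threads
# (2 instances * 10 consumer_threads = 20). Note: Kafka partition counts
# can only be increased, never decreased, and adding partitions changes
# the key-to-partition mapping for keyed messages.
kafka-topics.sh --bootstrap-server broker1:9092 \
  --command-config client-ssl.properties \
  --alter --topic topic-name --partitions 20
```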
Thank you