Understand "reconnect" and "retries" settings in Kafka output

Hello all,

With this config :

input {
  tcp {
    type => "NETWORK_DEVICE"
    port => 1602
  }
}

filter {
}

output {
  kafka {
    topic_id => ["network.device"]
    codec => json
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
    ssl_truststore_location => "/etc/logstash/kafka.server.truststore.jks"
    ssl_truststore_password => "itdepends"
    ssl_truststore_type => "JKS"
    security_protocol => "SSL"
  }
}

I noticed that if the Kafka nodes are down, Logstash logs this message every few tens of milliseconds:

[...]
[2021-05-05T04:33:36,893][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:36,919][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 3 (kafka3/192.168.2.248:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:36,945][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:36,997][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,021][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 3 (kafka3/192.168.2.248:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,048][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,099][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,122][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 3 (kafka3/192.168.2.248:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,150][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[2021-05-05T04:33:37,202][WARN ][org.apache.kafka.clients.NetworkClient][network_device] [Producer clientId=producer-1] Connection to node 2 (kafka2/192.168.2.230:9092) could not be established. Broker may not be available.
[...]

When looking at the documentation, I see three settings :

  • reconnect_backoff_ms
  • retries
  • retry_backoff_ms

I would like to understand these settings:

"reconnect_backoff_ms": is this setting related to the error messages I quoted above? Does it mean "do not try to reconnect to Kafka until x ms have passed"?

"retries": which retries are we talking about?

"retry_backoff_ms": how does it differ from "retries"?

Thanks for your help!

Hello Travis,

Those settings map to the Kafka producer settings of the same name; see the Kafka producer configuration reference in the Confluent documentation.

Maybe this description of the reconnect.backoff.max.ms parameter helps you understand the backoff logic:

...the backoff per host will increase exponentially for each consecutive connection failure...

If you set the backoff to 1000 ms, the first retry would occur after 1 second, the second after 2 seconds, the third after 4 seconds and so on, until the delay reaches the maximum set by reconnect.backoff.max.ms.

This is also true for the retry backoff. The backoff parameter defines the time between retries, while the retries parameter defines the maximum number of retries.
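To make the arithmetic concrete, here is a minimal Python sketch of a capped, doubling backoff. It is only an illustration of the description quoted above, not Kafka's actual code: the function name reconnect_delays is made up, the doubling factor is an assumption, the example values are illustrative, and the random jitter that the Kafka documentation mentions is left out.

```python
def reconnect_delays(base_ms, max_ms, attempts):
    """Return the nominal wait (in ms) before each reconnect attempt, without jitter."""
    delays = []
    for failures in range(attempts):
        # Assumption: the delay doubles after every consecutive failure
        # and is capped at max_ms.
        delays.append(min(base_ms * 2 ** failures, max_ms))
    return delays


# Illustrative values only: base of 1 s with a 10 s cap.
print(reconnect_delays(1000, 10000, 5))  # [1000, 2000, 4000, 8000, 10000]

# If the cap equals the base, the delay never grows.
print(reconnect_delays(1000, 1000, 3))   # [1000, 1000, 1000]
```

The second call shows why the cap matters: when the maximum is not larger than the base value, the backoff looks static.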

Retries in Kafka are described as follows:

Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error.
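As a rough picture of how "retries" and "retry_backoff_ms" relate, the following Python sketch resends a record a limited number of times and waits between attempts. Everything in it is hypothetical (TransientError, send_with_retries and flaky_send are invented names, not part of Kafka or Logstash), and a fixed wait between retries is assumed for simplicity.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for an error that may succeed on retry."""


def send_with_retries(send, record, retries, retry_backoff_ms, sleep=time.sleep):
    """Call send(record); on a transient error, retry up to `retries` times."""
    attempts = 0
    while True:
        try:
            return send(record)
        except TransientError:
            if attempts >= retries:
                raise  # retry budget exhausted, give up
            attempts += 1
            # Assumption: a fixed pause between retries.
            sleep(retry_backoff_ms / 1000.0)


calls = []


def flaky_send(record):
    """Hypothetical sender that fails twice, then succeeds."""
    calls.append(record)
    if len(calls) < 3:
        raise TransientError("broker not available")
    return "ok"


# sleep is replaced by a no-op so the example runs instantly.
result = send_with_retries(flaky_send, "event", retries=5,
                           retry_backoff_ms=100, sleep=lambda seconds: None)
print(result, len(calls))  # ok 3
```

In this picture, "retries" bounds how many times a failed send is repeated, and "retry_backoff_ms" is the pause between two of those attempts.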

Best regards
Wolfram

Hello Wolfram,

Thanks for this feedback.

I'm not sure I understand because, when you look at the logs above, the reconnect attempts happen about every 50 ms and the interval never increases exponentially:

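If we take only kafka2 (node 2), the attempts in the log above are at 04:33:36,893, 36,945, 36,997, 37,048, 37,099, 37,150 and 37,202, that is, about 50 ms apart each time.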

I am talking about the reconnect_backoff_ms setting of the Kafka output plugin (Logstash Reference [7.12]).

Or maybe the Logstash setting is just static and not exponential?
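For reference, the spacing between attempts can be checked with a few lines of Python. This is only a sketch: the timestamps and node ids are copied from the WARN lines quoted earlier in the thread, and the helper name gaps_ms is made up.

```python
from datetime import datetime

# (timestamp, node id) pairs copied from the WARN lines quoted earlier in the thread.
ATTEMPTS = [
    ("2021-05-05T04:33:36,893", 2),
    ("2021-05-05T04:33:36,919", 3),
    ("2021-05-05T04:33:36,945", 2),
    ("2021-05-05T04:33:36,997", 2),
    ("2021-05-05T04:33:37,021", 3),
    ("2021-05-05T04:33:37,048", 2),
    ("2021-05-05T04:33:37,099", 2),
    ("2021-05-05T04:33:37,122", 3),
    ("2021-05-05T04:33:37,150", 2),
    ("2021-05-05T04:33:37,202", 2),
]


def gaps_ms(attempts, node):
    """Return the gaps in ms between consecutive attempts for one node."""
    times = [
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S,%f")
        for ts, n in attempts
        if n == node
    ]
    return [round((b - a).total_seconds() * 1000) for a, b in zip(times, times[1:])]


print(gaps_ms(ATTEMPTS, 2))  # [52, 52, 51, 51, 51, 52]
print(gaps_ms(ATTEMPTS, 3))  # [102, 101]
```

For node 2 the gaps stay at roughly 50 ms across the whole excerpt, which is the behaviour the question is about.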
