Kafka output plugin loses all data when Kafka is down


#1

I am investigating why we lost data from certain servers during a network outage. We are able to reproduce the conditions and can confirm that when the Kafka server is unavailable, anything sent to Logstash is lost.

The setup is Filebeat 5.5.2 on APP-HOST:

output.logstash:
  hosts: ["LOGSTASH-HOST:5043"]

On LOGSTASH-HOST: Logstash 5.6.2, kafka plugin 0.10.0.1

kafka.conf
input {
    beats {
        port => "5043"
    }
}
output {
    kafka {
        bootstrap_servers => "KAFKA-HOST:9092"
        topic_id => "issue_logs"
        codec => "json"
    }
}

The only addition to logstash.yml is the following (we have also tried without it):
queue.type: persisted

KAFKA-HOST is running Kafka 2.11-0.11.0.1.
I can confirm receiving data into that Kafka topic using this basic setup.
To simulate the network outage, I am using iptables to drop all packets hitting port 9092 on the Kafka server.

iptables -A INPUT -p tcp --dport 9092 -j DROP
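
A quick way to confirm from LOGSTASH-HOST that the rule is actually in effect is to probe the port. The snippet below is only a minimal sketch: the function name `kafka_port_reachable` and the 3-second default timeout are illustrative choices, not part of any tool used here. With a DROP rule the connection attempt should time out (rather than be refused), so the function should return False while the rule is active.

```python
import socket

def kafka_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (packets dropped), refused connections
        # (nothing listening or a REJECT rule), and name-resolution errors.
        return False

# Example call, using the placeholder host name from this post:
# kafka_port_reachable("KAFKA-HOST", 9092)
```

This only checks TCP reachability of the broker port; it says nothing about whether Logstash has buffered or dropped events.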

With the above iptables rule in place, effectively mimicking a network outage, any logs sent through Filebeat are seen to arrive at LOGSTASH-HOST, but nothing is logged about the connection status to Kafka, and there are no errors at all in the Logstash debug logs.

As soon as I disable the iptables rule (service iptables stop), new data arrives in that Kafka topic.
Anything sent before this, though, is gone!

I have tried adding these settings to the kafka output in the Logstash kafka.conf file in the hope that one of them would help, but none of them seems to have any impact on this issue.

    acks => "1"
    retries => 99
    request_timeout_ms => "5000"
    reconnect_backoff_ms => 50
    retry_backoff_ms => 5000
    block_on_buffer_full => true
    metadata_max_age_ms => 10000
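
For clarity, the behaviour I was hoping to get from settings like `retries` and `retry_backoff_ms` is roughly "keep retrying a failed send with a pause in between, instead of silently dropping the event". The sketch below only illustrates that idea. It is not the plugin's code, and `FlakyProducer` and `send_with_retry` are hypothetical names made up for this example.

```python
import time

class FlakyProducer:
    """Hypothetical stand-in for a Kafka producer: the first `failures` sends fail."""
    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, event):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("broker unreachable")
        self.delivered.append(event)

def send_with_retry(producer, event, retries=None, backoff_s=0.0, sleep=time.sleep):
    """Send `event`, retrying on ConnectionError.

    retries=None retries forever (the event is never dropped); an integer
    bounds the number of retries, after which the last error is re-raised
    so the caller can see the failure instead of losing the event silently.
    Returns the number of retries that were needed.
    """
    attempt = 0
    while True:
        try:
            producer.send(event)
            return attempt
        except ConnectionError:
            if retries is not None and attempt >= retries:
                raise
            attempt += 1
            sleep(backoff_s)  # wait before the next attempt
```

The point of the sketch is the difference between the two modes: with unbounded retries the sender blocks until the broker is back, while with a bounded count the failure has to be surfaced to the caller, otherwise the event is lost without a trace.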

Am I missing an obvious setting here?
I can also confirm that if I go directly from Filebeat to Kafka, things are fine: Filebeat itself does not lose anything and sends everything to Kafka once it is back up. But having Logstash in the middle is our desired setup at this time.


#2

By a big coincidence, we just ran across the item below, which was just updated. It appears that this issue is being addressed. I will test this change and see if it helps.

