I am investigating why we lost data from certain servers during a network outage. We are able to reproduce the conditions and can confirm that when the Kafka server is unavailable, anything sent to Logstash is lost.
The setup is Filebeat 5.5.2 on APP-HOST:
output.logstash:
  hosts: ["LOGSTASH-HOST:5043"]
On LOGSTASH-HOST: Logstash 5.6.2, Kafka plugin 0.10.0.1.
kafka.conf:

input {
  beats {
    port => "5043"
  }
}

output {
  kafka {
    bootstrap_servers => "KAFKA-HOST:9092"
    topic_id => "issue_logs"
    codec => "json"
  }
}
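For reference, the pipeline config can be syntax-checked with Logstash's built-in test flag, e.g. (paths here assume a standard package install):

/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/kafka.conf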
The only addition to logstash.yml is the following (we have also tried without it):
queue.type: persisted
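As far as I can tell, the persistent queue writes its page and checkpoint files under <path.data>/queue, so that is an easy place to check whether Logstash is actually buffering anything during the outage, e.g. (path assumes the default path.data of a package install):

ls -lh /var/lib/logstash/queue/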
KAFKA-HOST is running Kafka 0.11.0.1 (the kafka_2.11-0.11.0.1 build).
I can confirm receiving data into that Kafka topic using this basic setup.
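For example, the topic can be watched with the console consumer that ships with Kafka (run from the Kafka install directory):

bin/kafka-console-consumer.sh --bootstrap-server KAFKA-HOST:9092 --topic issue_logs --from-beginning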
To simulate the network outage, I am using iptables to drop all packets hitting port 9092 on the Kafka server:
iptables -A INPUT -p tcp --dport 9092 -j DROP
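To confirm the rule is actually in place while testing (and to drop just this one rule instead of stopping iptables entirely), something like:

# list INPUT rules with their positions to confirm the DROP rule is active
iptables -L INPUT -n --line-numbers
# removes only the port 9092 DROP rule when restoring connectivity
iptables -D INPUT -p tcp --dport 9092 -j DROP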
When I enable the above iptables rule, effectively mimicking a network outage, logs sent through Filebeat can be seen arriving at LOGSTASH-HOST - but there is no logging or error messaging about the connection status to Kafka, and no errors at all in the Logstash debug logs.
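In case the producer errors are simply being logged somewhere I am not looking, the underlying Kafka client's own logging can be turned up in Logstash's log4j2.properties, along these lines (a sketch; the logger name org.apache.kafka.clients is my assumption about where the Java producer logs, and the path assumes a package install):

# /etc/logstash/log4j2.properties - raise the Kafka client's log level
logger.kafkaclients.name = org.apache.kafka.clients
logger.kafkaclients.level = debug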
As soon as I disable the iptables rule (service iptables stop), new data arrives in that Kafka topic - but anything sent before that point is gone!
I have tried adding the following settings to the kafka output in the Logstash kafka.conf file in the hope that one of them might help, but none of them seem to have any impact on this issue:
acks => "1" retries => 99 request_timeout_ms => "5000" reconnect_backoff_ms => 50 retry_backoff_ms => 5000 block_on_buffer_full => true metadata_max_age_ms => 10000
Am I missing an obvious setting here?
I can also confirm that if I go directly from Filebeat to Kafka, things are fine: Filebeat itself does not lose anything and sends everything to Kafka once it is back up. But having Logstash in the middle is our desired setup at this time.
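For reference, the direct Filebeat-to-Kafka output looks roughly like this (a minimal sketch, not the exact config we used):

output.kafka:
  hosts: ["KAFKA-HOST:9092"]
  topic: "issue_logs"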