Each time a Logstash instance dies, the amount of data stored in Elasticsearch increases

I'm using the ELK stack (Elasticsearch, Logstash, Kibana, Beats) with Kafka in front of Logstash.

However, I noticed something unusual during a Logstash failover test.

The test was carried out by abruptly killing one of the Logstash instances while loading 10,000 log lines from files via Filebeat into Kafka. (Logstash used no filters, only an input and an output.)
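For reference, the pipeline configuration looked roughly like the sketch below. The broker address, topic name, Elasticsearch host, and index name are placeholders, not our actual values:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"       # placeholder broker address
    topics => ["filebeat-logs"]             # placeholder topic name
    group_id => "logstash"                  # both instances join the same consumer group
  }
}

# no filter section; events pass through unchanged

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]  # placeholder host
    index => "failover-test"                # placeholder index name
  }
}
```

Because both instances consume with the same group_id, Kafka reassigns the killed instance's partitions to the surviving one.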

The result of the test was that each time Logstash failed, the number of documents loaded into Elasticsearch grew by an extra 7,000 to 9,000.

In relation to this, I ran further tests varying the partition count and replication factor of the Kafka topic, and found that only the index for the Kafka topic assigned to the failed Logstash process produced duplicate data.

Why does the amount of data increase when a Logstash instance dies?

Please help me. Thank you.

Have you tried enabling persistent queues and disabling auto_commit?
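Roughly, I mean something like this; a sketch only, with placeholder values. In logstash.yml:

```
# persist the in-flight event queue to disk so it survives a crash
queue.type: persisted
```

And in the kafka input of the pipeline:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder
    topics => ["filebeat-logs"]         # placeholder
    group_id => "logstash"
    enable_auto_commit => false         # don't commit offsets on a background timer
  }
}
```

The idea is to shrink the window of events that have been read from Kafka but not yet delivered when an instance dies.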

Hello, we ran a test based on your suggestion.

First, we set auto_commit to false in the Kafka input, and second, we enabled persistent queues in Logstash.

As a result, these options seemed to have no effect on the duplicate data.

Are there any other options we should look at?

Thank you for your answer.
