I'm using the ELK stack (Elasticsearch, Logstash, Kibana, Beats) with Kafka in front of Logstash.
However, I noticed something unusual during a Logstash failover test.
The test was carried out by suddenly killing one Logstash instance while loading files containing 10,000 log lines from Filebeat into Kafka. (Logstash used no filters, only an input and an output.)
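For reference, the pipeline was essentially the following minimal sketch (the broker address, topic name, group id, and Elasticsearch host here are placeholders, not the exact values from my setup):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"      # placeholder broker address
    topics            => ["filebeat-logs"] # placeholder topic name
    group_id          => "logstash"        # both Logstash instances share one consumer group
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"] # placeholder Elasticsearch address
  }
}
```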
The result of the test was that each time a Logstash instance failed, the number of documents indexed into Elasticsearch increased by approximately 7,000 to 9,000.
I repeated the test with different partition counts and replication factors for the Kafka topic, and found that only the partitions assigned to the killed Logstash process produced duplicate data.
Why does the document count increase when a Logstash instance dies?
Please help me. Thank you.