I'm using the ELK stack (Elasticsearch, Logstash, Kibana, Beats) with Kafka in front of Logstash.
However, I noticed something unusual during a Logstash failover test.
The test was carried out by suddenly killing one Logstash instance while loading files containing 10,000 log lines from Filebeat into Kafka. (Logstash used no filters, only an input and an output.)
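For reference, the pipeline was essentially the following minimal sketch (the broker address, topic name, group id, and Elasticsearch host here are placeholders, not the exact values from my setup):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"      # placeholder broker address
    topics            => ["filebeat-logs"] # placeholder topic name
    group_id          => "logstash"        # both Logstash instances share one consumer group
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"] # placeholder Elasticsearch address
  }
}
```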
The result of the test was that each time a Logstash instance failed, the number of documents indexed into Elasticsearch increased by approximately 7,000 to 9,000.
I repeated the test with different partition counts and replication factors for the Kafka topic, and found that only the partitions assigned to the killed Logstash process produced duplicate data.
Why does the document count increase when a Logstash instance dies?
Please help me. Thank you.