Can Logstash lose data?

If I understand correctly, Logstash has two thread pools: one for inputs (IN) and one that combines processing and outputs (OUT). When Logstash is consuming from Kafka, at what point does it send an ACK (commit the offset) to Kafka? If the ACK is sent when data moves from the IN buffer to the OUT buffer, then there is a chance of data loss: the process could be restarted while some data is still sitting in the OUT buffer and has not yet been sent to Elasticsearch. However, if the ACK is sent only after the data has been sent to Elasticsearch, then a process restart will always pick up where it left off.
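For context, here is a minimal sketch of the kind of pipeline I am asking about (the broker address, topic, group, and index names are just placeholders):

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # placeholder broker address
    topics            => ["my-topic"]       # placeholder topic
    group_id          => "logstash"         # consumer group whose offsets get committed (the "ACK")
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]      # placeholder Elasticsearch endpoint
    index => "my-index"                     # placeholder index
  }
}
```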

I believe the Kafka input acknowledges the data (commits the offset) as soon as it receives it, not after it has been delivered to the output. You can use persistent queues to avoid data loss.
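If you enable the persistent queue, a minimal logstash.yml sketch looks something like this (the path and size values are only illustrative; tune them for your environment):

```
# logstash.yml
queue.type: persisted                  # default is "memory"; "persisted" writes events to disk
path.queue: /var/lib/logstash/queue    # assumed path; defaults to <path.data>/queue
queue.max_bytes: 4gb                   # cap on disk usage before inputs are back-pressured
queue.checkpoint.writes: 1             # checkpoint after every write for maximum durability (slower)
```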

Using persistent queues to avoid data loss comes at a cost: the queue needs dedicated disk storage, so it is expensive in terms of both processing time and infrastructure.

There is an open issue for making Logstash capable of running in a stateless mode where the input is not acknowledged until the data has been written successfully to the outputs. That would remove the need for an internal persistent queue, but it does not appear to be actively worked on. So for now a persistent queue is your best bet if you want to avoid data loss.
