I am using Logstash running in Kubernetes to ingest data from Kafka and write it to Elasticsearch.
If a Logstash instance terminates abnormally while processing data, events can be lost. It appears there is no end-to-end acknowledgement of processing available ([Meta] End to End ACKs / Queueless Mode · Issue #8514 · elastic/logstash · GitHub).
What are the best settings to minimise data loss?
Currently I have not set enable_auto_commit, so it is defaulting to true.
I see the documentation (Kafka input plugin | Logstash Reference [8.12] | Elastic) says:
Default value is true
If true, periodically commit to Kafka the offsets of messages already returned by the consumer. If value is false however, the offset is committed every time the consumer writes data fetched from the topic to the in-memory or persistent queue. This committed offset will be used when the process fails as the position from which the consumption will begin.
It sounds like setting this to false would only address the case where Logstash reads from the topic and the offset is committed before the event has been written to the in-memory queue. Is that correct?
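For reference, this is roughly the change to the Kafka input that I am considering; the broker address, topic, and group ID below are placeholders rather than my actual values:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder
    topics            => ["my-topic"]   # placeholder
    group_id          => "logstash"     # placeholder
    # Commit offsets only once the consumer has written the fetched events to
    # Logstash's queue, instead of on the periodic auto-commit timer.
    enable_auto_commit => false
  }
}
```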
What is the best I can do to avoid data loss within Logstash on Kubernetes? Is it:
- Set enable_auto_commit to FALSE, and
- Enable persistent queues
Or something else?
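To make the second bullet concrete, this is the kind of logstash.yml I have in mind for the persistent queue; the path and sizes are placeholders, and the queue directory would need to sit on a PersistentVolumeClaim so it survives pod restarts:

```yaml
# logstash.yml (sketch; values are placeholders)
queue.type: persisted                       # disk-backed queue instead of the default in-memory queue
path.queue: /usr/share/logstash/data/queue  # must be on a volume that survives pod restarts (PVC)
queue.max_bytes: 1gb                        # cap on the on-disk queue size
queue.checkpoint.writes: 1                  # checkpoint after every written event (maximum durability, lower throughput)
queue.drain: true                           # wait for the queue to drain before shutting down
```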