Handling duplicate data in Logstash + Elasticsearch

Patrik_Iselind · January 19, 2018, 12:28pm

Hi,

First some background:
I have an ELK stack running in a Docker Compose environment, so far for learning purposes. Logstash gets its input from an AWS S3 bucket and sends its output to the Elasticsearch server. The processed files are not removed or moved to another bucket, and this is not something we want to change.

My problem/question:
If the Logstash instance crashes or dies for whatever reason, it would probably process all the data in the AWS S3 bucket all over again, generating duplicate entries. This is because the sincedb file of the previous Logstash instance is lost on crash/death.

Is there some elegant solution to this duplication?

Would it be sufficient to volume mount the sincedb file into the Docker container to stop the duplication?

Patrik_Iselind · January 19, 2018, 12:31pm

If volume mounting the sincedb file would be sufficient, would initializing it as an empty file be good enough to get things started? From what I've seen, the sincedb file isn't there on first start; it's created as a step in the startup.

magnusbaeck · January 28, 2018, 9:28pm

You should definitely store state files like sincedb persistently, either in a persistent volume or mounted from the host. In the latter case I'd mount a directory instead of a file, so you don't have to care about what happens if the file is created implicitly when you mount it into the container.
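
As a rough sketch of the directory-mount approach, a Compose service could look something like the fragment below. The service name, image tag, host directory `./logstash-state`, and container path `/usr/share/logstash/state` are all assumptions made for illustration, not taken from the setup described above.

```yaml
# Hypothetical docker-compose.yml fragment; names, tag, and paths are assumptions.
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:6.1.2
    volumes:
      # Mount a host directory, not a single file, so Logstash can create
      # the sincedb file itself on first start and find it again after a restart.
      - ./logstash-state:/usr/share/logstash/state
      # Pipeline configuration, mounted read-only.
      - ./pipeline:/usr/share/logstash/pipeline:ro
```

For this to have any effect, the s3 input in the pipeline would need its `sincedb_path` option pointed at a file inside the mounted directory, for example `/usr/share/logstash/state/sincedb_s3` (again a made-up name), and the host directory must be writable by the user Logstash runs as inside the container.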

