First, some background:
I have an ELK stack running in a Docker Compose environment, for learning purposes so far. Logstash reads its input from an AWS S3 bucket and sends its output to the Elasticsearch server. The processed files are not removed from the bucket or moved to another bucket, and that is not something we want to change.
If for whatever reason the Logstash instance crashes or dies in some other way, it would probably process all the data in the AWS S3 bucket again, generating duplicate entries. This is because the sincedb file of the previous Logstash instance is lost when it crashes or dies.
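To make the failure mode concrete, here is a simplified model of how a sincedb-style marker prevents reprocessing. It assumes the marker is a single "last modified" timestamp of the newest object already processed, and that only objects newer than it are read. The functions `read_since`, `write_since` and `poll` are hypothetical illustrations, not the Logstash S3 input plugin's actual code.

```python
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical, simplified model of a sincedb file: it stores the last_modified
# timestamp of the newest object that has already been processed.

def read_since(path):
    """Return the stored timestamp, or the Unix epoch if the file is missing."""
    try:
        with open(path) as f:
            return datetime.fromisoformat(f.read().strip())
    except FileNotFoundError:
        # No marker (e.g. a fresh container): everything counts as new.
        return datetime.fromtimestamp(0, tz=timezone.utc)

def write_since(path, ts):
    """Persist the timestamp of the last processed object."""
    with open(path, "w") as f:
        f.write(ts.isoformat())

def poll(objects, sincedb_path):
    """Process objects newer than the stored marker; return the processed keys.

    `objects` maps an S3 key to its last_modified datetime (timezone-aware).
    """
    since = read_since(sincedb_path)
    processed = []
    for key, last_modified in sorted(objects.items(), key=lambda kv: kv[1]):
        if last_modified > since:
            processed.append(key)  # in Logstash, the events would go to Elasticsearch here
            write_since(sincedb_path, last_modified)
    return processed

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sincedb")
    bucket = {
        "logs/a.log": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "logs/b.log": datetime(2024, 1, 2, tzinfo=timezone.utc),
    }
    print(poll(bucket, path))  # first run: both objects are processed
    print(poll(bucket, path))  # second run: nothing new, so nothing is processed
    os.remove(path)            # simulate losing the marker with the container
    print(poll(bucket, path))  # both objects are processed again -> duplicates
```

In this model, the duplicates come only from losing the marker file, which is why keeping it on storage that outlives the container looks like the natural fix.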
Is there an elegant solution to this duplication?
Would it be sufficient to volume-mount the sincedb file into the Docker container to prevent the duplication?