I just deployed Logstash on Kubernetes (as a Deployment) with a pipeline that reads logs from S3 and pushes them to Elasticsearch.
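For reference, the pipeline looks roughly like this (bucket, endpoint, and index names are placeholders, not my real values):

```
input {
  s3 {
    bucket       => "my-log-bucket"              # placeholder
    region       => "us-east-1"                  # placeholder
    sincedb_path => "/var/lib/logstash/sincedb_s3"
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]       # placeholder
    index => "s3-logs-%{+YYYY.MM.dd}"
  }
}
```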
I noticed that the sincedb file for the S3 input has this content:

```
2019-10-30 16:14:08 UTC
```
So the tracking granularity is only one second: the sincedb records the point in time up to which it has read.
What I've noticed is that if multiple log lines fall within the same second and Logstash restarts, you end up with either duplicates or missing data, because on restart it has to either skip ahead to the next second or re-read that second from the beginning.
I've tested this a bit, and it seems that it re-reads everything from the second recorded in the sincedb, so in my tests I got duplicates.
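The only mitigation I've come up with so far is to make the Elasticsearch writes idempotent by deriving the document ID from the event content, so a re-read line overwrites itself instead of creating a duplicate. A sketch, assuming the standard fingerprint filter (the field choice and placeholder hosts are from my setup and may differ in yours):

```
filter {
  fingerprint {
    source => ["message"]          # hash the raw log line
    method => "SHA256"
    target => "[@metadata][fp]"    # keep the hash out of the stored document
  }
}

output {
  elasticsearch {
    hosts       => ["http://elasticsearch:9200"]   # placeholder
    index       => "s3-logs-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][fp]}"            # duplicates overwrite rather than multiply
  }
}
```

But this only hides the symptom on the output side; I'd prefer a proper fix on the input side if one exists.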
Is there anything I can do in order to fix this problem?