I just deployed Logstash on Kubernetes (as a Deployment) with a pipeline that reads logs from S3 and pushes them to Elasticsearch.
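For reference, the pipeline looks roughly like this (bucket, endpoint, and index names are placeholders, not my real values):

```
input {
  s3 {
    bucket       => "my-log-bucket"              # placeholder
    region       => "us-east-1"                  # placeholder
    sincedb_path => "/var/lib/logstash/sincedb_s3"
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]       # placeholder
    index => "s3-logs-%{+YYYY.MM.dd}"
  }
}
```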
I noticed that the sincedb file for the S3 input has this content:

```
2019-10-30 16:14:08 UTC
```
So the tracking granularity is only one second: the sincedb records the point in time up to which it has read.
What I've noticed is that if multiple log lines fall within the same second and Logstash restarts, you end up with either duplicates or missing data, because on restart it has to either skip ahead to the next second or re-read that second from the beginning.
I've tested this a bit, and it seems that it re-reads everything from the second recorded in the sincedb, so in my tests I got duplicates.
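The only mitigation I've come up with so far is to make the Elasticsearch writes idempotent by deriving the document ID from the event content, so a re-read line overwrites itself instead of creating a duplicate. A sketch, assuming the standard fingerprint filter (the field choice and placeholder hosts are from my setup and may differ in yours):

```
filter {
  fingerprint {
    source => ["message"]          # hash the raw log line
    method => "SHA256"
    target => "[@metadata][fp]"    # keep the hash out of the stored document
  }
}

output {
  elasticsearch {
    hosts       => ["http://elasticsearch:9200"]   # placeholder
    index       => "s3-logs-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][fp]}"            # duplicates overwrite rather than multiply
  }
}
```

But this only hides the symptom on the output side; I'd prefer a proper fix on the input side if one exists.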
Is there anything I can do in order to fix this problem?