A question about the Logstash S3 input plugin

Hi All,

We run Logstash on multiple EC2 instances behind a load balancer for reliability, and we are thinking of using the S3 input plugin. Since the servers are created by the AWS auto-scaling process, they are identical.

I am trying to get some clarity on how the S3 input plugin behaves when multiple Logstash instances are polling the same S3 bucket and prefix.

My understanding is that they should be fine, since each S3 object key looks like a file path but is not actually one.

My configuration looks like this. I delete each S3 object after reading it, so there is no need to track the last file handled.

input {
	s3 {
		bucket => "testbucket"
		prefix => "get/this/data"
		region => "us-east-2"
		# remove each object after it has been read
		delete => true
		# poll the bucket every 100 seconds
		interval => 100
		# no need to persist read positions since objects are deleted
		sincedb_path => "/dev/null"
		additional_settings => {
			"force_path_style" => true
			"follow_redirects" => false
		}
	}
}

This input does not support that; it can lead to duplicates because you cannot guarantee that multiple Logstash instances will not try to read the same S3 object at the same time.

If you need multiple instances reading the same bucket, you should use something that supports it. One option is Filebeat with the AWS S3 input configured to poll an SQS queue, which is also the recommended way to consume logs from S3 buckets.
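For reference, here is a minimal sketch of what that Filebeat setup might look like. It assumes the bucket is configured to publish s3:ObjectCreated event notifications to an SQS queue; the queue URL and Logstash host below are placeholders:

	filebeat.inputs:
	  - type: aws-s3
	    # SQS queue that receives the bucket's object-created
	    # notifications (placeholder URL)
	    queue_url: https://sqs.us-east-2.amazonaws.com/123456789012/testbucket-events
	    # how long a message stays hidden from other consumers
	    # while one Filebeat instance processes the object
	    visibility_timeout: 300s

	output.logstash:
	  # placeholder host; point this at a beats input on Logstash
	  hosts: ["localhost:5044"]

Because SQS hands each message to a single consumer at a time, several Filebeat instances can share one queue and each object is processed once, with at-least-once delivery if an instance fails mid-read.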

Thanks, that makes things a lot clearer. I will go through the link you posted.
