This is one of those "I can't be the first one" kinda questions.
We're running Logstash servers in Docker. We run anywhere between 3 and 6 instances, depending on the logging volume. Our current input is just a Redis box (ElastiCache, in reality), and we output to an Elasticsearch cluster.
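For context, our current pipeline looks roughly like this (the endpoint names and the Redis key are placeholders, not our real values):

```
input {
  redis {
    host      => "our-elasticache-endpoint"   # placeholder ElastiCache endpoint
    data_type => "list"
    key       => "logstash"                   # placeholder list key
  }
}

output {
  elasticsearch {
    hosts => ["http://our-es-cluster:9200"]   # placeholder ES cluster address
  }
}
```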
It all works great, and using Marathon we can scale up and down very easily.
We are now investigating consuming some AWS-generated logging as well. In particular, the ELB logs and CloudTrail logs. These logs are generated by AWS and placed into an S3 bucket for us.
We'd like to have that S3 bucket as an input for Logstash. But, looking at the S3 input, it seems each Logstash instance would create its own sincedb, and then ingest from the last known point recorded in there.
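For reference, this is the kind of config we mean, a minimal sketch with placeholder bucket, region, and prefix values. The `sincedb_path` is the per-instance bookmark file that's causing us trouble:

```
input {
  s3 {
    bucket => "our-elb-logs-bucket"    # placeholder bucket name
    region => "us-east-1"              # placeholder region
    prefix => "AWSLogs/"               # placeholder key prefix
    # Each Logstash instance keeps its own record here of what it
    # has already read from the bucket:
    sincedb_path => "/var/lib/logstash/sincedb_s3"
  }
}
```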
This has two problems for us:
- Firstly, the sincedb file entails state, and our Logstash containers are stateless.
- Secondly, the various instances of Logstash aren't visible to each other, so we'll get the same data read and passed to ES multiple times.
I'm curious why I can't find any previous examples of people trying this, because this feels like it should be a pretty common use-case nowadays.
Has anyone else attempted this before? Are we on the totally wrong track here? Any insights would be much appreciated!