This is one of those "I can't be the first one" kinda questions.
We're running Logstash servers in Docker. We run anywhere between 3 and 6 instances, depending on the logging volume. Our current input is just a Redis box (ElastiCache, in reality), and we output to an Elasticsearch cluster.
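For context, our current pipeline looks roughly like this (the endpoint names and the Redis key are placeholders, not our real values):

```
input {
  redis {
    host      => "our-elasticache-endpoint"   # placeholder ElastiCache endpoint
    data_type => "list"
    key       => "logstash"                   # placeholder list key
  }
}

output {
  elasticsearch {
    hosts => ["http://our-es-cluster:9200"]   # placeholder ES cluster address
  }
}
```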
It all works great, and using Marathon we can scale up and down very easily.
We are now investigating consuming some AWS-generated logging as well. In particular, the ELB logs and CloudTrail logs. These logs are generated by AWS and placed into an S3 bucket for us.
We'd like to have that S3 bucket as an input for Logstash. But, looking at the S3 input, it seems each Logstash instance would create its own sincedb, and then ingest from the last known point recorded in there.
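For reference, this is the kind of config we mean, a minimal sketch with placeholder bucket, region, and prefix values. The `sincedb_path` is the per-instance bookmark file that's causing us trouble:

```
input {
  s3 {
    bucket => "our-elb-logs-bucket"    # placeholder bucket name
    region => "us-east-1"              # placeholder region
    prefix => "AWSLogs/"               # placeholder key prefix
    # Each Logstash instance keeps its own record here of what it
    # has already read from the bucket:
    sincedb_path => "/var/lib/logstash/sincedb_s3"
  }
}
```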
This has two problems for us:
- Firstly, the sincedb file entails state, and our Logstash containers are stateless.
- Secondly, the various instances of Logstash aren't visible to each other, so we'll get the same data read and passed to ES multiple times.
I'm curious why I can't find any previous examples of people trying this, because this feels like it should be a pretty common use-case nowadays.
Has anyone else attempted this before? Are we on the totally wrong track here? Any insights would be much appreciated!