I wanted to confirm a behavior that will determine whether I can automate provisioning Logstash for a varying number of data sources. I currently have Ansible automation that splits a large number of data sources into batches, one batch per provisioned machine. Because of this batching, adding sources to the set can cause pipelines to land on different servers than before. These pipelines use a combination of a Kafka input and an S3 output.
Given that the machines use persistent queues, and given this Kafka-input/S3-output combination, I'd ideally like to be able to re-run my automation and have the Kafka offsets remain accurate regardless of whether the pipelines land on the same or different servers.
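For context, each pipeline looks roughly like this (a simplified sketch, not my exact config; the broker address, topic, group ID, and bucket are placeholders):

```conf
input {
  kafka {
    bootstrap_servers => "kafka:9092"        # placeholder broker
    topics => ["source-topic"]               # placeholder, one topic per data source
    group_id => "logstash-source-topic"      # consumer group that tracks the offsets
  }
}
output {
  s3 {
    bucket => "my-archive-bucket"            # placeholder bucket
    region => "us-east-1"
    codec  => "json_lines"
  }
}
```

The machines also run with `queue.type: persisted` in `logstash.yml`, which is where my question about when offsets get committed comes from.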
To solve the above problem I need clarity on when and how I can manage offset committing from Logstash to Kafka. Are offsets committed after the data has been shipped through the output plugin, or as soon as events are pulled from Kafka and added to the persistent queue?
Is there any recommendation on how I could go about achieving a dynamic provisioning scheme? I'd rather not have to explicitly map each source to a node.
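To make the reshuffling concrete, here's a simplified sketch (hypothetical names, not my actual Ansible logic) of my current chunk-style batching next to a hash-based assignment; with chunking, adding one source can move existing sources to a different node, while a deterministic hash keeps every existing source in place as long as the node list is unchanged:

```python
import hashlib

def chunk_assign(sources, nodes):
    """Current approach: sort sources and split them into equal batches,
    one batch per node. Adding a source can shift existing sources."""
    sources = sorted(sources)
    per_node = -(-len(sources) // len(nodes))  # ceiling division
    return {s: nodes[i // per_node] for i, s in enumerate(sources)}

def hash_assign(sources, nodes):
    """Stable alternative: hash each source name to a node. Each source's
    placement depends only on its own name and the node list."""
    return {
        s: nodes[int(hashlib.sha256(s.encode()).hexdigest(), 16) % len(nodes)]
        for s in sources
    }

nodes = ["node-a", "node-b", "node-c"]
before = [f"src-{i}" for i in range(9)]
after = before + ["src-0b"]  # one new source added to the set

# Count how many pre-existing sources change nodes after the addition.
moved_chunk = sum(chunk_assign(before, nodes)[s] != chunk_assign(after, nodes)[s]
                  for s in before)
moved_hash = sum(hash_assign(before, nodes)[s] != hash_assign(after, nodes)[s]
                 for s in before)
print(moved_chunk, moved_hash)
```

The hash scheme isn't a full answer either (it doesn't balance load evenly, and changing the node list still moves sources), which is why I'm asking whether there's a recommended pattern for this.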