I have a question regarding concurrency management in Logstash.
My setup is the following. I have two Logstash nodes that run the same pipeline, i.e. they use the same pipeline configuration.
They perform exactly the same scheduled operations: they read their input from a common source, apply the same processing, and then output the data to the same Elasticsearch cluster.
Since each Logstash instance runs standalone, with no coordination between nodes, I would like to prevent both instances from executing the same activity at the same time.
I was thinking of solving this by implementing a "semaphore" index on Elasticsearch.
The semaphore implementation would work as follows.
In the input section of the Logstash pipeline, I periodically query the semaphore index in ES for any document (using the elasticsearch input plugin).
If the query returns any documents, then the pipeline drops the current event.
If the query returns nothing, then the pipeline does the following in the filter section:
- creates a semaphore document in the semaphore index
- performs the usual processing, enriching the input event
- after all activities have been done, removes the semaphore document
Then, it outputs the event to Elasticsearch.
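
To make the flow above concrete, here is a minimal sketch of the semaphore logic in Python. It is only an illustration, not Logstash configuration: `SemaphoreIndex` is a hypothetical in-memory stand-in for the semaphore index (it is not an Elasticsearch client), and `enrich`, `run_once`, and the `"lock"` document ID are names I made up for the example.

```python
from typing import Dict, List, Optional


class SemaphoreIndex:
    """Hypothetical in-memory stand-in for the semaphore index in ES."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def search_any(self) -> List[dict]:
        # Equivalent of "query the semaphore index for any document".
        return list(self._docs.values())

    def create(self, doc_id: str, owner: str) -> None:
        # Equivalent of "create a semaphore document".
        self._docs[doc_id] = {"owner": owner}

    def delete(self, doc_id: str) -> None:
        # Equivalent of "remove the semaphore document".
        self._docs.pop(doc_id, None)


def enrich(event: dict) -> dict:
    # Placeholder for the usual processing of the event (assumption).
    return {**event, "enriched": True}


def run_once(index: SemaphoreIndex, node: str, event: dict) -> Optional[dict]:
    """Run one scheduled cycle; return the event to output, or None if dropped."""
    if index.search_any():
        # Another node holds the semaphore: drop the current event.
        return None
    index.create("lock", node)
    try:
        return enrich(event)
    finally:
        # Always release the semaphore, even if processing fails.
        index.delete("lock")
```

With this sketch, `run_once(idx, "node-a", {"id": 1})` returns the enriched event when the index is empty, and returns `None` while another node's semaphore document exists. Note that the check and the create are two separate steps, so against a real cluster both nodes could see an empty index at the same moment and both proceed.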
This solution seems a bit cumbersome. Do you know if there is a simpler way to do it?