I have two outputs from a Logstash instance: one output to Elasticsearch and one output to a Kafka topic that is consumed by a homegrown microservice. In practice, I have seen instances where the microservice consuming the Kafka topic receives a record before that record has been fully committed to Elasticsearch, so a subsequent search fails.
I would like to enforce the requirement that the output to Elasticsearch is fully complete and committed before sending to the Kafka topic. How can I do this?
pipeline.workers - This defaults to the number of the host's CPU cores.
I think this should solve the issue:
pipeline.workers: 1
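A minimal sketch of where that setting would go, assuming the pipeline is defined in Logstash's `pipelines.yml` (the pipeline id and config path below are illustrative placeholders, not from the original question):

```yaml
# pipelines.yml -- sketch; id and path are hypothetical
- pipeline.id: es-and-kafka
  path.config: "/etc/logstash/conf.d/main.conf"
  pipeline.workers: 1      # single worker thread
  pipeline.ordered: true   # keep events in order within the filter stage
```

Note that this constrains ordering within the filters only; it does not make one output wait for the other.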
You have no control over the order in which outputs process events. Once the batch gets to the end of the pipeline it gets sent to the outputs for them to process at their leisure.
You could send data to the Kafka output via a separate pipeline with a sleep filter in it, but that does not provide very fine control, and still does not guarantee the result you want.
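A hedged sketch of that idea, using pipeline-to-pipeline communication: the main pipeline writes to Elasticsearch and forwards a copy to a second pipeline, which delays each event before the Kafka output. The pipeline names, hosts, topic, and the 3-second delay are all assumptions for illustration; as noted above, this is coarse and still not a guarantee.

```
# main.conf -- writes to Elasticsearch and forwards a copy downstream
output {
  elasticsearch { hosts => ["localhost:9200"] }      # assumed host
  pipeline { send_to => ["delayed-kafka"] }          # pipeline-to-pipeline output
}

# delayed-kafka.conf -- delays each event, then sends it to Kafka
input {
  pipeline { address => "delayed-kafka" }
}
filter {
  sleep { time => "3" }   # sleep 3 seconds per event; coarse, no ordering guarantee
}
output {
  kafka {
    topic_id          => "my-topic"          # assumed topic
    bootstrap_servers => "localhost:9092"    # assumed broker
  }
}
```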
Badger, shouldn't a single process execute events from input to output, line by line?
Does parallelism mix up data processing, putting it out of order?
As far as I know input threads and output threads are independent of the pipeline. Setting pipeline.workers to 1 and pipeline.ordered to true will make sure that events are kept in order within the filters, but I have never seen anything that suggests either inputs or outputs use any synchronization between threads.
pipeline.ordered had to be added because re-ordering events within the pipeline broke documented behaviour of the aggregate plugin. I do not think there is any documentation about how threads in the inputs and outputs behave.
I already had a similar problem, explained in this topic. I only set
pipeline.workers: 1
and did not set pipeline.ordered, so it stays at its default of "auto".
It is just a matter of making the output push data in the same order it was received on the input:
I1,I2,I3 => O1,O2,O3
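A minimal sketch of that configuration in `logstash.yml`, leaving `pipeline.ordered` unset so it takes its default of `auto` (which enforces ordering automatically when there is a single worker):

```yaml
# logstash.yml -- minimal sketch
pipeline.workers: 1
# pipeline.ordered left at "auto": ordering is enforced because workers == 1
```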