I have 2 outputs from a Logstash instance: one output to Elasticsearch and one output to a Kafka topic that is consumed by a homegrown microservice. In practice, I have seen instances where the microservice consuming the Kafka topic receives a record before it has been fully committed to Elasticsearch, and a search for that record fails.
I would like to enforce the requirement that the output to Elasticsearch is fully complete and committed before sending to the Kafka topic. How can I do this?
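For concreteness, the output section looks roughly like this (hosts, index, and topic names are placeholders):

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events"
  }
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "events"
  }
}
```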
You have no control over the order in which outputs process events. Once the batch gets to the end of the pipeline it gets sent to the outputs for them to process at their leisure.
You could send data to the kafka output via a separate pipeline with a sleep filter in it, but that does not provide very fine control, and still does not guarantee the result you want.
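A sketch of what that could look like using pipeline-to-pipeline communication in pipelines.yml. The pipeline IDs, the beats input, the 5-second delay, and all hosts/index/topic names are placeholders, and again this only delays the kafka output, it does not confirm the Elasticsearch write:

```
- pipeline.id: main
  config.string: |
    input { beats { port => 5044 } }
    output {
      elasticsearch { hosts => ["http://localhost:9200"] index => "events" }
      # forward a copy of each event to the delaying pipeline
      pipeline { send_to => ["kafka_delay"] }
    }
- pipeline.id: kafka-delayed
  config.string: |
    input { pipeline { address => "kafka_delay" } }
    # hold each event for 5 seconds before it reaches kafka
    filter { sleep { time => "5" } }
    output {
      kafka { bootstrap_servers => "localhost:9092" topic_id => "events" }
    }
```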
As far as I know input threads and output threads are independent of the pipeline. Setting pipeline.workers to 1 and pipeline.ordered to true will make sure that events are kept in order within the filters, but I have never seen anything that suggests either inputs or outputs use any synchronization between threads.
pipeline.ordered had to be added because re-ordering events within the pipeline broke documented behaviour of the aggregate plugin. I do not think there is any documentation about how threads in the inputs and outputs behave.
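For reference, those two settings live in logstash.yml (or per pipeline in pipelines.yml); a minimal sketch:

```
# logstash.yml
# Keeps events ordered through the filter stage only; it does not
# synchronize the output threads.
pipeline.workers: 1
pipeline.ordered: true
```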
I already had a similar problem, explained in this topic. I only set pipeline.workers: 1 and did not explicitly set pipeline.ordered, so it stays at "auto".
It's just a matter of making the output push data in the same order as it was received at the input:
I1,I2,I3 => O1,O2,O3
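In other words, something like this in logstash.yml, leaving pipeline.ordered at its default:

```
# logstash.yml
pipeline.workers: 1
# pipeline.ordered defaults to "auto", which enables event ordering
# automatically when there is a single worker
```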