I am trying to build a config that runs a batch of indexing, then an Elasticsearch filter query to find documents that need to be marked as stale (via a new event created in a subsequent Ruby filter). It is important that this filter runs only after all indexing is complete.
What I have found is that applying a filter (any filter, it seems) disrupts the expected processing order of events, or at least the order in which they are output. A simple config demonstrates the behavior:
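The original snippet was not preserved, but a minimal config along these lines reproduces the effect. The stdin input and the `[message] == "two"` condition are illustrative choices, not the exact original; the point is that any filter applied conditionally to one event is enough:

```
# Run with: pipeline.workers: 1 (ordering is still not preserved)
input {
  # Pipe in a few lines, e.g. "one", "two", "three"
  stdin { }
}
filter {
  # Illustrative condition: apply any filter to only the second line
  if [message] == "two" {
    mutate { add_tag => ["touched"] }
  }
}
output {
  stdout { codec => rubydebug }
}
```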
With the filter active, the event for the second line (the only one the filter is applied to) is output first. With the filter disabled, the events are output in the expected order. Can anyone explain what is going on?
This is one of the effects of javafication: the Java execution engine can re-order events even when you use a single pipeline worker (`pipeline.workers: 1`). You can restore the old behaviour with
pipeline.java_execution: false
but I would expect that workaround to disappear in a future release.
Over on GitHub I have seen explicit statements from Elastic developers that Logstash does not guarantee event order, so I don't think they will accept this as a bug.