Question about filter effects on event handling order

I am trying to build a config that first indexes a batch of documents, then runs an elasticsearch filter query to find documents that need to be marked as stale (via a new event created in a subsequent ruby filter). It's important that this filter runs only after all of the indexing is complete.
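
For context, a rough sketch of the kind of pipeline I have in mind is below. The host, index, query, field names, and script path are all placeholders; the ruby filter uses a file-based script because a script's filter(event) method can return extra events:

filter {
    # Look up the freshly indexed document; everything here is illustrative.
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "my-index"
        query => "doc_id:%{[doc_id]}"
        fields => { "status" => "existing_status" }
    }
    ruby {
        # mark_stale.rb would return [event, stale_event] from filter(event),
        # where stale_event is a new LogStash::Event marking the doc stale.
        path => "/etc/logstash/mark_stale.rb"
    }
}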

What I have found is that applying a filter (any filter, it seems) disrupts the order in which events are processed, or at least the order in which they are output. A simple config that demonstrates the behavior is the following:

input {
    generator {
        # Emit each line once, in order: "1" then "2".
        count => 1
        lines => ["1", "2"]
    }
}

filter {
    # Only the event for the second line matches this conditional.
    if [message] == "2" {
        mutate {
            add_field => { "foo" => "blah" }
        }
    }
}

output {
    stdout { codec => rubydebug }
}

With the filter active, the event for the second line (the only one the filter is applied to) is output first. With the filter disabled, the events are output in the expected order. Can anyone explain what is going on?
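
For what it's worth, a minimal way to reproduce, assuming the config above is saved as order-test.conf (-w 1 pins the pipeline to a single worker, so worker parallelism can be ruled out):

bin/logstash -w 1 -f order-test.conf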

This is one of the effects of javafication (the move to the new Java execution engine). Events can get re-ordered even when you use a single pipeline worker. You could get the old behaviour back using

pipeline.java_execution: false

but I would expect that workaround to disappear in a future release.
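
In logstash.yml that would look like the sketch below (a single worker is necessary for ordering in any case, since with multiple workers events are always free to be re-ordered):

# logstash.yml
pipeline.workers: 1
pipeline.java_execution: false

If I remember right, Logstash 7.7 later added a pipeline.ordered setting as the supported way to enforce ordering (with a single worker) under the Java engine.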

Over on GitHub I have seen explicit statements from Elastic developers that Logstash does not guarantee event order, so I don't think they will accept this as a bug.

Does that mean that the aggregate filter can't be trusted either?
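
For example, a typical aggregate setup along these lines (the task id, code, and timeout are placeholders) only gives correct results if events for the same task arrive in order, which is why its docs insist on a single pipeline worker:

filter {
    aggregate {
        task_id => "%{transaction_id}"
        code => "map['total'] ||= 0; map['total'] += event.get('duration')"
        push_map_as_event_on_timeout => true
        timeout => 120
    }
}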

That's a really good question, one which I have been wondering about for a while.

Excellent! I was going to need that filter as well for this :upside_down_face:

Thank you for clarifying this, though. It's better that I know now than find out later.
