Short version: can the aggregate filter be made to stop after it receives 2 events for a given task ID?
I'm using the aggregate filter to successfully combine two very similar Apache log lines, one of which contains a valid auth name and the other a duration (don't ask!).
The events arrive up to a minute apart, and while I'm loath to miss any, I have two problems with setting a large timeout. Firstly, I worry about huge numbers of in-flight events and maps clogging up the aggregate filter and its single pipeline worker. Secondly, I'd rather not wait longer than necessary to see the events.
Ideally I'd like to set a timeout of around 300s, but have the map pushed immediately once two events have arrived. Is this possible?
In case it's of interest to anyone who stumbles across this post, my approach is to create a murmur3 fingerprint of all fields except duration and auth, then use that as task_id.
fingerprint {
  source => [ "all", "common", "fields" ]
  concatenate_sources => true
  method => "MURMUR3"
  target => "[@metadata][aggregate_id]"
}
aggregate {
  task_id => "[@metadata][aggregate_id]"
  code => "
    map['all'] ||= event.get('all')
    map['common'] ||= event.get('common')
    map['fields'] ||= event.get('fields')
    map['duration'] ||= event.get('duration')
    map['auth'] ||= event.get('auth')
    event.cancel()
  "
  push_map_as_event_on_timeout => true
  timeout => 90
}
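To make the behaviour I'm after concrete, here is a rough standalone Ruby model of it: merge events by task ID, hand back the merged map as soon as the second event arrives, and only fall back to the timeout for maps that never complete. This is not the aggregate plugin's API; PairBuffer, add, expire and pending are hypothetical names made up for illustration, and timestamps are passed in as plain numbers rather than read from a clock.

```ruby
# Standalone model of "push the map as soon as two events share a task ID".
# PairBuffer and its methods are hypothetical names, not Logstash APIs.
class PairBuffer
  def initialize(timeout:, expected: 2)
    @timeout = timeout    # seconds a partial map may wait before expiring
    @expected = expected  # number of events that completes a task
    @maps = {}            # task_id => { fields:, count:, created: }
  end

  # Merge an event into its map. Returns the merged fields once the
  # expected number of events has arrived, otherwise nil.
  def add(task_id, fields, now)
    entry = (@maps[task_id] ||= { fields: {}, count: 0, created: now })
    # Keep the first non-nil value per field, similar in spirit to
    # `map[k] ||= event.get(k)` in the config above.
    fields.each do |key, value|
      entry[:fields][key] = value if entry[:fields][key].nil? && !value.nil?
    end
    entry[:count] += 1
    return nil if entry[:count] < @expected

    @maps.delete(task_id) # free the map immediately instead of waiting
    entry[:fields]
  end

  # Return and drop the partial maps that have waited at least `timeout`.
  def expire(now)
    stale = @maps.select { |_, entry| now - entry[:created] >= @timeout }
    stale.each_key { |task_id| @maps.delete(task_id) }
    stale.values.map { |entry| entry[:fields] }
  end

  # Number of maps still in flight.
  def pending
    @maps.size
  end
end
```

With a 300s timeout, a pair arriving 30s apart would come out of add on the second call and leave nothing in flight, while a lone event would only surface from expire once 300s had passed.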