I'm having some trouble with the Logstash aggregate filter.
We are trying to combine sFlow events. These events are not sent to Logstash in real time: the sensor data is collected and sent to us in a daily batch. We run a script on each batch to load the data into RabbitMQ, which is where Logstash picks it up.
Because sFlow data is made up of disparate events, there is no definitive start or stop. We use the fingerprint filter to combine five fields from the sFlow data into a key that marks the aggregation: if those five fields match, we want to treat the records as part of the same event, within a certain timeout.
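The grouping idea can be sketched in Python: hash the five matching fields into one key, so records with the same key land in the same aggregation. This is not the fingerprint filter's exact output; the field names, the "|" separator, and the SHA-256 choice are all assumptions for illustration.

```python
import hashlib

# Hypothetical field names: the post does not say which five sFlow fields are used.
KEY_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")


def flow_key(record: dict) -> str:
    """Return a hex SHA-256 digest over the five key fields, in a fixed order."""
    # Join with a separator so ("ab", "c") and ("a", "bc") do not produce the same input.
    joined = "|".join(str(record[f]) for f in KEY_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two records that differ only in a non-key field (for example a byte count) get the same key; changing any of the five key fields gives a different key.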
The relevant options seem to be timeout, inactivity_timeout, and timeout_timestamp_field. We have a field that marks the start of an sFlow event, which we use for timeout_timestamp_field; it is converted from a Unix timestamp to a Logstash date.
We set inactivity_timeout to 330 (5.5 minutes) and timeout to 86400 (1 day).
So, if five events are pulled into Logstash and the start time of each one is less than 5.5 minutes after the start time of the previous one, I would expect all of them to be aggregated into a single map.
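The behavior I expect can be written down as a small Python model for one fingerprint key. This is my own sketch of the intended semantics, not the aggregate filter's code, and the strict "<" boundary is an assumption: a map stays open while each new start time is within inactivity seconds of the previous one and within timeout seconds of the map's first start.

```python
from typing import List


def rolling_groups(starts: List[float], inactivity: float = 330,
                   timeout: float = 86400) -> List[List[float]]:
    """Group start times (epoch seconds) for ONE key into maps.

    Rolling window: the inactivity gap is measured from the most recent event,
    while the overall timeout is measured from the map's first event.
    """
    groups: List[List[float]] = []
    for ts in sorted(starts):
        if groups:
            current = groups[-1]
            gap_ok = ts - current[-1] < inactivity  # gap since the previous event
            age_ok = ts - current[0] < timeout      # age of the whole map
            if gap_ok and age_ok:
                current.append(ts)
                continue
        groups.append([ts])  # otherwise start a new map
    return groups
```

With five events 120 seconds apart, this model yields a single map, because no individual gap reaches 330 seconds.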
However, what seems to happen is that the filter marks the start time of the first event, and once 5.5 minutes past that time have passed, it triggers the inactivity_timeout and completes the map.
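For comparison, the behavior described above amounts to measuring the 5.5 minutes from the first event's start time instead of from the most recent one. A Python model of that (an illustration of what I observe, not the plugin's implementation) splits five events spaced 120 seconds apart into two maps, even though no gap between consecutive events exceeds 330 seconds.

```python
from typing import List


def fixed_window_groups(starts: List[float], inactivity: float = 330) -> List[List[float]]:
    """Model of the observed behavior for ONE key: a map expires `inactivity`
    seconds after its FIRST start time, regardless of later events."""
    groups: List[List[float]] = []
    for ts in sorted(starts):
        # Compare against the map's first event, not its latest one.
        if groups and ts - groups[-1][0] < inactivity:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups
```

Here the event at 360 s falls outside the window that opened at 0 s, so it starts a new map.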
Am I misunderstanding inactivity_timeout? Isn't it supposed to behave like a rolling 5.5-minute window, so that the map is completed only when no new event arrives within that period, or when the larger 1-day timeout is reached?
Also, since these are historical logs, how does the timeout work once all the events for the day have been processed? It seems as if inactivity_timeout is then evaluated against real (wall-clock) time, which would mean it actually has two meanings: real time and event time.
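One way to reason about the end-of-batch question is an event-time clock that only moves forward when a newer event arrives. The Python model below is an assumption for illustration, not a description of how the aggregate filter is implemented. Under such a clock, a map can only expire when a later event (of any key) pushes the clock forward, so whatever is still open after the last record of the daily batch never expires in event time and needs some other trigger, modeled here by a hypothetical flush_at_end parameter.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (fingerprint key, start time in epoch seconds)


def aggregate_event_time(
    events: List[Event], inactivity: float = 330, flush_at_end: bool = True
) -> Tuple[List[Tuple[str, List[float]]], Dict[str, List[float]]]:
    """Aggregate events per key using an event-time clock.

    'now' is the newest timestamp seen so far; maps expire only when that
    clock advances at least `inactivity` seconds past their last event.
    Returns (completed maps in completion order, maps still open).
    """
    open_maps: Dict[str, List[float]] = {}
    completed: List[Tuple[str, List[float]]] = []
    clock = float("-inf")
    for key, ts in sorted(events, key=lambda e: e[1]):
        clock = max(clock, ts)
        # Expire every open map (any key) that is idle long enough in event time.
        for k in [k for k, m in open_maps.items() if clock - m[-1] >= inactivity]:
            completed.append((k, open_maps.pop(k)))
        open_maps.setdefault(key, []).append(ts)
    if flush_at_end:
        # Without this, nothing after the last event would ever close the remaining maps.
        completed.extend(open_maps.items())
        open_maps = {}
    return completed, open_maps
```

For example, with events A@0, B@60, A@200, B@700, the arrival of B@700 closes both A and the first B map, but the second B map stays open unless it is flushed explicitly.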
Thanks.
If it matters: the OS is CentOS 7 (7.6.1810), Java is OpenJDK 8, and Logstash is 7.3.0.