Aggregate filter in Logstash not behaving correctly

I have a use case where an application writes data to a log file. Filebeat ships the log to Logstash, Logstash aggregates the data, and the aggregated data is sent to Elasticsearch for storage. Logstash drops the original events and sends only the aggregated data to Elasticsearch.

Following is my Logstash filter configuration:
filter {
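  # parse the raw Filebeat "message" field into JSON event fields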
  json {
    source => "message"
  }

  if [logid] == "mylog" {
    if [event-type] == "start" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            map['total-wasted-duration'] = 0.0
            map['log-count'] = 0 
        "
        map_action => "create"
      }
      drop {}
    }
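    # "log" event: add this event's wasted-duration and bump the message count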
    if [event-type] == "log" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            map['total-wasted-duration'] += event.get('wasted-duration')
            map['log-count'] += 1 
        "
        map_action => "update"
      }
      drop {}
    } 
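    # "end" event: copy the accumulated totals onto the final event and end the task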
    if [event-type] == "end" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            event.set('total-wasted-duration', map['total-wasted-duration'])
            event.set('log-count', map['log-count']) 
        "
        map_action => "update"
        end_of_task => true
        timeout => 60
      }
    }
  }
}

In the above configuration, Logstash parses each log line into a JSON object. If the “logid” property of the object is “mylog”, it aggregates the “wasted-duration” property. I use the “event-type” property to manage the map that stores the aggregated data. The calculated “total-wasted-duration” is copied from the map to the event at the end. I also store a “log-count” in the map and increment it as messages are aggregated, to count how many messages the aggregate filter has processed.
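Every “log” event in my file carries a “wasted-duration” value, so I do not think missing fields explain this, but for completeness a nil-guarded version of the accumulation code would look like this (sketch only):

code => "
    map['total-wasted-duration'] += (event.get('wasted-duration') || 0.0)
    map['log-count'] += 1
"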
I have thousands of messages whose “wasted-duration” values must be aggregated; my sample test uses 100,000 log messages.

Log file sample (a start event, multiple log events, and an end event):
{"@timestamp":"2020-03-02T10:02:38.7359198-05:00","logid":"mylog","event-type":"start","key":"Database1-100"}
{"wasted-duration":1.1,"@timestamp":"2020-03-02T10:02:39.1849294-05:00","logid":"mylog","event-type":"log","key":"Database1-100"}
{"wasted-duration":1.2,"@timestamp":"2020-03-02T10:02:39.1849294-05:00","logid":"mylog","event-type":"log","key":"Database1-100"}
...
...
{"@timestamp":"2020-03-02T10:03:45.4869082-05:00","logid":"mylog","event-type":"end","key":"Database1-100"}

Most of the time, “total-wasted-duration” is calculated incorrectly (sometimes it is not calculated at all), and whenever it is wrong, “log-count” is wrong as well. It looks as if the aggregate filter is skipping some of the log messages, so either the filter or my configuration is not working as expected, but I am unable to figure out what is causing this behavior.
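To reproduce this, the aggregated events can be inspected with a plain stdout output instead of Elasticsearch (a minimal sketch; my real pipeline writes to Elasticsearch):

output {
  stdout { codec => rubydebug }
}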

Any help is appreciated.

Thank you.
