Aggregate filter in Logstash not behaving correctly

I have a use case where an application writes data to a log file. Filebeat ships the log to Logstash, Logstash aggregates the data, and the aggregated data is sent to Elasticsearch for storage. Logstash drops the original events and sends only the aggregated data to Elasticsearch.

Following is my Logstash filter configuration:
filter {
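  # parse the raw Filebeat "message" field into JSON event fields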
  json {
    source => "message"
  }

  if [logid] == "mylog" {
    if [event-type] == "start" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            map['total-wasted-duration'] = 0.0
            map['log-count'] = 0 
        "
        map_action => "create"
      }
      drop {}
    }
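    # "log" event: add this event's wasted-duration and bump the message count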
    if [event-type] == "log" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            map['total-wasted-duration'] += event.get('wasted-duration')
            map['log-count'] += 1 
        "
        map_action => "update"
      }
      drop {}
    } 
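    # "end" event: copy the accumulated totals onto the final event and end the task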
    if [event-type] == "end" {
      aggregate {
        task_id => "%{[key]}"
        code => "
            event.set('total-wasted-duration', map['total-wasted-duration'])
            event.set('log-count', map['log-count']) 
        "
        map_action => "update"
        end_of_task => true
        timeout => 60
      }
    }
  }
}

In the above configuration, Logstash parses each log line into a JSON object. If the “logid” property of the object is “mylog”, it aggregates the “wasted-duration” property. I use the “event-type” property to manage the map that stores the aggregated data. The calculated “total-wasted-duration” is copied from the map to the event at the end. I also store a “log-count” in the map and increment it as messages are aggregated, to count how many messages the aggregate filter has processed.
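Every “log” event in my file carries a “wasted-duration” value, so I do not think missing fields explain this, but for completeness a nil-guarded version of the accumulation code would look like this (sketch only):

code => "
    map['total-wasted-duration'] += (event.get('wasted-duration') || 0.0)
    map['log-count'] += 1
"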
I have thousands of messages whose “wasted-duration” values must be aggregated; my sample test uses 100,000 log messages.

Log file sample (a start event, multiple log events, and an end event):
{"@timestamp":"2020-03-02T10:02:38.7359198-05:00","logid":"mylog","event-type":"start","key":"Database1-100"}
{"wasted-duration":1.1,"@timestamp":"2020-03-02T10:02:39.1849294-05:00","logid":"mylog","event-type":"log","key":"Database1-100"}
{"wasted-duration":1.2,"@timestamp":"2020-03-02T10:02:39.1849294-05:00","logid":"mylog","event-type":"log","key":"Database1-100"}
...
...
{"@timestamp":"2020-03-02T10:03:45.4869082-05:00","logid":"mylog","event-type":"end","key":"Database1-100"}

Most of the time, “total-wasted-duration” is calculated incorrectly (sometimes it is not calculated at all), and whenever it is wrong, “log-count” is wrong as well. It looks as if the aggregate filter is skipping some of the log messages, so either the filter or my configuration is not working as expected, but I am unable to figure out what is causing this behavior.
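To reproduce this, the aggregated events can be inspected with a plain stdout output instead of Elasticsearch (a minimal sketch; my real pipeline writes to Elasticsearch):

output {
  stdout { codec => rubydebug }
}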

Any help is appreciated.

Thank you.
