However, to save resources I would like to use a timeout, so that overly long intermediate times are cancelled and -1 is returned instead. In most cases this works. Sometimes, however, the timeout does not take effect.
Below is an example script that demonstrates the error:
filter {
  # match my test inputs
  grok {
    match => [ "message", "%{NOTSPACE:id} - %{NOTSPACE:status}" ]
  }
  # the start point
  if [status] == "start" {
    aggregate {
      task_id => "%{id}"
      code => "map['duration_map'] = (event.get('@timestamp').to_f*1000).to_i"
      map_action => "create"
      timeout => 1
    }
  }
  # the end point
  if [status] == "end" {
    # pre-fill duration_field so that it is set every time, even if the time difference exceeds the timeout
    mutate {
      add_field => { "duration_field" => "-1" }
    }
    aggregate {
      task_id => "%{id}"
      code => "event.set('duration_field', (event.get('@timestamp').to_f*1000).to_i - map['duration_map'])"
      map_action => "update"
      end_of_task => true
    }
  }
}
I entered the following records one after another for testing (a sketch of how such a test can be fed into Logstash is shown after the list):
1 - start
1 - end
2 - start
2 - end
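For reference, one minimal way to feed these records in and inspect the result is via the stdin input and the rubydebug stdout codec; this is only a sketch of such a test setup, not the configuration actually used on the test server:
input {
  # paste the test records ("1 - start", "1 - end", ...) line by line
  stdin { }
}
# ... the filter block from above goes here ...
output {
  # print every event, including duration_field, for inspection
  stdout { codec => rubydebug }
}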
However, when the script runs you can see that the timeout did not work for the second pair, even though nothing in the configuration was changed and the Logstash test server was not restarted. The problem cannot be reproduced reliably and occurs randomly. I have also seen console outputs that were about 1.5 seconds apart where the timeout did work.
Does anyone recognize this problem or have an idea I could try to solve it?
The timeout is implemented using the flush method, which Logstash calls every five seconds, so if you set timeout to 1, the effective timeout will be anywhere from 1 to 5 seconds.
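Roughly, the mechanism works like the following simplified sketch (this is only an illustration of why the timeout is checked at each flush, not the actual plugin source):
# called by Logstash roughly every 5 seconds
def flush(options = {})
  expired = []
  @maps.each do |task_id, entry|
    # a map is only discarded on the first flush after its timeout has elapsed
    if Time.now - entry[:creation_timestamp] > @timeout
      expired << task_id
    end
  end
  expired.each { |task_id| @maps.delete(task_id) }
  []   # no extra events generated in this sketch
end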
Thank you, Badger. Unfortunately, that turned out not to be the solution. I set the timeout to 10 seconds, and what can I say: sometimes it works and sometimes Logstash simply ignores my timeout.
For more information, see the code and log output below:
filter {
  # match my test inputs
  grok {
    match => [ "message", "%{NOTSPACE:id} - %{NOTSPACE:status}" ]
  }
  # the start point
  if [status] == "start" {
    aggregate {
      task_id => "%{id}"
      code => "map['duration_map'] = (event.get('@timestamp').to_f*1000).to_i"
      map_action => "create"
      timeout => 10
    }
  }
  # the end point
  if [status] == "end" {
    # pre-fill duration_field so that it is set every time, even if the time difference exceeds the timeout
    mutate {
      add_field => { "duration_field" => "-1" }
    }
    aggregate {
      task_id => "%{id}"
      code => "event.set('duration_field', (event.get('@timestamp').to_f*1000).to_i - map['duration_map'])"
      map_action => "update"
      end_of_task => true
    }
  }
}
Once again, the timeout is evaluated by a function that is called every five seconds. So a ten second timeout will actually be somewhere between 10 and 15 seconds.
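If you want to see exactly when the timeout actually fires, the aggregate filter can emit a separate event for a timed-out map. The following is only a sketch of the start branch using the documented push_map_as_event_on_timeout, timeout_task_id_field and timeout_tags options of logstash-filter-aggregate; whether and how it fits your pipeline is up to you:
aggregate {
  task_id => "%{id}"
  code => "map['duration_map'] = (event.get('@timestamp').to_f*1000).to_i"
  map_action => "create"
  timeout => 10
  # emit a dedicated, tagged event when this map times out,
  # so the effective timeout moment becomes visible in the output
  push_map_as_event_on_timeout => true
  timeout_task_id_field => "id"
  timeout_tags => ["aggregate_timeout"]
}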
OK, thanks, I understand. Here is an example from another system where the timeout also sometimes does not work.
There the timeout is set to 120 seconds. If it is exceeded, the "edi_basic_latency" field is set to 120000. This seems to work in general, since by far the most common value is 120000, as shown below.
In the following example, however, you can see that there are records well over 120 seconds: in this case around 180 seconds!
Is it perhaps possible to change the interval at which the flush method is called, so that it does not run every 5 seconds? And if so, how can I check or change that flush interval?