What is the best solution for one of the basic requirements in log analysis: "CALCULATE DURATION"?

The problem is to calculate the duration between 2 events. For example:

Logs lines:

15-04-2016T10:00:00:000+UTC 1 start data
15-04-2016T10:00:00:001+UTC 2 start data
15-04-2016T10:00:00:004+UTC 2 end data
15-04-2016T10:00:00:005+UTC 1 end data

In that case, it's necessary to add a field called "Duration" to the "end" events and assign it the difference between the two timestamps.

15-04-2016T10:00:00:000+UTC 1 start data 0
15-04-2016T10:00:00:001+UTC 2 start data 0
15-04-2016T10:00:00:004+UTC 2 end data 3
15-04-2016T10:00:00:005+UTC 1 end data 5

Thank you to all.

Sorry, I'm not sure I understand your question.

Just calculate the time between 2 events.

The events have an ID to correlate them and a flag like start, step1, step2, end.

I want to know the duration from start to step1, from start to step2, and the total duration from start to end.

Thank you

Maybe this is the right filter :


That's the solution if we want to calculate from the timestamp. The problem is that the timestamp is assigned by Logstash when an event arrives. If we don't have real-time processing, this doesn't work. For that reason, suppose a scenario like this:

Logstash Processing Time      ID   TAG     Real time for the event

15-04-2016T10:00:00:000+UTC   1    start   15-04-2016T9:00:30:000+UTC
15-04-2016T10:00:00:005+UTC   1    end     15-04-2016T9:00:30:020+UTC

If we use the elapsed plugin, the duration will be 5 milliseconds, but the real duration should be calculated from the Real Time field; in that case the duration would be 20 milliseconds.

How can we handle that case?


You should use the aggregate filter if you follow this example:


aggregate - Elastic
The aim of this filter is to aggregate information available among several events (typically log lines) belonging to a same task, and finally push aggregated ...

It should be fine to just subtract the times, no?

@Rubytor You can overwrite the Logstash Processing Time with the real time for the event. Use the date filter for this purpose:
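For example (a sketch; the field name `real_time` is an assumption about how your events were parsed, e.g. by grok):

```
filter {
  date {
    # overwrite @timestamp with the event's real time
    # ("real_time" is a hypothetical field holding the parsed real time)
    match => [ "real_time", "dd-MM-yyyy'T'HH:mm:ss:SSSZ" ]
  }
}
```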

@Gnosis When you use the aggregate filter you must set filter workers to 1. This isn't really nice.

You should be very careful to set logstash filter workers to 1 (-w 1 flag) for this filter to work correctly otherwise documents may be processed out of sequence and unexpected results will occur.

Ok, thanks a lot. And how do you aggregate data without the aggregate filter, please?

Maybe your shipper can do this. It depends on your use case and infrastructure. Imagine you have two Logstash instances with a load balancer in front of them. How would you ensure that all events flow to the same Logstash instance?

What about Filebeat? I think it has its own load balancing, doesn't it?

In my opinion, the right solution for your need is to use the 'date' filter and then the 'elapsed' filter.

  • The date filter allows you to put your message date (e.g. 15-04-2016T10:00:00:000+UTC) into the @timestamp field.
  • Then the elapsed filter will compute the elapsed time between the start event and the end event (using the @timestamp field) and will store the duration in the 'elapsed.time' field of the end event.

But you have to know one thing: the computed duration is in seconds. If you want a more precise duration (in milliseconds, for example), you will have to use the aggregate filter.
In all cases, you must first use the 'date' filter to set the message date in the @timestamp field.
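The date + elapsed approach could look roughly like this (a sketch; the field names `_timestamp` and `taskid` and the "start"/"end" tags are assumptions about how your events are parsed and tagged upstream, e.g. by grok or mutate):

```
filter {
  date {
    match => [ "_timestamp", "dd-MM-yyyy'T'HH:mm:ss:SSSZ" ]
  }
  # elapsed correlates a start-tagged event with an end-tagged event
  # sharing the same unique_id_field value
  elapsed {
    start_tag => "start"
    end_tag => "end"
    unique_id_field => "taskid"
  }
}
```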

Here is the Logstash configuration using the aggregate filter:

filter {
  date {
    match => [ "_timestamp", "dd-MM-yyyy'T'HH:mm:ss:SSSZ" ]
  }

  if [start] {
    aggregate {
      task_id => "%{taskid}"
      map_action => "create"
      code => "map['start_timestamp'] = event['@timestamp']"
    }
  }

  if [end] {
    aggregate {
      task_id => "%{taskid}"
      map_action => "update"
      code => "event['duration'] = event['@timestamp'] - map['start_timestamp']"
      end_of_task => true
    }
  }
}
Hope it helps.
