Time between two timestamps in different files

Tommie · September 11, 2020, 1:10pm

Hi,
Can you please advise me setting up the following?

There are multiple log files (csv) coming from multiple sources.
The log files contain data from transactions which are first processed in system A, then in system B, and so on.
In all systems there's the same unique identifier per transaction.
How can I calculate the difference between the end time given in the log file from system A and the start time of system B?
I've tried using the Elasticsearch filter in Logstash. That approach fails when logfile B is processed before logfile A.

Example (unique ID = 12)

LOGFILE A
ID timeA actionA
12 00:00:00 start processing
12 00:00:10 end processing < 'elapsed' gives me 10 seconds

LOGFILE B
ID timeB actionB
12 00:00:11 start processing
12 00:00:15 end processing < 'elapsed' gives me 4 seconds

With the plugin 'elapsed' I successfully calculate the time it took per system.

Is there a way I can calculate the time between 'end processing' of system A (00:00:10) and 'start processing' of system B (00:00:11), regardless of the order in which the files are processed?

Help or advice is greatly appreciated!

Peter_Steenbergen · September 11, 2020, 1:46pm

Have you seen the docs on Lookup Enrichment:
https://www.elastic.co/guide/en/logstash/current/lookup-enrichment.html#lookup-plugins

The second code block may be of good use to you.

Tommie · September 11, 2020, 4:22pm

Hi Peter,
Thanks for your quick reaction!
I've tried that solution: 'I've tried using the Elasticsearch filter in Logstash. That approach fails when logfile B is processed before logfile A.'
It only works if the files are delivered in the right order (which they often aren't).

Any other suggestion?

Peter_Steenbergen · September 11, 2020, 7:27pm

I am thinking a bit out of the box now. How soon do you need the result of the calculation?
What you can do is store the calculation in a new field. If the lookup returns no results, you can run an _update_by_query later on for the documents where the calculated field is empty or missing. Elasticsearch now also has an enrich/lookup processor to leverage.

Additionally: if you need a value to be present when the calculation is not yet available, you can set null_value to an average so the field won't be blank.
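To illustrate the "leave it empty, fill it in later" idea, here is a minimal Python sketch. It is not Logstash or Elasticsearch code: the `store` dict stands in for the index, and the names `ingest`, `backfill`, `end_a`, `start_b` and `gap_seconds` are made up for this example. It also simplifies things by skipping the lookup at ingest time and leaving the field empty until the later pass.

```python
from datetime import datetime

# Hypothetical stand-in for the index: transaction ID -> document.
store = {}

def parse_time(text):
    # The example logs only carry a time of day (HH:MM:SS).
    return datetime.strptime(text, "%H:%M:%S")

def ingest(tx_id, field, time_text):
    """Store one timestamp; the calculated field stays empty (None) for now."""
    doc = store.setdefault(tx_id, {"gap_seconds": None})
    doc[field] = time_text

def backfill():
    """Later pass: fill gap_seconds where it is still missing and both
    timestamps have arrived. Returns the number of documents updated."""
    updated = 0
    for doc in store.values():
        if doc["gap_seconds"] is None and "end_a" in doc and "start_b" in doc:
            delta = parse_time(doc["start_b"]) - parse_time(doc["end_a"])
            doc["gap_seconds"] = delta.total_seconds()
            updated += 1
    return updated

# Logfile B is processed before logfile A.
ingest("12", "start_b", "00:00:11")
first_pass = backfill()    # nothing to fill yet, A has not arrived
ingest("12", "end_a", "00:00:10")
second_pass = backfill()   # both timestamps present now

print(first_pass, second_pass, store["12"]["gap_seconds"])  # 0 1 1.0
```

Because the calculation runs in a separate pass, the order in which the files arrive no longer matters; the price is that the field stays empty until that pass has run.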

Tommie · September 14, 2020, 9:28am

Thanks Peter, that sounds like a good approach since I don't need the calculation on the fly.
I'm going to give it a try!

Mark_Harwood · September 14, 2020, 11:09am

Check out the transforms API as a framework for bringing related data together.
It can use aggregations on a common ID to fuse related data and then store the results. These aggregations can also use custom scripts to derive data such as a duration.
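The grouping idea can be sketched in plain Python, using the sample rows from the original post. This is only an illustration of "group by the common ID, then derive the duration", not the transforms API itself; the function name `handover_gaps` and the tuple layout of the events are assumptions for this example.

```python
from collections import defaultdict
from datetime import datetime

def parse_time(text):
    # The example logs only carry a time of day (HH:MM:SS).
    return datetime.strptime(text, "%H:%M:%S")

def handover_gaps(events):
    """events: iterable of (system, tx_id, time, action) in any order.
    Returns {tx_id: seconds between end of A and start of B}."""
    by_id = defaultdict(dict)
    for system, tx_id, time_text, action in events:
        by_id[tx_id][(system, action)] = parse_time(time_text)

    gaps = {}
    for tx_id, seen in by_id.items():
        end_a = seen.get(("A", "end processing"))
        start_b = seen.get(("B", "start processing"))
        # Only report a gap once both sides of the handover are known.
        if end_a is not None and start_b is not None:
            gaps[tx_id] = (start_b - end_a).total_seconds()
    return gaps

# Rows from logfile B deliberately come before those from logfile A.
events = [
    ("B", "12", "00:00:11", "start processing"),
    ("B", "12", "00:00:15", "end processing"),
    ("A", "12", "00:00:00", "start processing"),
    ("A", "12", "00:00:10", "end processing"),
]

print(handover_gaps(events))  # {'12': 1.0}
```

Since all events for an ID are collected before anything is calculated, the result does not depend on which file was read first.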

system · October 12, 2020, 11:09am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
