Time between two timestamps in different files

Can you please advise me on setting up the following?

There are multiple log files (csv) coming from multiple sources.
The log files contain data from transactions which are first processed in system A, then processed in system B, and so on.
In all systems there's the same unique identifier per transaction.
How can I calculate the difference between the end time given in the log file from system A and the start time of system B?
I've tried using the Elasticsearch filter in Logstash. That approach fails when log file B is processed before log file A.

Example (unique ID = 12)

ID timeA actionA
12 00:00:00 start processing
12 00:00:10 end processing < 'elapsed' gives me 10 seconds

ID timeB actionB
12 00:00:11 start processing
12 00:00:15 end processing < 'elapsed' gives me 4 seconds

With the plugin 'elapsed' I successfully calculate the time it took per system.

Is there a way I can calculate the time between 'end processing' of system A (00:00:10) and 'start processing' of system B (00:00:11), regardless of the order in which the files are processed?

Help or advice is greatly appreciated!

Have you seen the docs on Lookup Enrichment:

The second code block may be of good use to you.

Hi Peter,
Thanks for your quick reaction!
I've tried that solution: 'I've tried using the Elasticsearch filter in Logstash. That approach fails when log file B is processed before log file A.'
It only works if the files are delivered in the right order (which they often aren't).

Any other suggestion?

I am thinking a bit out of the box now. How soon do you need the result of the calculation?
What you can do is store the calculation in a new field. If the lookup returns no result at index time, you can run an _update_by_query later on to fill in the empty/missing calculated fields. Elasticsearch now also has an enrich/lookup processor you can leverage.

Additional: If you need a value to be present even when the lookup has not found one yet, you can set null_value in the mapping to an average, so the field won't come up blank.
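
To illustrate the "calculate now if possible, fill in later otherwise" idea, here is a minimal plain-Python sketch. It does not call Elasticsearch or Logstash; the dict `store` only stands in for an index, and the names `index_event`, `backfill`, and `gap_a_to_b` are made up for this example. It also assumes the times are on the same day, as in the sample data.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumption: time-only stamps on the same day, as in the example


def gap_seconds(end_a, start_b):
    """Seconds between system A's end time and system B's start time."""
    delta = datetime.strptime(start_b, FMT) - datetime.strptime(end_a, FMT)
    return delta.total_seconds()


def index_event(store, event):
    """Store an event; do the lookup right away and leave the gap empty on a miss."""
    doc = store.setdefault(
        event["id"], {"end_a": None, "start_b": None, "gap_a_to_b": None}
    )
    if event["system"] == "A" and event["action"] == "end processing":
        doc["end_a"] = event["time"]
    elif event["system"] == "B" and event["action"] == "start processing":
        doc["start_b"] = event["time"]
        # Lookup at index time: only succeeds if file A was already processed.
        if doc["end_a"] is not None:
            doc["gap_a_to_b"] = gap_seconds(doc["end_a"], doc["start_b"])


def backfill(store):
    """Later pass (the role _update_by_query would play): fill missing gaps."""
    updated = 0
    for doc in store.values():
        if doc["gap_a_to_b"] is None and doc["end_a"] and doc["start_b"]:
            doc["gap_a_to_b"] = gap_seconds(doc["end_a"], doc["start_b"])
            updated += 1
    return updated


# File B arrives before file A, which is the failing case from the question.
store = {}
index_event(store, {"id": "12", "system": "B", "action": "start processing", "time": "00:00:11"})
index_event(store, {"id": "12", "system": "A", "action": "end processing", "time": "00:00:10"})
missing_before = store["12"]["gap_a_to_b"] is None
updated_count = backfill(store)
```

After the backfill pass, the gap for ID 12 is filled in even though B was processed first, so the order of the files no longer matters as long as the later pass runs after both have arrived.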


Thanks Peter, that sounds like a good approach since I don't need the calculation on the fly.
I'm going to give it a try!

Check out the transforms API as a framework for bringing related data together.
It can use aggregations on a common ID to fuse related data and then store the results. These aggregations can also use custom scripts to derive data such as a duration.
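
The grouping idea can be sketched in plain Python as follows. This is not the transforms API itself, only an illustration of what "aggregate by common ID, then derive a duration" means; the function name `fuse_by_id` and the event field names are hypothetical, and the times are assumed to fall on the same day.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%H:%M:%S"  # assumption: time-only stamps on the same day


def fuse_by_id(events):
    """Group events by transaction ID and derive the A-end to B-start gap in seconds."""
    grouped = defaultdict(dict)
    for e in events:
        # Key each timestamp by (system, action) so input order is irrelevant.
        grouped[e["id"]][(e["system"], e["action"])] = datetime.strptime(e["time"], FMT)

    gaps = {}
    for tid, times in grouped.items():
        end_a = times.get(("A", "end processing"))
        start_b = times.get(("B", "start processing"))
        # Only emit a result once both sides of the pair are present.
        if end_a is not None and start_b is not None:
            gaps[tid] = (start_b - end_a).total_seconds()
    return gaps


# Events from the example in the question, with system B listed first on purpose.
events = [
    {"id": "12", "system": "B", "action": "start processing", "time": "00:00:11"},
    {"id": "12", "system": "B", "action": "end processing", "time": "00:00:15"},
    {"id": "12", "system": "A", "action": "start processing", "time": "00:00:00"},
    {"id": "12", "system": "A", "action": "end processing", "time": "00:00:10"},
]
gaps = fuse_by_id(events)
```

Because the calculation runs over the grouped data rather than at index time, it gives the same result whichever log file was processed first.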