Hi,
Can you please advise me on setting up the following?
There are multiple log files (CSV) coming from multiple sources.
The log files contain data from transactions that are first processed in system A, then in system B, and so on.
In all systems there's the same unique identifier per transaction.
How can I calculate the difference between the end time given in the log file from system A and the start time given in the log file from system B?
I've tried using the Elasticsearch filter in Logstash. That approach fails when log file B is processed before log file A.
Example (unique ID = 12)
LOGFILE A
ID timeA actionA
12 00:00:00 start processing
12 00:00:10 end processing < 'elapsed' gives me 10 seconds
LOGFILE B
ID timeB actionB
12 00:00:11 start processing
12 00:00:15 end processing < 'elapsed' gives me 4 seconds
With the 'elapsed' plugin I can successfully calculate the time taken per system.
Is there a way to calculate the time between 'end processing' in system A (00:00:10) and 'start processing' in system B (00:00:11), regardless of the order in which the files are processed?
Hi Peter,
Thanks for your quick reaction!
I've already tried that solution: 'I've tried using the Elasticsearch filter in Logstash. That approach fails when log file B is processed before log file A.'
It only works if the files are delivered in the right order (which they often aren't).
I'm thinking a bit outside the box now. How soon do you need the result of the calculation?
What you can do is store the calculation in a new field. If the lookup returns no result at index time, you can run an _update_by_query later on to fill in the empty or missing calculated fields. Elasticsearch now also has an enrich (lookup) processor you could leverage.
Additionally: if you need a value to be present even when the real one isn't available yet, you can index the field with null_value set to an average, so it won't come up blank.
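To make the "index now, backfill later" idea concrete, here is a minimal in-memory sketch in Python. It is not Elasticsearch or Logstash code: the dict `store` stands in for the index, `index_event` stands in for the Logstash lookup (which only runs when the system B event arrives), and `backfill` plays the role of a later _update_by_query pass. All function and field names (`index_event`, `backfill`, `end_a`, `start_b`, `gap_seconds`) are made up for illustration, and the time format is assumed from the example above (time of day only, so a gap that crosses midnight is not handled).

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumed format, taken from the example log lines


def gap_seconds(end_a, start_b):
    """Seconds between 'end processing' in A and 'start processing' in B."""
    return (datetime.strptime(start_b, FMT) - datetime.strptime(end_a, FMT)).total_seconds()


def index_event(store, event):
    """Store one log line; only the B start event tries the lookup, as in the Logstash setup."""
    doc = store.setdefault(event["id"], {"end_a": None, "start_b": None, "gap_seconds": None})
    key = (event["system"], event["action"])
    if key == ("A", "end processing"):
        doc["end_a"] = event["time"]
    elif key == ("B", "start processing"):
        doc["start_b"] = event["time"]
        if doc["end_a"] is not None:  # lookup succeeded at index time
            doc["gap_seconds"] = gap_seconds(doc["end_a"], doc["start_b"])


def backfill(store):
    """Later pass: fill gap_seconds wherever it is still missing but both times are known."""
    updated = 0
    for doc in store.values():
        if doc["gap_seconds"] is None and doc["end_a"] and doc["start_b"]:
            doc["gap_seconds"] = gap_seconds(doc["end_a"], doc["start_b"])
            updated += 1
    return updated
```

If log file B is indexed first, `gap_seconds` stays empty for ID 12 until `backfill` runs after log file A has arrived, at which point it is set to 1.0.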
Check out the transforms API as a framework for bringing related data together.
It can use aggregations keyed on a common ID to fuse related data and then store the results. These aggregations can also use custom scripts to derive data such as a duration.
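As a rough illustration of what such a group-by-ID aggregation computes, here is a small Python sketch. It is not a transform definition and uses no Elasticsearch API. It only mimics the logic: group all rows by transaction ID, pick out the A end time and the B start time, and derive the gap. The row layout (`id`, `system`, `action`, `time`) and the function name `pivot_gaps` are assumptions for the example, and the times are assumed to be time of day only, as in the sample above.

```python
from collections import defaultdict
from datetime import datetime


def parse(t):
    # Assumed format, taken from the example log lines.
    return datetime.strptime(t, "%H:%M:%S")


def pivot_gaps(rows):
    """Group rows by ID and return {id: seconds between A's end and B's start}."""
    groups = defaultdict(dict)
    for row in rows:
        key = (row["system"], row["action"])
        if key == ("A", "end processing"):
            groups[row["id"]]["end_a"] = parse(row["time"])
        elif key == ("B", "start processing"):
            groups[row["id"]]["start_b"] = parse(row["time"])
    # Only IDs for which both sides have been seen get a result.
    return {
        tid: (g["start_b"] - g["end_a"]).total_seconds()
        for tid, g in groups.items()
        if "end_a" in g and "start_b" in g
    }


# Rows deliberately listed with log file B first.
rows = [
    {"id": "12", "system": "B", "action": "start processing", "time": "00:00:11"},
    {"id": "12", "system": "B", "action": "end processing", "time": "00:00:15"},
    {"id": "12", "system": "A", "action": "start processing", "time": "00:00:00"},
    {"id": "12", "system": "A", "action": "end processing", "time": "00:00:10"},
]
result = pivot_gaps(rows)  # {"12": 1.0}
```

Because the grouping looks at all rows for an ID at once, the order in which the files were ingested does not affect the result.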