Elastic Stack for workflow tracing architecture

ben.dixon · May 1, 2019, 1:12pm

I'm relatively new to the Elastic world and am currently working on a project to track the amount of time it takes to complete various workflows in our system. The typical workflow will be initiated from a user interaction but then may traverse several systems via APIs before being completed. Each workflow has a trace which includes a correlation ID so that the series of output events the workflow generates are related. The events also have time data so their total duration can be tracked. This data is flowing into Elastic by way of a FileBeat -> LogStash -> Elastic flow.

Somewhere within the stack I want to aggregate the data so that I can measure the average time a flow takes in a given window of time. Because of the volume of data generated I thought it would be best to do this either in logstash by storing an index of the event data but also a separate index of the timings for the event with the arrival of each new event updating a document in the index (via correlation ID). I also looked at roll up jobs in Elastic but it doesn't look like they could perform the time difference operations across documents.

I'm looking for information on a high-level architecture for this type of system so that I can determine which parts of the stack to focus on without reinventing the wheel.

Thanks,

Ben

Mark_Harwood · May 1, 2019, 2:34pm

Hi Ben,

There's an approach using logstash and an "elapsed" plugin which can join related records together. We got into some of the architectural concerns over that here

Another approach is to use an entity centric index -example scripts and discussion here

ben.dixon · May 2, 2019, 1:18pm

Thanks for the info Mark. The video is great, I appreciate that. I would like to produce the entity-centric index using only tooling from the stack if possible. My idea right now is to use a logstash pipeline to clone the original event document, prune out all of the data except the trace ID and the event timestamp (and possibly any event description), mutate the timestamp into unix time and add it to an array, then submit it to ElasticSearch as an upsert. This would give me a separate index of documents that are centered around the trace ID (the entity) and contains an array of all the event times.

At this point I would need a mechanism to calculate the total time from the earliest and latest trace times. Is this possible to perform within elastic search with something akin to a roll up job? The end result is that I would like some calculations on the total time (avg, max, min etc) to be easily queried from Kibana while minimizing the work of the calculation at query-time as much as possible. Any other pointers within the elastic framework I could use without turning to external code somewhere in this chain?

Thanks for the help and insight,

Ben

Mark_Harwood · May 2, 2019, 1:32pm

I've written a few update scripts like the one used in the video and found that I often performed the same logic -

check if first-sighted date on entity is not null
check if new event date is earlier than first-sighted date held on entity
Do the same for last-sighted date
subtract first-sighting time in millis from last-sighting time in millis.

This got a little repetitive and verbose to code so I thought maybe there's a generic incremental upsert script I could register and just use that for all my entity upserts, controlling it by passing parameters? A result of that experiment is here: Incremental indexing script V2 - multi pass · GitHub

Note that the params part of the example update includes a mix of the data which is the latest batch of events to add followed by various "commands" which are interpreted by the script to do things like date diffs.

The script linked above is rudimentary but you may find it useful. The thought process behind its development is laid out here: Incremental update tooling · Issue #40002 · elastic/elasticsearch · GitHub

ben.dixon · May 10, 2019, 2:07pm

Mark, with the guidance you've provided I've been able to get the system working in prototype form as a combination of logstash pipeline and elasticsearch update script. Logstash collects the discrete trace events and mutates each event for entry into a trace-oriented document in ElasticSearch. Just wanted to thank you for your help and input on this and let you know it's appreciated.

Mark_Harwood · May 10, 2019, 2:09pm

Good to know, Ben!
Glad you got it going and thanks for the feedback

system · June 7, 2019, 2:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logs Timestamp Kibana	3	3423	July 23, 2019
Entity-centric indexing with Transforms Elasticsearch transforms	5	1403	August 4, 2021
Proper way to index entity data from event logs Elasticsearch	1	332	March 28, 2020
Entity-Centric Indexing - reliability and performance Elasticsearch	16	3471	August 16, 2017
Calculate time difference between two different record timestamps Elasticsearch	3	3041	July 18, 2018

Elastic Stack for workflow tracing architecture

Related topics