Elastic Stack for workflow tracing architecture

I'm relatively new to the Elastic world and am currently working on a project to track how long it takes to complete various workflows in our system. A typical workflow is initiated by a user interaction but may then traverse several systems via APIs before it completes. Each workflow has a trace that includes a correlation ID, so the series of events the workflow generates can be related to each other. The events also carry time data so their total duration can be tracked. This data flows into Elastic via a Filebeat -> Logstash -> Elasticsearch pipeline.

Somewhere within the stack I want to aggregate the data so that I can measure the average time a flow takes in a given window of time. Because of the volume of data generated, I thought it would be best to do this in Logstash: store one index of the event data and a separate index of the timings, with the arrival of each new event updating a document in the timings index (keyed by correlation ID). I also looked at rollup jobs in Elasticsearch, but it doesn't look like they can perform time-difference operations across documents.

I'm looking for information on a high-level architecture for this type of system so that I can determine which parts of the stack to focus on without reinventing the wheel.

Thanks,

Ben

Hi Ben,

There's an approach using Logstash and the "elapsed" plugin, which can join related records together. We got into some of the architectural concerns over that here.

Another approach is to use an entity-centric index. Example scripts and discussion here.

Thanks for the info, Mark. The video is great, I appreciate that. I would like to produce the entity-centric index using only tooling from the stack if possible. My idea right now is to use a Logstash pipeline to clone the original event document, prune out all of the data except the trace ID and the event timestamp (and possibly an event description), mutate the timestamp into Unix time and add it to an array, then submit it to Elasticsearch as an upsert. This would give me a separate index of documents that are centered on the trace ID (the entity), each containing an array of all the event times.
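To make the upsert idea concrete, here is a rough sketch in Python of the request body that could be sent to the Elasticsearch update API for each pruned event. It is only an illustration of the shape of the request, not a Logstash configuration. The field names `trace_id`, `@timestamp` and `event_times`, and the helper names, are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to Unix time in milliseconds."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    parsed = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def build_trace_upsert(event: dict) -> Tuple[str, dict]:
    """Return (document id, update body) for one trace event.

    The body follows the update API's script + upsert form: if the trace
    document does not exist yet, the "upsert" document is indexed as-is;
    otherwise the script appends the new time to the existing array.
    """
    ts = to_epoch_millis(event["@timestamp"])
    body = {
        "script": {
            "lang": "painless",
            "source": "ctx._source.event_times.add(params.ts)",
            "params": {"ts": ts},
        },
        # Everything except the trace ID and the event time is pruned.
        "upsert": {"trace_id": event["trace_id"], "event_times": [ts]},
    }
    return event["trace_id"], body


doc_id, body = build_trace_upsert(
    {
        "trace_id": "abc-123",  # hypothetical correlation ID
        "@timestamp": "2021-03-01T12:00:00.250Z",
        "message": "order submitted",
    }
)
```

Using the trace ID as the document ID is what lets every event for the same workflow land in the same document.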

At this point I would need a mechanism to calculate the total time from the earliest and latest trace times. Is this possible within Elasticsearch with something akin to a rollup job? The end result I'm after is for calculations on the total time (avg, max, min, etc.) to be easily queried from Kibana, while minimizing the work done at query time as much as possible. Are there any other pointers within the Elastic framework I could use without turning to external code somewhere in this chain?
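For illustration, the query-time side could look something like the search body below, written here as a Python dict. It assumes each trace document already stores a precomputed duration, so the query only has to run a `stats` aggregation (count, min, max, avg, sum) inside a `date_histogram`. The field names `first_seen` (mapped as a date) and `duration_millis` are assumptions for this sketch, and `fixed_interval` requires a reasonably recent Elasticsearch version.

```python
import json


def build_duration_stats_query(interval: str = "1h") -> dict:
    """Build a search body that reports duration statistics per time window."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "per_window": {
                # Bucket traces by the time their first event was seen.
                "date_histogram": {
                    "field": "first_seen",
                    "fixed_interval": interval,
                },
                "aggs": {
                    # min / max / avg / sum / count of the stored duration.
                    "duration": {"stats": {"field": "duration_millis"}}
                },
            }
        },
    }


query = build_duration_stats_query("15m")
print(json.dumps(query, indent=2))
```

Because the duration is computed once at index time, the aggregation itself stays cheap.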

Thanks for the help and insight,

Ben


I've written a few update scripts like the one used in the video and found that I often repeated the same logic:

  • check whether the first-sighted date on the entity is null
  • check whether the new event date is earlier than the first-sighted date held on the entity
  • do the same for the last-sighted date (is the new event date later?)
  • subtract the first-sighting time in millis from the last-sighting time in millis.
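The steps above can be sketched in plain Python as follows. This is only an illustration of the logic, not the update script from the video or the gist; the field names `first_seen`, `last_seen` and `duration_millis` are assumptions made for the example.

```python
def merge_event(entity: dict, event_millis: int) -> dict:
    """Fold one event time (epoch millis) into an entity document."""
    first = entity.get("first_seen")
    last = entity.get("last_seen")

    # First sighting: set it if missing, or move it back for an earlier event.
    if first is None or event_millis < first:
        entity["first_seen"] = event_millis

    # Last sighting: set it if missing, or move it forward for a later event.
    if last is None or event_millis > last:
        entity["last_seen"] = event_millis

    # Total duration is recomputed on every update.
    entity["duration_millis"] = entity["last_seen"] - entity["first_seen"]
    return entity


trace = {}
# Events may arrive out of order, so both bounds are checked every time.
for event_time in (2000, 1000, 4500):
    merge_event(trace, event_time)
```

Checking both bounds on every update means the result does not depend on the order in which events arrive.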

This got a little repetitive and verbose to code, so I wondered whether there could be a generic incremental upsert script that I could register once and use for all my entity upserts, controlling it by passing parameters. A result of that experiment is here: Incremental indexing script V2 - multi pass · GitHub

Note that the params part of the example update contains a mix of two things: the data, which is the latest batch of events to add, followed by various "commands" that the script interprets to do things like date diffs.

The script linked above is rudimentary but you may find it useful. The thought process behind its development is laid out here: Incremental update tooling · Issue #40002 · elastic/elasticsearch · GitHub

Mark, with the guidance you've provided I've been able to get the system working in prototype form as a combination of a Logstash pipeline and an Elasticsearch update script. Logstash collects the discrete trace events and mutates each event for entry into a trace-oriented document in Elasticsearch. I just wanted to thank you for your help and input on this and let you know it's appreciated.


Good to know, Ben!
Glad you got it going and thanks for the feedback
