Distinct count with filter

Hoping to get some guidance here.

I'm trying to correlate session IDs from two different events: a heartbeat sent every 10 minutes and a disconnect, which can be sent at any time. The goal is to show the number of sessions active in the last 10 minutes in a Kibana visualization.

I don't think this is possible with the raw events in Elasticsearch. Is that correct?

Would a Logstash pipeline be the general approach here?

It seems like I should be able to use something like this Aggregate Filter example, using the heartbeat to add session IDs to a periodic "active sessions" event and the disconnect to remove them. Does this seem like a reasonable approach, or is there something simpler?

If this is the approach, can I initialize the next "active sessions" array from the most recent "active sessions" event?
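
To make the idea concrete, here is a minimal sketch in plain Python (not Logstash configuration) of the logic described above: heartbeats add a session ID to a running set, disconnects remove it, and a snapshot of the count is emitted at a fixed interval. The event tuple layout, the function name, and the 600-second interval are assumptions for illustration only. Because the set carries over from one interval to the next, each snapshot effectively starts from the previous one.

```python
def snapshot_active_sessions(events, interval=600):
    """Return a list of (snapshot_time, active_count) pairs.

    `events` is an iterable of (timestamp_seconds, kind, session_id) tuples,
    where kind is "heartbeat" or "disconnect" (hypothetical layout).
    """
    active = set()       # session IDs currently considered active
    snapshots = []
    next_emit = None     # time at which the next snapshot is due

    for ts, kind, sid in sorted(events):
        if next_emit is None:
            next_emit = ts + interval
        # Emit any snapshots that became due before this event.
        while ts >= next_emit:
            snapshots.append((next_emit, len(active)))
            next_emit += interval
        if kind == "heartbeat":
            active.add(sid)
        elif kind == "disconnect":
            active.discard(sid)

    # Emit a final snapshot for the last, partially filled interval.
    if next_emit is not None:
        snapshots.append((next_emit, len(active)))
    return snapshots
```

Note that in this sketch a session that stops sending heartbeats without ever sending a disconnect stays in the set indefinitely, so a real version would also need some form of expiry.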

As you indicate, Logstash has some features to join related documents in the ingest stream.

Another general approach is to land the events in the index first and then use a job to periodically (every few seconds?) update a separate "session" index with the latest recorded activities in the event index.
See: "entity-centric indexing".
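
As a rough illustration of the entity-centric idea, the sketch below folds raw events into one document per session and then counts the sessions that are still active. It is plain Python with a dict standing in for the separate "session" index; the field names `last_heartbeat` and `disconnected_at`, the event tuple layout, and the 600-second window are assumptions, not anything prescribed by Elasticsearch.

```python
def update_session_index(session_index, events):
    """Fold (timestamp, kind, session_id) events into per-session documents."""
    for ts, kind, sid in events:
        doc = session_index.setdefault(
            sid, {"last_heartbeat": None, "disconnected_at": None}
        )
        # Keep only the latest timestamp seen for each kind of event.
        if kind == "heartbeat":
            if doc["last_heartbeat"] is None or ts > doc["last_heartbeat"]:
                doc["last_heartbeat"] = ts
        elif kind == "disconnect":
            if doc["disconnected_at"] is None or ts > doc["disconnected_at"]:
                doc["disconnected_at"] = ts
    return session_index


def count_active(session_index, now, window=600):
    """Count sessions with a recent heartbeat and no disconnect after it."""
    count = 0
    for doc in session_index.values():
        hb = doc["last_heartbeat"]
        dc = doc["disconnected_at"]
        if hb is None or now - hb > window:
            continue  # no heartbeat inside the window
        if dc is not None and dc >= hb:
            continue  # disconnected at or after the last heartbeat
        count += 1
    return count
```

With one document per session, the "active in the last 10 minutes" question becomes a simple filter on two fields, which is the kind of query a visualization can run directly.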


Thanks Mark, this entity-centric indexing technique looks like exactly what I'm aiming for!

I'm still curious whether such an index could be created in Logstash rather than by introducing an extra set of scripts. It looks like it might be possible with the following:

  1. Group events together and apply a timeout to make sure they are updated at the required interval, as in this example.
  2. Use an update script with the ES Output Plugin, which appears to be supported.

I'll probably run my tests using your ESEntityCentricIndexing script and see if it can be rolled into Logstash for production.

For now I'll mark this as solved and post any further questions in the Logstash section.

I'm not a Logstash expert, but I'd suggest checking what happens when you use a system that relies on joining things together using a transient blob of memory. The questions I would have are:

  1. Do I have to route all related events through the same Logstash process?
  2. Does the memory grow endlessly waiting for "start" events to match their equivalent "end" event?
  3. How much memory do I need to hold a window big enough to tally all starts with ends?
  4. What happens to in-flight sessions when the power goes off?
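
To illustrate questions 2 and 3, here is a minimal Python sketch of an in-memory join with a timeout, so that sessions which never send a disconnect are eventually evicted. The class name, method names, and timeout value are hypothetical and only show the shape of the problem; this is not how any particular Logstash plugin is implemented.

```python
class PendingSessions:
    """Tracks open sessions in memory and evicts ones that go quiet."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}  # session_id -> timestamp of last heartbeat

    def heartbeat(self, sid, ts):
        self._last_seen[sid] = ts

    def disconnect(self, sid):
        # Matching "end" event: the session no longer needs to be held.
        self._last_seen.pop(sid, None)

    def expire(self, now):
        """Drop sessions not seen within the timeout; return their IDs."""
        stale = [
            sid for sid, ts in self._last_seen.items()
            if now - ts > self.timeout_s
        ]
        for sid in stale:
            del self._last_seen[sid]
        return stale

    def __len__(self):
        return len(self._last_seen)
```

With eviction, the number of entries held is bounded by roughly the number of sessions that were live within one timeout period, which gives a starting point for sizing memory. The sketch still keeps all state in a single process and loses it on restart, so questions 1 and 4 remain open.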
