Distinct count with filter


(Winder) #1

Hoping to get some guidance here.

I'm trying to correlate session IDs from two different event types: a heartbeat sent every 10 minutes and a disconnect that can arrive at any time. The goal is to show the number of active sessions over the last 10 minutes in a Kibana visualization.

I don't think this is possible with the raw events in Elasticsearch, is that correct?

Would a logstash pipeline be the general approach here?

It seems like I should be able to use something like this Aggregate Filter example, using the heartbeat to add session IDs to a periodic "active sessions" event and the disconnect to remove the session ID. Does this seem like a reasonable approach, or is there something simpler?

If this is the approach, can I initialize the next "active sessions" array from the most recent active-sessions event?
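The aggregate-filter idea described above might look roughly like the sketch below. This is untested, and the field names (`event_type`, `session_id`), the shared task ID, and the timeout wiring are all assumptions about the event schema, not a known-good recipe:

```conf
# Hypothetical sketch: track active sessions with the Logstash aggregate filter.
# Field names (event_type, session_id) are assumptions about the event schema.
filter {
  if [event_type] == "heartbeat" {
    aggregate {
      task_id => "active-sessions"   # one shared map for all sessions
      code => "
        map['sessions'] ||= {}
        map['sessions'][event.get('session_id')] = event.get('@timestamp')
      "
      push_map_as_event_on_timeout => true
      timeout => 600                 # emit an 'active sessions' event every 10 min
      timeout_code => "event.set('active_count', event.get('sessions').length)"
    }
  } else if [event_type] == "disconnect" {
    aggregate {
      task_id => "active-sessions"
      code => "map['sessions']&.delete(event.get('session_id'))"
    }
  }
}
```

Note that the aggregate filter requires a single pipeline worker (`pipeline.workers => 1`) so that all related events pass through the same in-memory map, which is relevant to the scaling questions later in this thread.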


(Mark Harwood) #2

As you indicate, Logstash has some features to join related documents in the ingest stream.

Another general approach is to land the events in the index first and then use a job to periodically (every few seconds?) update a separate "session" index with the latest recorded activities in the event index.
See: "entity-centric indexing".
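The core of that periodic job is just a fold of new raw events into a per-session summary document, which can be illustrated without Elasticsearch at all. A minimal Python sketch, where the field names are assumptions about the schema and the real job would apply the same logic via a scripted or bulk update against the session index:

```python
def update_session(session, event):
    """Fold one raw event into an entity-centric session document.

    `session` is the current session doc (or None for a new session);
    `event` is a raw dict with 'session_id', 'event_type', '@timestamp'
    (ISO-8601 strings, so lexical comparison matches chronological order).
    """
    if session is None:
        session = {"session_id": event["session_id"],
                   "first_seen": event["@timestamp"],
                   "last_seen": event["@timestamp"],
                   "active": True}
    session["last_seen"] = max(session["last_seen"], event["@timestamp"])
    if event["event_type"] == "disconnect":
        session["active"] = False
    return session

def run_batch(sessions, events):
    """One pass of the periodic job: apply a batch of new raw events."""
    for ev in sorted(events, key=lambda e: e["@timestamp"]):
        sid = ev["session_id"]
        sessions[sid] = update_session(sessions.get(sid), ev)
    return sessions
```

With a session index shaped like this, "active sessions in the last 10 minutes" becomes a simple filtered count (`active == true` and `last_seen` within the window) instead of a cross-event join.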


(Winder) #3

Thanks Mark, this entity-centric indexing technique looks like exactly what I'm aiming for!

I'm still curious whether such an index could be built in Logstash rather than introducing an extra set of scripts. It looks like it might be possible to recreate it with the following:

  1. Group events together and apply a timeout so the summary is emitted at the required interval, as in this example.
  2. Use an update script with the Elasticsearch output plugin, which appears to be supported.
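Step 2 might look roughly like the following output configuration. This is a sketch only: the index name, field names, and Painless script body are assumptions, and the exact `scripted_upsert` behaviour should be checked against the plugin documentation:

```conf
output {
  elasticsearch {
    hosts           => ["localhost:9200"]
    index           => "sessions"          # assumed entity-centric index name
    document_id     => "%{session_id}"     # one document per session
    action          => "update"
    scripted_upsert => true
    script_lang     => "painless"
    script_type     => "inline"
    # Painless sketch: keep the latest timestamp, flip 'active' on disconnect.
    # The event is exposed to the script as params.event by default.
    script => "
      ctx._source.last_seen = params.event.get('@timestamp');
      ctx._source.active = params.event.get('event_type') != 'disconnect';
    "
  }
}
```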

I'll probably run my tests using your ESEntityCentricIndexing script and see if it can be rolled into Logstash for production.

For now I'll mark this as solved and create any further questions in the Logstash section.


(Mark Harwood) #4

I'm not a Logstash expert, but I'd suggest checking what happens when you rely on a system that joins things together in a transient blob of memory. The questions I would ask are:

  1. Do I have to route all related events through the same logstash process?
  2. Does the memory grow endlessly while "start" events wait to match their equivalent "end" events?
  3. How much memory do I need to hold a window big enough to tally all starts with ends?
  4. What happens to in-flight sessions when the power goes off?

(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.