Kibana - Count similar entries within a time interval of each other

I'm trying to analyse some IDS data and I'm having trouble understanding how to create a visualization that suits what I'm trying to see.

Basically I'm trying to find the number of shared entries two IDSs have between them. For example, I have entries from Bro and Suricata, and I would like to see how many entries they share that have the same src and dst IP and are within 2 minutes of each other. From what I've seen looking around, this doesn't seem to be possible, but I've just started using Kibana, so I was hoping someone could point me in the right direction.

What would be even better, if it's possible, is to arrange all of the different IDSs in a 2x2 matrix and have all these counts for each IDS pair, all in the same table.

Thanks.

Hi Pedro,

Unfortunately, what you are looking for is not achievable with Kibana at the moment.

Also, that time dimension makes it really hard to even do this directly against Elasticsearch.

You would need a field that contains both the source and destination IP concatenated (which shouldn't be a problem to create). If you then do a terms aggregation over that field, you get one bucket per combination, containing all the entries that share the same src and dst IP. To check whether those entries really come from two different systems, you can nest a "Unique Count" metric aggregation on the field that contains "Bro" or "Suricata". If you filter beforehand on "Bro" and "Suricata", you then know that the source-dest-IP buckets with a unique count of 2 contain documents with both values, meaning those are the intersection you are looking for.
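If you wanted to prototype this directly against Elasticsearch instead of in Kibana, the aggregation would look roughly like the sketch below. This is only an illustration: it uses the official Python client (8.x-style keyword arguments), and `src_dst_pair` and `ids_system` are placeholder names for the concatenated IP field and for whatever field says which IDS produced the entry.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="ids-alerts",          # placeholder index name
    size=0,                      # we only care about the aggregations
    query={"terms": {"ids_system": ["bro", "suricata"]}},
    aggs={
        "ip_pairs": {
            # One bucket per concatenated src/dst IP pair.
            "terms": {"field": "src_dst_pair", "size": 1000},
            "aggs": {
                # How many distinct IDSs contributed to this pair.
                "distinct_systems": {"cardinality": {"field": "ids_system"}}
            },
        }
    },
)

# Pairs seen by both systems, i.e. the intersection.
shared = [
    b["key"]
    for b in resp["aggregations"]["ip_pairs"]["buckets"]
    if b["distinct_systems"]["value"] >= 2
]
```

(Kibana's "Unique Count" is the cardinality aggregation under the hood, so this mirrors the visualization described above.)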

Unfortunately, adding the time constraint makes this very hard. You could, for example, add a date histogram aggregation with an interval of 2 minutes, but I assume that would not work exactly the way you want. It would group together everything from minute 0 to minute 2, from minute 2 to minute 4, and so on. Documents from minute 1 and minute 3, which are still within a 2-minute interval of each other, would no longer land in the same bucket. In other words, you don't get a "floating" 2-minute window, but fixed intervals rounded to 2 minutes.
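To make that concrete, the nested aggregation would look something like the following (again just a sketch: `src_dst_pair` and `ids_system` are placeholder field names, and `fixed_interval` assumes a reasonably recent Elasticsearch version, older ones used `interval` instead):

```python
# Aggregation body for the same kind of search as above:
# fixed 2-minute windows -> IP pair -> number of distinct IDSs.
aggs = {
    "two_minute_windows": {
        # Buckets are aligned to the clock (00:00-00:02, 00:02-00:04, ...),
        # NOT a sliding window around each event.
        "date_histogram": {"field": "@timestamp", "fixed_interval": "2m"},
        "aggs": {
            "ip_pairs": {
                "terms": {"field": "src_dst_pair", "size": 1000},
                "aggs": {
                    "distinct_systems": {"cardinality": {"field": "ids_system"}}
                },
            }
        },
    }
}
```

An event at 00:01 and one at 00:03 end up in different buckets even though they are only 2 minutes apart, which is exactly the limitation described above.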

So I am not aware of a way to achieve that exact behavior, and unfortunately you cannot do this in Kibana at the moment.

Cheers,
Tim

Hi Pedro,

Putting aside the visualization part of the problem for now: how many unique IP pairs would you expect to find per 2-minute window in these sorts of logs?

Thank you for confirming my suspicion, and for your ideas. I created a visualization like timroes specified, and even without the time constraint it has been useful: it appears there were no intersections between the IDSs in my data (the unique counts were always 1). However, I'm only using a subset of the data, so hopefully when I scale up to the full dataset it will at least show signs of overlapping alerts, even if it's not very detailed.

As for Mark_Harwood's question, I've made another visualization to find that value, and it seems to vary between 50 and 200 unique pairs per 2-minute interval.

So depending on the length of time your analysis covers, that could be a lot of aggregation "buckets" to return when looking for coincidences. Given a week of data, 2-minute windows, and 200 unique IP pairs per window, that's over a million buckets:

 (7 * 24 * 60) / 2 * 200 = 1,008,000 buckets

That's potentially a lot to return in one JSON response and then scan for buckets with more than one system in them.

If your analysis window is smaller (say the last 10 minutes only) then this is perhaps less of a concern.

Generally speaking, performing behavioural analysis on many actors, each of which produces many events, is challenging for any distributed system if you scatter each actor's related data across different machines: joins become too expensive. The speed and simplicity of any query-time analysis improves if you can organise events more carefully at index time. This is where entity-centric indexing [1] can help. You can maintain an index alongside your usual time-based event store which holds a summary document for each actor, e.g. an IP address, or perhaps in your case a communicating pair. It's much easier to write logic that detects behavioural anomalies for an actor if all of that actor's related data is brought to the same place.
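As a very rough sketch of what that could look like in practice (everything here is made up for illustration: the index name, the field names, and the use of the Python client with 8.x-style keyword arguments), you might key a summary document on the src/dst pair and update it whenever either IDS fires, so that "did both systems see this pair, and how close together?" becomes a lookup in one document rather than a join at query time:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def record_alert(src_ip: str, dst_ip: str, ids_system: str, timestamp: str) -> None:
    """Upsert one entity-centric summary document per src/dst pair."""
    pair_id = f"{src_ip}-{dst_ip}"
    es.update(
        index="ip-pair-entities",   # summary index, separate from the raw events
        id=pair_id,
        # Painless script: remember when each IDS last saw this pair and
        # keep a running alert count, all inside one document.
        script={
            "source": (
                "ctx._source.last_seen[params.system] = params.ts;"
                "ctx._source.total_alerts += 1"
            ),
            "params": {"system": ids_system, "ts": timestamp},
        },
        # Document created the first time this pair is seen.
        upsert={
            "src_ip": src_ip,
            "dst_ip": dst_ip,
            "last_seen": {ids_system: timestamp},
            "total_alerts": 1,
        },
    )
```

Whether you update these summaries from your ingest pipeline or with a periodic batch job is a design choice; the point is simply that the per-pair state lives in one place.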

Visualizing the anomalies is perhaps the simpler issue to deal with here; the main challenge is filtering and identifying the anomalies up front, given the large number of actors.

[1] Why and how of entity-centric indexing: https://www.youtube.com/watch?v=yBf7oeJKH2Y
