Aggregating Source IP and Destination IP pairs on Firewall logs

Our SIEM collects logs from many firewalls and stores them in an Elasticsearch 5.6 repository. We want to retrieve source and destination IP communication data on certain ports. The idea is to find the top communicators and count how many times each pair communicates on each port in the list. My data records look like the samples below; 24 hours of data amounts to about 8 million records.

SourceIP = OriginIP
DestinationIP = ImpactedIP
Port = ImpactedPort


msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 59077
commonEventName : Translation Teardown
normalDate : 2019-06-16T21:57:41.757Z
impactedIp : 88.77.63.99
logSourceName : 192.168.135.21 Cisco ASA
directionName : Outbound
originIp : 192.168.100.24

msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 80
commonEventName : Traffic Denied by Network Firewall
normalDate : 2019-06-16T21:57:42.783Z
impactedIp : 65.44.123.214
logSourceName : 192.168.135.92 Cisco ASA
directionName : Outbound
originIp : 10.162.31.166

msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 443
commonEventName : Connection Teardown
normalDate : 2019-06-16T21:57:45.886Z
impactedIp : 212.234.123.232
logSourceName : 192.168.135.21 Cisco ASA
directionName : Outbound
originIp : 192.168.100.24

I'm using the Java REST API to connect to Elasticsearch and retrieve the data. I managed to use the scroll API to go through all 8 million records for 24 hours, but it takes hours to display. At that volume it's not viable for me to walk the entire search response and do the aggregation in the Java program.

Is there any way I can make my query aggregate the results the way I want, that is:

OriginIP : xx.xx.xx.xx
ImpactedIP: yy.yy.yy.yy
ImpactedPort : 443
Count : 999
ImpactedPort : 80
Count : 999
ImpactedPort : 59077
Count : 999

Thanks in advance.

Dushan

So you basically want the top-n OriginIP:ImpactedIP tuples, i.e. the list of src:dest pairs ordered by count?

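The easiest way would be a terms aggregation with scripting. Untested and in JSON, but it should give you an idea of what I mean. The field names follow your sample records (originIp, impactedIp); field names are case-sensitive, so adjust them to whatever your mapping actually uses:

GET /_search
{
    "size": 0,
    "aggs" : {
        "top_src_dest" : {
            "terms" : {
                "size": 100,
                "script" : {
                    "source": "doc['originIp'].value + '_' + doc['impactedIp'].value",
                    "lang": "painless"
                }
            }
        }
    }
}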

That will give you the top 100 src:dest tuples and the document count, ordered by count descending. It's a bucketing agg, so you could add other aggregations under it like average latency, another terms agg to get the top impacted ports, etc.
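As a rough sketch of the "another terms agg under it" idea, the snippet below builds a request body that nests a per-port terms aggregation under the scripted pair aggregation and restricts the search to the last 24 hours and a given port list. It is plain Python (standard library only) purely to produce the JSON body, which could be sent from any client, including the Java REST client. The field names originIp, impactedIp, impactedPort and normalDate are assumed from the sample records, and the function names and the by_port aggregation name are made up for illustration. Untested against a real 5.6 cluster.

```python
import json


def build_pair_port_query(ports, top_pairs=100, since="now-24h"):
    """Build a search body: top src_dest pairs, each broken down by port.

    Field names are assumptions taken from the sample records.
    """
    return {
        "size": 0,  # only the aggregation results are needed, no hits
        "query": {
            "bool": {
                "filter": [
                    # limit to the time window of interest
                    {"range": {"normalDate": {"gte": since}}},
                    # limit to the ports of interest
                    {"terms": {"impactedPort": ports}},
                ]
            }
        },
        "aggs": {
            "top_src_dest": {
                "terms": {
                    "size": top_pairs,
                    "script": {
                        "source": "doc['originIp'].value + '_' + doc['impactedIp'].value",
                        "lang": "painless",
                    },
                },
                # sub-aggregation: document count per port inside each pair
                "aggs": {
                    "by_port": {
                        "terms": {"field": "impactedPort", "size": len(ports)}
                    }
                },
            }
        },
    }


def flatten_pairs(response):
    """Turn the nested buckets into (origin, impacted, port, count) rows."""
    rows = []
    for pair in response["aggregations"]["top_src_dest"]["buckets"]:
        # the script joined the two addresses with '_'
        origin, impacted = pair["key"].split("_", 1)
        for port in pair["by_port"]["buckets"]:
            rows.append((origin, impacted, port["key"], port["doc_count"]))
    return rows


if __name__ == "__main__":
    print(json.dumps(build_pair_port_query([80, 443, 59077]), indent=2))
```

Each row from flatten_pairs maps onto one OriginIP / ImpactedIP / ImpactedPort / Count entry of the layout asked for in the question. Splitting on '_' is only safe for IPv4 addresses; a different separator may be needed otherwise.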

If you want all the src:dest tuples and not just the top-n, you can use a composite aggregation with a scripted terms source to get the results. Composite aggs allow you to "page" over the buckets (like a scroll request does for search hits), which is a lot more memory friendly. Note that the composite aggregation was added in Elasticsearch 6.1, so on 5.6 it is only an option after an upgrade.
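To illustrate the paging idea, here is a minimal sketch of a loop over composite aggregation pages. Instead of the scripted terms source mentioned above, it swaps in three plain terms sources (originIp, impactedIp, impactedPort, all assumed from the sample records), which avoids the script entirely. The search argument is a hypothetical callable that takes a request body and returns the parsed response; it stands in for whatever client is actually used. This requires Elasticsearch 6.1 or later and is untested against a real cluster.

```python
def iter_composite_buckets(search, page_size=1000):
    """Yield every (origin, impacted, port) bucket, one page at a time.

    `search` is a hypothetical callable: request body dict -> response dict.
    """
    composite = {
        "size": page_size,  # buckets per page, not documents
        "sources": [
            {"origin": {"terms": {"field": "originIp"}}},
            {"impacted": {"terms": {"field": "impactedIp"}}},
            {"port": {"terms": {"field": "impactedPort"}}},
        ],
    }
    # `body` holds a reference to `composite`, so setting "after" below
    # changes the next request.
    body = {"size": 0, "aggs": {"pairs": {"composite": composite}}}
    while True:
        buckets = search(body)["aggregations"]["pairs"]["buckets"]
        if not buckets:
            return  # an empty page means all buckets have been seen
        yield from buckets
        # resume after the last key of this page
        composite["after"] = buckets[-1]["key"]
```

Each yielded bucket carries a key object with the three source values plus a doc_count, so memory use stays bounded by the page size rather than by the total number of distinct pairs.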
