Our SIEM collects logs from many firewalls and stores them in an Elasticsearch 5.6 repository. We want to retrieve data on source-to-destination IP communication on certain ports. The idea is to find the top communicators and how many times each pair communicates on each port in the list. Below are the field mapping and three sample records; 24 hours of data is about 8 million records.
SourceIP = OriginIP
DestinationIP = ImpactedIP
Port = ImpactedPort

msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 59077
commonEventName : Translation Teardown
normalDate : 2019-06-16T21:57:41.757Z
impactedIp : 88.77.63.99
logSourceName : 192.168.135.21 Cisco ASA
directionName : Outbound
originIp : 192.168.100.24

msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 80
commonEventName : Traffic Denied by Network Firewall
normalDate : 2019-06-16T21:57:42.783Z
impactedIp : 65.44.123.214
logSourceName : 192.168.135.92 Cisco ASA
directionName : Outbound
originIp : 10.162.31.166

msgSourceTypeName : Syslog - Cisco ASA
impactedPort : 443
commonEventName : Connection Teardown
normalDate : 2019-06-16T21:57:45.886Z
impactedIp : 212.234.123.232
logSourceName : 192.168.135.21 Cisco ASA
directionName : Outbound
originIp : 192.168.100.24

I'm using the Java REST API to connect to Elasticsearch and retrieve the data. I managed to use the scroll API to go through all 8 million records for 24 hours, but it takes hours to display the results. At that volume it is not viable to walk through the entire search response and do the aggregation in the Java program.
Is there any way to make my query aggregate the results the way I want, that is:
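To make the goal concrete, the client-side tally amounts to something like the sketch below. The class and method names (`PairPortCounter`, `add`, `count`) are made up for illustration and are not part of any Elasticsearch API; the scroll loop that would feed it is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts hits per (originIp, impactedIp) pair and port.
class PairPortCounter {

    // Outer key: "originIp -> impactedIp"; inner map: impactedPort -> hit count.
    private final Map<String, Map<Integer, Long>> counts = new HashMap<>();

    // Called once per record returned by the scroll.
    void add(String originIp, String impactedIp, int impactedPort) {
        counts.computeIfAbsent(originIp + " -> " + impactedIp, k -> new HashMap<>())
              .merge(impactedPort, 1L, Long::sum);
    }

    // Returns how often the pair was seen on the given port (0 if never).
    long count(String originIp, String impactedIp, int impactedPort) {
        Map<Integer, Long> perPort = counts.get(originIp + " -> " + impactedIp);
        return perPort == null ? 0L : perPort.getOrDefault(impactedPort, 0L);
    }
}
```

Every one of the 8 million records has to cross the network and pass through `add`, which is the cost I would like to push into Elasticsearch itself.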
OriginIP : xx.xx.xx.xx
ImpactedIP: yy.yy.yy.yy
ImpactedPort : 443
Count : 999
ImpactedPort : 80
Count : 999
ImpactedPort : 59077
Count : 999
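One possible shape for such a request is a nested `terms` aggregation: bucket by `originIp`, then by `impactedIp`, then by `impactedPort`, with `size` set to 0 so that no individual hits are returned. The sketch below only builds the JSON body as a string. It assumes the three fields are mapped so that they can be aggregated on directly (depending on the mapping, a `.keyword` suffix may be needed); the aggregation names, the `topN` parameter and the class name are hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical helper that builds the body for a POST to <index>/_search.
class TopCommunicatorsQuery {

    // from/to bound normalDate; ports is the list of ports of interest;
    // topN limits how many origin IPs and impacted IPs are returned per level.
    static String buildRequestBody(String from, String to, int[] ports, int topN) {
        StringJoiner portList = new StringJoiner(",", "[", "]");
        for (int port : ports) {
            portList.add(Integer.toString(port));
        }
        return "{"
            + "\"size\":0,"  // aggregations only, no hits
            + "\"query\":{\"bool\":{\"filter\":["
            +   "{\"range\":{\"normalDate\":{\"gte\":\"" + from + "\",\"lt\":\"" + to + "\"}}},"
            +   "{\"terms\":{\"impactedPort\":" + portList + "}}"
            + "]}},"
            + "\"aggs\":{\"by_origin\":{"
            +   "\"terms\":{\"field\":\"originIp\",\"size\":" + topN + "},"
            +   "\"aggs\":{\"by_impacted\":{"
            +     "\"terms\":{\"field\":\"impactedIp\",\"size\":" + topN + "},"
            +     "\"aggs\":{\"by_port\":{"
            +       "\"terms\":{\"field\":\"impactedPort\",\"size\":" + ports.length + "}"
            +     "}}"
            +   "}}"
            + "}}"
            + "}";
    }
}
```

For example, `buildRequestBody("2019-06-16T00:00:00Z", "2019-06-17T00:00:00Z", new int[]{80, 443, 59077}, 10)` would ask for the top 10 origin IPs, the top 10 impacted IPs under each, and a count per port under each pair. Note that `terms` aggregation counts can be approximate when the data is spread over several shards.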
Thanks in advance.
Dushan