Create a machine learning job with aggregation

Hello everybody,

I just started working with machine learning jobs and I want to create a job to detect port scans.
I want to aggregate the data by source.ip, then by destination.ip, and finally count the number of distinct destination.port values.
Could you tell me how I can use an aggregation in a machine learning job?


Well, to answer your question, information about how to use an Elasticsearch query aggregation as part of your ML job can be found here:

However, you likely have a very high cardinality of IP addresses. May I suggest that you instead use Population Analysis and configure something like the following:

detector: distinct_count(destination.port) over destination.ip
influencers: destination.ip, source.ip

Population analysis effectively eases the burden of the high-cardinality destination IP field, and the source IP, as an influencer, only gets analyzed if there's an anomaly in the distinct count, as defined by the detector.
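
For reference, here is a rough sketch of how that detector and those influencers might look as the request body for creating an anomaly detection job. It only builds and prints the JSON. The function name build_port_scan_job, the 15m bucket span, the @timestamp time field, and the description text are assumptions for illustration, so adjust them to your data.

```python
import json


def build_port_scan_job(bucket_span="15m", time_field="@timestamp"):
    """Build a job body for the population analysis described above.

    bucket_span and time_field are illustrative defaults (assumptions),
    not values taken from this thread.
    """
    return {
        "description": "Port scan detection (illustrative sketch)",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # distinct_count(destination.port) over destination.ip
                    "function": "distinct_count",
                    "field_name": "destination.port",
                    "over_field_name": "destination.ip",
                }
            ],
            # Influencers are only evaluated when the detector finds an anomaly.
            "influencers": ["destination.ip", "source.ip"],
        },
        "data_description": {"time_field": time_field},
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a create-job request.
    print(json.dumps(build_port_scan_job(), indent=2))
```

The printed JSON is what you would send when creating the job; you would still need a datafeed pointing at your index before the job can analyze anything.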

Thanks a lot for your reply and for your advice.
