I've just started with machine learning jobs and I want to create a job to detect port scans.
I want to aggregate data by source.ip, then by destination.ip, and finally count the number of
Could you tell me how I can make an aggregation in machine learning jobs?
To answer your question: information about how to use an Elasticsearch aggregation as part of your ML job can be found here: https://www.elastic.co/guide/en/machine-learning/7.9/ml-configuring-aggregation.html
However, you likely have a very high cardinality of IP addresses. May I suggest that you instead use Population Analysis and configure something like the following:
detector: distinct_count(destination.port) over destination.ip
influencers: destination.ip, source.ip
Population analysis eases the burden of the high-cardinality destination IP field, and the source IP, as an influencer, only gets analyzed when there is an anomaly in the distinct count defined by the detector.
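For reference, here is a rough sketch of how that detector and those influencers might look as the JSON body for the create anomaly detection job API (PUT _ml/anomaly_detectors/<job_id>). The 15m bucket span, the @timestamp time field, the description text, and the helper name build_port_scan_job are my assumptions for illustration, so adjust them to your data:

```python
import json


def build_port_scan_job(bucket_span="15m", time_field="@timestamp"):
    """Build a population-analysis job body (hypothetical helper).

    bucket_span and time_field are assumed defaults, not values
    taken from the thread.
    """
    return {
        "description": "Port scan detection (population analysis)",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # distinct_count(destination.port) over destination.ip
                    "function": "distinct_count",
                    "field_name": "destination.port",
                    "over_field_name": "destination.ip",
                }
            ],
            # Influencers are only scored when the detector finds an anomaly.
            "influencers": ["destination.ip", "source.ip"],
        },
        "data_description": {"time_field": time_field},
    }


job = build_port_scan_job()
# Print the body so it can be pasted into Kibana Dev Tools or sent via a client.
print(json.dumps(job, indent=2))
```

You would still need a datafeed pointing at the index that holds your network data before the job can run.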
Thanks a lot for your reply and for your advice.