Dec 4th, 2018: [EN][ML] Rarity Analysis with Machine Learning

richcollier · December 4, 2018, 12:00am

It is easy to forget about one key capability of our ML: Rarity Analysis.

Finding items that rarely occur is often very useful. Some example use cases are finding:

Rarely occurring log messages
Rare running processes for a server
Rare connection destinations

ML's rare function is only available in the Advanced Job Wizard, but the configuration is relatively simple. For example, if you have data in the form of:

Then the ML job configuration could simply be:

In other words, "find rare ProgramNames for every host individually". Since every host will be treated uniquely, this means that a certain process that might be routine on one server could be deemed rare on another if it doesn't appear often.
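As a rough sketch of what such a configuration might look like when sent to the anomaly detection job API rather than entered in the wizard, the Python below builds a job body with a single `rare` detector. The field names `ProgramName` and `host` come from the example above. The helper name, bucket span, time field, and influencer list are assumptions made for illustration, not the settings of the original job.

```python
import json


def build_rare_job_config(by_field, partition_field,
                          bucket_span="15m", time_field="@timestamp"):
    """Build a job body with one `rare` detector (hypothetical helper).

    bucket_span and time_field are assumed defaults; adjust them to your data.
    """
    return {
        "description": f"Rare {by_field} per {partition_field}",
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    # "rare" flags values of by_field that seldom occur ...
                    "function": "rare",
                    "by_field_name": by_field,
                    # ... modelled separately for each value of partition_field
                    "partition_field_name": partition_field,
                }
            ],
            # Assumed choice: report both fields as influencers
            "influencers": [partition_field, by_field],
        },
        "data_description": {"time_field": time_field},
    }


config = build_rare_job_config("ProgramName", "host")
print(json.dumps(config, indent=2))
```

Splitting on `partition_field_name` is what gives each host its own baseline, so a process that is routine on one server can still score as rare on another.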

Once analyzed, we could find a situation like this:

Here, the "ftp" process is witnessed on host=files05-dc1.dc1, which is rare for that server. This is perfect for Security Analytics style use cases where one is looking for nefarious behaviors invoked by malicious insiders or malware.

Keep in mind that the rare function is relative: it takes into account the frequency of the other values of the field. Consider, for example, the list:

A,B,A,B,A,C,B,A,B,A,C,A,B,C,A,C,B,A,C,B,A,C,X

Here, X is obviously rare. But if the list were:

A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X

X is not obviously rare because everything is rare (and thus nothing is rare).
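To make the "relative" point concrete, here is a toy Python illustration that compares each value's share of a list with the share of the most common value. This is not how the ML rare function models rarity internally. The `stands_out` helper and its `ratio` threshold are hypothetical and exist only to show why X looks rare in the first list but not in the second.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to its share of all observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def stands_out(values, candidate, ratio=0.25):
    """Toy heuristic: True when the candidate's share is below `ratio`
    times the share of the most common value (threshold is an assumption)."""
    freqs = relative_frequencies(values)
    return freqs[candidate] < ratio * max(freqs.values())


mixed = "A,B,A,B,A,C,B,A,B,A,C,A,B,C,A,C,B,A,C,B,A,C,X".split(",")
all_unique = "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X".split(",")

print(stands_out(mixed, "X"))       # True: A, B and C dominate, X appears once
print(stands_out(all_unique, "X"))  # False: every value appears exactly once
```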

In v6.5, we introduced a new UI component in the Anomaly Explorer to assist with the visualization of things that are rare. Here’s an example of what that looks like:

The way to interpret this is that the blue dots in the bottom half of the UI show occurrence rates of field values over time (which is the horizontal dimension, of course). Those that wind up near the bottom are the rarest ones and the selected anomaly (in this case printdialog.exe) will be shown as an enlarged dot in the bottom half of the view (here colored yellow because of its score).

Happy Detecting!
