Anomaly Detection Categorization: Kibana not showing all ML categories

aviral_srivastava · April 22, 2022, 9:19am

Hi,

I am using
Elasticsearch 8.1.0
Kibana 8.1.0

I have created a categorization job in Kibana. As input, I gave it the index field that contains the log messages.

After the job finished processing, I can see the analysis results in a table at the bottom of the page.
When I set Severity to "warning" and Interval to "Show all", I expect it to show me all the categories, but it shows only some of them.

For example, when I run this API: _ml/anomaly_detectors/tdw_job_cat/results/categories
it returns a total category count of 15, which means 15 ML categories were detected.
But Kibana shows only these ML categories: 1, 2, 3, 4, 6, 7, 14

So it is missing ML categories: 5, 8, 9, 10, 11, 12, 13, 15
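To double-check which categories are missing, a small sketch like the following can diff the IDs in the get categories response against the ones visible in Kibana. It assumes the response body has already been parsed into a Python dict with a `count` field and a `categories` list whose entries carry `category_id`. `category_ids` and `missing_from_ui` are hypothetical helper names, and the sample response is a stand-in built from the numbers in this post, not real output.

```python
def category_ids(response: dict) -> set:
    """Collect the category_id of every category in a parsed get categories response."""
    return {c["category_id"] for c in response.get("categories", [])}


def missing_from_ui(response: dict, shown_in_ui: set) -> list:
    """Return, in ascending order, the category IDs the API reports but the UI does not show."""
    returned = category_ids(response)
    # If "count" exceeds what came back, the response was paged and some IDs are absent here too.
    if response.get("count", len(returned)) > len(returned):
        print("warning: response holds fewer categories than 'count'; fetch more pages")
    return sorted(returned - set(shown_in_ui))


# Stand-in response: 15 categories with IDs 1..15, all other fields omitted.
response = {
    "count": 15,
    "categories": [{"job_id": "tdw_job_cat", "category_id": i} for i in range(1, 16)],
}
shown_in_kibana = {1, 2, 3, 4, 6, 7, 14}
print(missing_from_ui(response, shown_in_kibana))  # [5, 8, 9, 10, 11, 12, 13, 15]
```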

Why is this happening?

droberts195 · April 22, 2022, 9:52am

I know the docs say the default size is 100 on that API, but the fact that you got 10 results makes me wonder if there’s a bug and the default size for get categories is actually 10. (10 is the default for the search API.)

Try adding ?size=1000 to your request. Does it return all 15 categories then? If it does, then we’ll need to investigate whether it’s the code or the docs that are wrong about the default size.
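If the page size were the culprit, the fix is to request a larger page or to page through the results. The sketch below builds the request path with the `from` and `size` query parameters and keeps fetching until `count` categories have arrived. `get_json` is a hypothetical callable you would supply yourself (for example, a thin wrapper around your HTTP client that adds the host and credentials and returns the decoded JSON body); nothing here is an official client API.

```python
from typing import Callable
from urllib.parse import urlencode


def categories_path(job_id: str, start: int = 0, size: int = 100) -> str:
    """Build the get categories request path with explicit from/size paging parameters."""
    query = urlencode({"from": start, "size": size})
    return f"/_ml/anomaly_detectors/{job_id}/results/categories?{query}"


def fetch_all_categories(job_id: str, get_json: Callable[[str], dict],
                         page_size: int = 100) -> list:
    """Page through the categories until `count` of them have been collected."""
    collected: list = []
    start = 0
    while True:
        body = get_json(categories_path(job_id, start, page_size))
        page = body.get("categories", [])
        collected.extend(page)
        start += len(page)
        # Stop on an empty page (nothing left) or once every counted category is in hand.
        if not page or start >= body.get("count", 0):
            return collected
```

A single request with a large page, as suggested above, corresponds to `categories_path("tdw_job_cat", 0, 1000)`.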

droberts195 · April 22, 2022, 9:53am

Oh, you edited your post since I first read it and there are now fewer than 10 categories returned, so my idea isn’t relevant any more.

aviral_srivastava · April 22, 2022, 10:03am

Hi,

As I mentioned above, I run the API: _ml/anomaly_detectors/tdw_job_cat/results/categories
where tdw_job_cat is the job ID.
I get a JSON response in which the count field has the value 15, and the body contains 15 different ML categories. So no problem there!

I expect the same thing in the Kibana Anomaly Explorer UI: when I set Severity to "warning" and Interval to "Show all", it should show me all 15 ML categories. Instead, it shows multiple occurrences of ML categories 1, 2, 3, 4, 6, 7 and 14,
but it does not show ML categories 5, 8, 9, 10, 11, 12, 13 and 15 at all.

richcollier · April 22, 2022, 10:18am

In the Kibana Anomaly Explorer UI (where you see the severity scores), it is showing you the ANOMALOUS categories, not every category like the API gives you.

In other words, categories 5,8,9,10,11,12,13,15 do not have any anomalous behavior to them, therefore they do not show on the Anomaly Explorer UI.
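One way to check this is to pull the anomaly records (GET _ml/anomaly_detectors/tdw_job_cat/results/records) and collect the categories that have any. For a count-by-mlcategory job, I would expect each record to carry `by_field_name: "mlcategory"`, the category ID as a string in `by_field_value`, and a `record_score`; treat those field names as an assumption to verify against your own output. The helper `anomalous_categories` and the sample records are made up for illustration.

```python
def anomalous_categories(records: list, min_score: float = 0.0) -> set:
    """IDs of ML categories with at least one anomaly record scoring >= min_score."""
    found = set()
    for rec in records:
        if rec.get("by_field_name") != "mlcategory":
            continue  # ignore records that are not split by category
        if rec.get("record_score", 0.0) >= min_score:
            found.add(int(rec["by_field_value"]))  # category ID arrives as a string
    return found


# Made-up records, trimmed to the fields used above.
records = [
    {"by_field_name": "mlcategory", "by_field_value": "1", "record_score": 12.0},
    {"by_field_name": "mlcategory", "by_field_value": "14", "record_score": 80.5},
    {"by_field_name": "mlcategory", "by_field_value": "6", "record_score": 1.0},
]
all_categories = set(range(1, 16))
with_anomalies = anomalous_categories(records)
# Categories with no anomaly records never appear in the Anomaly Explorer table.
print(sorted(all_categories - with_anomalies))
```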

aviral_srivastava · April 22, 2022, 11:00am

Thanks for the response.

I have created this job using the "count" function. So, as I understand it, non-anomalous ML categories would be those where the actual and typical values are the same or nearly the same? Am I right?
If so, how close do the actual and typical values need to be for an ML category to be considered non-anomalous?

If that's not the case:

How is it determined whether an ML category shows anomalous behaviour or not? Which parameter is taken into consideration? Is that parameter returned by an API so that I can check it?

richcollier · April 22, 2022, 1:36pm

So what I understand non-anomalous ml category would be those categories where the actual and typical score would be same or nearly same?

Essentially yes. After Elastic ML breaks the log messages into the different categories, the count (in your case) of each of those categories is tallied over time (in increments of bucket_span) and is modeled as a probability distribution (to express the likelihood of the rate of those log messages per unit time).
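To make the "tallied over time in increments of bucket_span" step concrete, here is a rough sketch that counts how many messages of each category fall into each bucket, given (timestamp, category_id) pairs. It only illustrates the bookkeeping and is not Elastic's implementation; the function name and the 900-second bucket span in the example are hypothetical. A real job also sees buckets in which a category occurred zero times, which this dictionary simply leaves out.

```python
from collections import Counter, defaultdict


def counts_per_bucket(events, bucket_span_s: int) -> dict:
    """Map category_id -> Counter({bucket_start_epoch_s: message_count}).

    events: iterable of (epoch_seconds, category_id) pairs.
    Buckets with zero messages for a category are not stored.
    """
    tallies = defaultdict(Counter)
    for ts, category in events:
        bucket_start = ts - ts % bucket_span_s  # align to the start of its bucket
        tallies[category][bucket_start] += 1
    return dict(tallies)


events = [(0, 1), (10, 1), (899, 1), (900, 1), (905, 2)]
print(counts_per_bucket(events, bucket_span_s=900))
```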

For example, here are three different probability distributions. Let's pretend that each one models the counts of a different ml_category value, and that each is characterized by a parameter λ (its mean rate):
[Figure: three probability distributions, for λ = 1, λ = 4 and λ = 10]

On the x-axis is the count of messages per unit time. On the y-axis is the probability that this count occurs (learned from past observations). So, for example:

For λ = 1, there is about a 37% chance of zero occurrences per unit of time, and likewise about a 37% chance of exactly one.
For λ = 4, there is about a 20% chance of exactly three occurrences per unit of time, and about the same chance of exactly four.
For λ = 10, there is about a 13% chance of exactly 10 occurrences per unit of time.
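The percentages above match Poisson distributions with mean rates λ = 1, 4 and 10. That is an inference from the λ notation and the quoted numbers, and the Poisson model is used here only as a teaching aid, not as a claim about the exact model Elastic ML fits. The figures can be reproduced with the Poisson probability mass function, P(X = k) = λ^k · e^(-λ) / k!:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# The (lambda, k) pairs quoted in the list above.
for lam, ks in [(1, (0, 1)), (4, (3, 4)), (10, (10,))]:
    for k in ks:
        print(f"lambda={lam:>2}, k={k:>2}: P = {poisson_pmf(k, lam):.3f}")
```

This prints roughly 0.368, 0.368, 0.195, 0.195 and 0.125. Each of the two λ = 1 values is about 37% on its own, so together they add up to about 74%.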

The discrete points on each curve also give the likelihood (probability) of other values of occurrence. As such, the model can be informative and answer questions such as "Is getting 15 counts of this message likely?" As we can see, it is not likely for λ = 1 or λ = 4, but it is somewhat likely for λ = 10.
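This is also why there is no fixed "nearness" threshold between actual and typical values: what matters is how improbable the observed count is under the learned distribution. Using a Poisson distribution as an illustrative stand-in (the real models are more elaborate, so treat the numbers as a teaching aid only), the sketch below computes the probability of exactly 15 occurrences and of 15 or more. It works in log space with `math.lgamma` so that large factorials do not overflow a float; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k), computed in log space to avoid overflowing large factorials."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_tail(k: int, lam: float, terms: int = 200) -> float:
    """P(X >= k), summing the upper tail directly (truncated after `terms` terms)."""
    return sum(poisson_pmf(i, lam) for i in range(k, k + terms))


for lam in (1, 4, 10):
    print(f"lambda={lam:>2}: P(X = 15) = {poisson_pmf(15, lam):.2e}, "
          f"P(X >= 15) = {poisson_tail(15, lam):.2e}")
```

Summing the upper tail directly, instead of computing 1 minus the lower sum, keeps very small probabilities such as the λ = 1 case accurate.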

The anomaly scoring is based upon this probability value - the more unlikely, the higher the anomaly score. See this blog for more information on how scoring works.
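Elastic's actual scoring is more involved (as I understand it, scores are also normalized across the job's results), so the following is deliberately not that algorithm. It is a toy, monotone mapping that only illustrates the direction of the relationship: the smaller the probability, the higher the score. The function name, the log10 transform and the scale factor of 10 are arbitrary choices made for illustration.

```python
import math


def toy_score(probability: float, scale: float = 10.0) -> float:
    """Toy 0-100 score that grows as probability shrinks. NOT Elastic's algorithm."""
    p = min(max(probability, 1e-300), 1.0)  # clamp to avoid log10(0)
    surprise = -math.log10(p)               # 0 when p == 1, larger for rarer events
    return min(100.0, surprise * scale)     # cap at 100 like a severity score


for p in (0.5, 0.0347, 1.5e-5, 2.8e-13):
    print(f"p={p:.1e} -> toy score {toy_score(p):.1f}")
```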

system · May 20, 2022, 1:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
