I have created a Categorization job in Kibana. As input, I gave it the index field that contains the log messages.
After the job finished processing, I can see the analysis results at the bottom in tabular format:
There, when I set Severity to "warning" and Interval to "Show all", I expect it to show me all the categories, but it shows only some.
For example: when I run this API: _ml/anomaly_detectors/tdw_job_cat/results/categories
It gives total categories count as 15. That means there are 15 ml categories detected.
But within Kibana, it shows only these ml categories: 1,2,3,4,6,7,14
You can see it is missing ml categories: 5,8,9,10,11,12,13,15
I know the docs say the default size is 100 on that API, but the fact that you got 10 results makes me wonder if there’s a bug and the default size for get categories is actually 10. (10 is the default for the search API.)
Try adding ?size=1000 to your request. Does it return all 15 categories then? If it does, then we’ll need to investigate whether it’s the code or the docs that are wrong about the default size.
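If you are scripting the check, here is a minimal sketch of building that request URL with an explicit size. The host `http://localhost:9200` and the helper name `categories_url` are assumptions for illustration only; the path and the `size`/`from` query parameters are the ones the get categories API accepts.

```python
from urllib.parse import quote, urlencode


def categories_url(base_url, job_id, size=None, from_=None):
    """Build the get-categories URL for a job, optionally with paging parameters."""
    path = "{}/_ml/anomaly_detectors/{}/results/categories".format(
        base_url.rstrip("/"), quote(job_id, safe="")
    )
    params = {}
    if from_ is not None:
        params["from"] = from_
    if size is not None:
        params["size"] = size
    # Only append a query string when at least one parameter was given.
    return path + ("?" + urlencode(params) if params else "")


# Hypothetical host; substitute your own cluster address.
print(categories_url("http://localhost:9200", "tdw_job_cat", size=1000))
```

Comparing the `count` field of the response with the number of entries in its `categories` array would show whether the default page size is truncating anything.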
As I mentioned above, I run this API: _ml/anomaly_detectors/tdw_job_cat/results/categories
where tdw_job_cat is the job id.
I get a JSON response in which the count field has the value 15, and the main body contains 15 different ml categories. So no problem here!
I expect the same thing in the Kibana Anomaly Explorer UI: it should show me all 15 ml categories when I set Severity to "warning" and Interval to "Show all". Instead, it shows multiple occurrences of these ml categories: 1,2,3,4,6,7,14
But it does not show these ml categories: 5,8,9,10,11,12,13,15
In the Kibana Anomaly Explorer UI (where you see the severity scores) - it is showing you the ANOMALOUS categories - not every category like the API gives you.
In other words, categories 5,8,9,10,11,12,13,15 do not have any anomalous behavior to them, therefore they do not show on the Anomaly Explorer UI.
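If you want to confirm this from the API side, the get records API (_ml/anomaly_detectors/<job_id>/results/records) returns the anomaly records, each with a record_score. Below is a rough sketch that pulls the distinct category IDs out of such a response. It assumes the detector splits on mlcategory (for example count by mlcategory), so the category ID appears in by_field_value or partition_field_value; the helper name and the sample data are made up for illustration.

```python
def anomalous_categories(records_response, min_score=0.0):
    """Return sorted mlcategory IDs that have at least one record scoring >= min_score."""
    field_pairs = (
        ("by_field_name", "by_field_value"),
        ("partition_field_name", "partition_field_value"),
        ("over_field_name", "over_field_value"),
    )
    found = set()
    for record in records_response.get("records", []):
        if record.get("record_score", 0.0) < min_score:
            continue
        for name_key, value_key in field_pairs:
            # Category IDs are reported as strings, so convert them for sorting.
            if record.get(name_key) == "mlcategory" and value_key in record:
                found.add(int(record[value_key]))
    return sorted(found)


# Hypothetical, heavily trimmed response body (not real job output).
sample = {
    "count": 3,
    "records": [
        {"record_score": 42.0, "by_field_name": "mlcategory", "by_field_value": "1"},
        {"record_score": 0.5, "by_field_name": "mlcategory", "by_field_value": "2"},
        {"record_score": 12.0, "by_field_name": "mlcategory", "by_field_value": "14"},
    ],
}
print(anomalous_categories(sample, min_score=10))  # [1, 14]
```

Any category ID that the categories API returns but that never shows up here has no anomaly records, which is why it is absent from the Anomaly Explorer table.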
I created this job using the "count" function. So, as I understand it, non-anomalous ml categories would be those where the actual and typical values are the same or nearly the same? Am I right?
If yes, how close do the actual and typical values have to be for an ml category to be considered non-anomalous?
If that's not the case, how is it determined whether an ml category is showing anomalous behaviour or not? Which parameter is taken into consideration? Is that parameter returned by an API so that I can check?
So, as I understand it, non-anomalous ml categories would be those where the actual and typical values are the same or nearly the same?
Essentially yes. After Elastic ML breaks the log messages into the different categories, the count (in your case) of each of those categories is tallied over time (in increments of bucket_span) and is modeled as a probability distribution (to express the likelihood of the rate of those log messages per unit time).
For example, here are 3 different probability distributions (let's pretend that each distribution models a different ml_category value, denoted by λ in this case):
On the x-axis is the count of messages per unit time. On the y-axis is the probability that this count occurs (learned from past observations). So, for example:
For λ = 1, there is about a 37% chance of zero occurrences per unit of time, and about the same chance of exactly one.
For λ = 4, there is about a 20% chance of three occurrences per unit of time, and about the same chance of four.
For λ = 10, there is about a 13% chance that 10 occurrences happen per unit of time.
The discrete points on each curve also give the likelihood (probability) of other values of occurrence. As such, the model can be informative and answer questions such as "Is getting 15 counts of this message likely?" As we can see, it is not likely for λ = 1 or λ = 4, but it is somewhat likely for λ = 10.
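To reproduce those numbers yourself, here is a small sketch. It assumes the three curves are plain Poisson distributions, which is what the λ notation and the quoted percentages suggest; the models Elastic ML actually fits are more elaborate, so treat this only as an illustration of the idea.

```python
import math


def poisson_pmf(k, lam):
    """Probability of seeing exactly k events per unit time when the mean rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# The points read off the curves above.
for lam, k in [(1, 0), (1, 1), (4, 3), (4, 4), (10, 10)]:
    print("lambda={:>2}  P(count={:>2}) = {:.3f}".format(lam, k, poisson_pmf(k, lam)))

# "Is getting 15 counts of this message likely?"
for lam in (1, 4, 10):
    print("lambda={:>2}  P(count=15) = {:.2e}".format(lam, poisson_pmf(15, lam)))
```

Under this assumption, a count of 15 has a probability of roughly 3.5% for λ = 10, but only on the order of 1e-5 for λ = 4 and 1e-13 for λ = 1.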
The anomaly scoring is based upon this probability value - the more unlikely, the higher the anomaly score. See this blog for more information on how scoring works.