I created a categorization-based anomaly detection job and explored the job results in Kibana.
I used the "Payload" field (a string) for categorization.
I'm not sure what the "typical" value in the Anomaly Explorer results signifies.
P.S. I know that for a numerical feature the "typical" value signifies the median of those values, but I'm not sure what it means for a string field.
With ML categorization jobs you still do anomaly detection as well as categorization. The detector usually applies a function to the category ID (mlcategory), and it's almost always rare by mlcategory or count by mlcategory. Since you didn't choose the detector yourself, it must be one of these, added by the categorization wizard. You can find out which one by looking at the job configuration in the ML jobs list.
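If you'd rather check this programmatically than in the jobs list, here is a minimal Python sketch. It assumes you've already fetched the job configuration (for example with the Get Jobs API, GET _ml/anomaly_detectors/<job_id>) and parsed it into a dict; the job ID and the sample config below are hypothetical.

```python
# Hypothetical excerpt of one entry of the "jobs" array returned by the
# Get Jobs API. The values are illustrative only.
sample_job = {
    "job_id": "payload-categories",  # hypothetical job ID
    "analysis_config": {
        "bucket_span": "1h",
        "categorization_field_name": "Payload",
        "detectors": [
            {"function": "count", "by_field_name": "mlcategory"},
        ],
    },
}


def mlcategory_detectors(job):
    """Return the function of every detector that splits by mlcategory."""
    detectors = job.get("analysis_config", {}).get("detectors", [])
    return [d["function"] for d in detectors if d.get("by_field_name") == "mlcategory"]


print(mlcategory_detectors(sample_job))  # ['count']
```

A result of ['count'] means the typical and actual values are per-bucket counts; ['rare'] means typical is a probability.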
If it's count by mlcategory, then typical and actual are how many Payload messages in that category typically and actually occur per time bucket. If it's rare by mlcategory, then typical is the probability of seeing that category in a typical bucket.
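As a rough illustration of what "actual" means for count by mlcategory, the sketch below counts the documents of one category in each 1-hour bucket. The documents, category IDs and bucket range are hypothetical, and the plain mean at the end is only a naive stand-in: the ML model derives typical from its own probability model, not from a simple average.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical categorized documents as (timestamp, category_id) pairs.
docs = [
    (datetime(2021, 12, 26, 10, 5), 8),
    (datetime(2021, 12, 26, 10, 40), 8),
    (datetime(2021, 12, 26, 10, 55), 3),
    (datetime(2021, 12, 26, 11, 15), 8),
]


def counts_per_bucket(docs, category_id, start, n_buckets, span=timedelta(hours=1)):
    """Return the number of documents of one category in each bucket.

    Buckets with no matching documents are reported as 0.
    """
    counts = Counter()
    end = start + n_buckets * span
    for ts, cat in docs:
        if cat == category_id and start <= ts < end:
            counts[(ts - start) // span] += 1  # index of the bucket
    return [counts[i] for i in range(n_buckets)]


actual = counts_per_bucket(docs, 8, datetime(2021, 12, 26, 10), n_buckets=3)
print(actual)        # [2, 1, 0]  (per-bucket "actual" counts for category 8)
print(mean(actual))  # naive average only, NOT how ML computes typical
```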
You can see the category definitions without the anomaly information using the Get Categories API.
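For reference, a minimal Python sketch that builds (but does not send) a Get Categories request with the standard library. The base URL and job ID are hypothetical, authentication and TLS are omitted, and the response field names used in terms_by_category (categories, category_id, terms) are my understanding of the response shape, so check them against the docs for your version.

```python
import json
from urllib.request import Request


def get_categories_request(base_url, job_id, category_id=None, size=100):
    """Build (but do not send) a Get Categories API request."""
    url = f"{base_url}/_ml/anomaly_detectors/{job_id}/results/categories"
    if category_id is not None:
        # Fetch a single category definition by its ID.
        return Request(f"{url}/{category_id}", method="GET")
    # Otherwise page through all categories of the job.
    body = json.dumps({"page": {"from": 0, "size": size}}).encode("utf-8")
    return Request(url, data=body, method="GET",
                   headers={"Content-Type": "application/json"})


def terms_by_category(response):
    """Map category_id -> terms from a parsed Get Categories response."""
    return {c["category_id"]: c["terms"] for c in response.get("categories", [])}


req = get_categories_request("http://localhost:9200", "payload-categories", 8)
print(req.full_url)
```

To actually run it, pass the request to urllib.request.urlopen (adding whatever credentials your cluster needs) and json.load the response before calling terms_by_category.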
I used count by mlcategory. However, the actual field gives the count of that Payload category in the datafeed, which seems quite different from what you said in your last post.
In addition, the typical field is a float.
What that information is saying is that 1330 documents on 26th December 2021 were classified as category 8, whereas on average there are 221.6 documents per bucket in category 8. The reason typical is a float is that the expected value of a distribution of integers isn't always an integer. For example, the expected value from rolling a standard 6-sided die is 3.5, but you'll never roll 3.5 on a single roll of the die.
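The dice arithmetic can be checked exactly with Python's fractions module: each face has probability 1/6, so the expected value is (1 + 2 + ... + 6) / 6 = 21/6 = 7/2, a value no single roll can produce.

```python
from fractions import Fraction

faces = range(1, 7)
# Expected value: sum of face * P(face), with P(face) = 1/6 for a fair die.
expected = sum(Fraction(face, 6) for face in faces)

print(expected)         # 7/2
print(float(expected))  # 3.5
print(any(face == expected for face in faces))  # False: no roll gives 3.5
```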
For population analysis with a bucket span of 1 hour, could you explain what the typical value means here?
In the attached screenshot, the typical value is the same for all anomalous messages. Why is that?
For a population job, anomalies are created for entities that are significantly different from the population within the current time bucket. The typical value describes the population as a whole in that bucket, so it is the same for every anomalous entity in it. Here the average count for the population in that time bucket is 2.27, and those two entities have much higher counts.
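To see why the typical value repeats, here is a toy Python sketch of the idea. It is not Elastic's population model, which is probabilistic; it only shows that when every entity in a bucket is compared against one population-wide reference value, every anomaly reported in that bucket carries the same typical value. The entity names, counts, the use of the median as a robust reference, and the threshold factor are all hypothetical choices for illustration.

```python
from statistics import median

# Hypothetical counts per entity within ONE 1-hour bucket of a population job.
bucket_counts = {
    "host-a": 2, "host-b": 1, "host-c": 3,
    "host-d": 2, "host-e": 40, "host-f": 35,
}


def population_outliers(counts, factor=5.0):
    """Compare every entity with one population-wide reference value.

    The median is a robust stand-in for the population's typical count;
    `factor` is an arbitrary, hypothetical threshold.
    """
    typical = median(counts.values())
    outliers = {e: c for e, c in counts.items() if c > factor * typical}
    return typical, outliers


typical, outliers = population_outliers(bucket_counts)
for entity, actual in outliers.items():
    # Every anomaly in this bucket shares the same population typical value.
    print(entity, "actual:", actual, "typical:", typical)
```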