I created a categorization ML job, and I keep getting this message:
Categorization status changed to 'warn' for 'client_event_key' '****' after 1186 buckets
I quote the following from the documentation:
If the categorization status for a partition changes to warn, it doesn’t categorize well and can cause a lot of unnecessary resource usage.
Why doesn’t it categorize well, and what should I do?
Categorization only works well on machine-generated text, where entries share constant terms around which they can be clustered.
Without knowing what the values of client_event_key look like, I can only assume that this text field contains mostly random values.
It is possible that the field should categorize well but first needs to be passed through a custom analyzer: https://www.elastic.co/guide/en/machine-learning/7.9/ml-configuring-categories.html#ml-configuring-analyzer
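As a rough illustration of the custom analyzer idea, here is a minimal sketch of a job body built as a Python dict. The text field name `message`, the bucket span, the time field, and the UUID-stripping pattern are all assumptions made for the example, not details from this thread. The `pattern_replace` char filter removes variable tokens before tokenization, and `stop_on_warn` (available with per-partition categorization) stops categorization for partitions whose status changes to warn.

```python
import json

# Hypothetical job body: "message", "15m", "@timestamp" and the regex are
# placeholders; adapt them to the real data before using anything like this.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "categorization_field_name": "message",  # assumed name of the text field
        "per_partition_categorization": {
            "enabled": True,
            # Stop categorizing partitions whose status changes to 'warn'.
            "stop_on_warn": True,
        },
        "categorization_analyzer": {
            "char_filter": [
                {
                    # Strip UUID-like tokens so they do not create new categories.
                    "type": "pattern_replace",
                    "pattern": "[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}",
                }
            ],
            "tokenizer": "ml_classic",
        },
        "detectors": [
            {
                "function": "count",
                "by_field_name": "mlcategory",
                "partition_field_name": "client_event_key",
            }
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

# Serialize to the JSON that would be sent when creating the job.
job_json = json.dumps(job_body, indent=2)
print(job_json)
```

Whether a char filter like this helps depends entirely on what the text looks like; it only makes sense if the messages have a stable structure once the variable parts are removed.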
Another thing you can do is look at the model size stats for the job, which are on the "Counts" tab if you expand the row for the job in the jobs list.
This will include stats like how many documents got categorized and how many categories there were. It might make it clearer why the status changed to warn. For example, if 10000 messages were categorized and created 9000 different categories, then categorization would be pretty useless on that data set. The same applies if only 1 category was detected.
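To make that reasoning concrete, the sketch below checks a `model_size_stats` dict against the conditions the job stats API documentation lists for the warn status (a single category, mostly rare categories, too many categories relative to documents, no frequent categories, mostly dead categories). It is an approximation of those documented rules, not the actual implementation; the function name and the `min_docs` cutoff are assumptions, and the sample numbers other than 10000 and 9000 are made up.

```python
def categorization_warnings(stats, min_docs=100):
    """Return reasons why the category distribution looks unsuitable.

    `stats` is a dict shaped like the job's model_size_stats. `min_docs`
    is an assumed cutoff below which there is too little data to judge.
    """
    docs = stats.get("categorized_doc_count", 0)
    total = stats.get("total_category_count", 0)
    frequent = stats.get("frequent_category_count", 0)
    rare = stats.get("rare_category_count", 0)
    dead = stats.get("dead_category_count", 0)

    reasons = []
    if docs < min_docs or total == 0:
        return reasons  # not enough data to say anything useful
    if total == 1:
        reasons.append("only one category")
    if total > 0.5 * docs:
        reasons.append("category count exceeds 50% of categorized documents")
    if rare > 0.9 * total:
        reasons.append("more than 90% of categories are rare")
    if frequent == 0:
        reasons.append("no frequently matched categories")
    if dead > 0.5 * total:
        reasons.append("more than 50% of categories are dead")
    return reasons


# The 10000 / 9000 case from above; the other counts are illustrative only.
example = {
    "categorized_doc_count": 10000,
    "total_category_count": 9000,
    "frequent_category_count": 0,
    "rare_category_count": 8900,
    "dead_category_count": 0,
}
for reason in categorization_warnings(example):
    print(reason)
```

The same counts are returned under `model_size_stats` by `GET _ml/anomaly_detectors/<job_id>/_stats`, so the dict can be filled from that response instead of being typed in by hand.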