I created a categorization ML job, and I get this message a lot:
Categorization status changed to 'warn' for 'client_event_key' '****' after 1186 buckets
The documentation says:
If the categorization status for a partition changes to warn, it doesn’t categorize well and can cause a lot of unnecessary resource usage.
Why doesn't it categorize well, and what should I do?
Another thing you can do is look at the model size stats for the job, which are on the "Counts" tab if you expand the row for the job in the jobs list.
These include stats such as how many documents were categorized and how many categories were created, which might make it clearer why the status changed to warn. For example, if 10000 messages were categorized and produced 9000 different categories, then categorization would be pretty useless on that data set. The same is true if only 1 category was detected, since every message would land in the same bucket.
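If it helps, here is a rough sketch of that check in Python. It assumes you have copied the model size stats into a dict, and that the field names (`categorized_doc_count`, `total_category_count`, `frequent_category_count`, `rare_category_count`, `dead_category_count`) match what your version reports. The function name and the thresholds are my own illustrative choices, not necessarily the exact rules the job applies.

```python
def categorization_warnings(stats):
    """Return a list of reasons the categorization looks poor.

    `stats` is a dict of model size stats. The thresholds below are
    illustrative assumptions, not the exact rules used by the ML job.
    """
    docs = stats.get("categorized_doc_count", 0)
    total = stats.get("total_category_count", 0)
    frequent = stats.get("frequent_category_count", 0)
    rare = stats.get("rare_category_count", 0)
    dead = stats.get("dead_category_count", 0)

    reasons = []
    if docs == 0 or total == 0:
        return reasons  # nothing has been categorized yet

    if total == 1:
        reasons.append("only one category")
    if total > 0.5 * docs:
        reasons.append("too many categories relative to documents")
    if frequent == 0:
        reasons.append("no frequent categories")
    if rare > 0.9 * total:
        reasons.append("most categories are rare")
    if dead > 0.5 * total:
        reasons.append("most categories are dead")
    return reasons


# The 10000 messages / 9000 categories case from above (other counts made up).
example = {
    "categorized_doc_count": 10000,
    "total_category_count": 9000,
    "frequent_category_count": 0,
    "rare_category_count": 8900,
    "dead_category_count": 0,
}
for reason in categorization_warnings(example):
    print(reason)
```

If the list comes back non-empty for a partition, the field being categorized is probably not a good fit (for example, it is not really free-text log messages), and it is worth looking at a few sample values for that partition.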