I've also tried adding the field I'm running distinct_count over as an influencer to my ML job. However, it doesn't seem to include the full population of reply_CustomerId values that the distinct count is performed on: just a small handful, or none at all.
In this case, ML does not retain the individual values of the field being distinct-counted (reply_CustomerId), so they are not stored in the .ml-anomalies-* indices.
If you truly need them, your Watch would have to use a chain input. The first input queries the anomaly results to find the anomalous request_IPAddress. A second input then uses that request_IPAddress to query the raw data index, passing the request_IPAddress value and, most likely, the timestamp of the bucket in which the anomaly occurred.
Then, you could have a list of the reply_CustomerIds that made the distinct count anomalous.
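As a rough illustration of the two steps, here is a minimal Python sketch that only builds the two query bodies and parses the result, without calling Elasticsearch. Several things in it are assumptions, not taken from this thread: the job id and raw index name are hypothetical, request_IPAddress is assumed to be the job's "over" field (so the anomalous IP lands in `over_field_value` on each record result; use `by_field_value` or `partition_field_value` if your detector is set up differently), the raw data is assumed to use `@timestamp`, and reply_CustomerId is assumed to be a keyword field (it may need `.keyword`). In an actual Watch, the second input would template these values from `ctx.payload` of the first input instead of a Python function.

```python
from typing import Any, Dict, List

# Hypothetical names: adjust to your own job and index.
JOB_ID = "my_distinct_count_job"
RAW_INDEX = "my-raw-logs-*"
# Assumption: request_IPAddress is the detector's "over" field.
ENTITY_FIELD = "over_field_value"


def anomaly_query(min_score: float = 75.0) -> Dict[str, Any]:
    """Step 1: search .ml-anomalies-* for recent high-scoring record results."""
    return {
        "size": 10,
        "sort": [{"record_score": "desc"}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": JOB_ID}},
                    {"term": {"result_type": "record"}},
                    {"range": {"record_score": {"gte": min_score}}},
                    {"range": {"timestamp": {"gte": "now-2h"}}},
                ]
            }
        },
    }


def raw_data_query(record: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: list reply_CustomerId values for one anomalous IP and bucket."""
    start_ms = record["timestamp"]                    # bucket start, epoch millis
    end_ms = start_ms + record["bucket_span"] * 1000  # bucket_span is in seconds
    return {
        "size": 0,  # only the aggregation is needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"request_IPAddress": record[ENTITY_FIELD]}},
                    {"range": {"@timestamp": {
                        "gte": start_ms, "lt": end_ms, "format": "epoch_millis"}}},
                ]
            }
        },
        "aggs": {
            # Assumption: reply_CustomerId is mapped as a keyword field.
            "customers": {"terms": {"field": "reply_CustomerId", "size": 1000}}
        },
    }


def customer_ids(agg_response: Dict[str, Any]) -> List[str]:
    """Extract the distinct reply_CustomerId values from the step-2 response."""
    return [b["key"] for b in agg_response["aggregations"]["customers"]["buckets"]]
```

The terms aggregation `size` caps how many customer IDs come back, so it should be at least as large as the distinct counts you expect in an anomalous bucket.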