Hey guys,
I have an ML job that does a distinct count of one field over another, e.g.
"detector_description": "distinct_count(reply_CustomerId) over request_IPAddress",
"function": "distinct_count",
"field_name": "reply_CustomerId",
"over_field_name": "request_IPAddress"
I was wondering if it is possible to somehow retrieve the field_name values? I intend to use them in a watcher action.
I've had a look through your result resource documentation, and the closest thing I can find is the field_name property.
I've also tried adding the field I am running the distinct_count on as an influencer to my ML job. However, the results do not appear to include the whole population of reply_CustomerId values that the distinct count is performed on, just a small handful or none.
We are currently on Version: 6.5.4
Any clues?
Thank you!
In this case, ML does not retain the values of the field_name (reply_CustomerId), so they are not stored in the .ml-anomalies-* index.
If you truly want them, your Watch would need to use a chain input. The 1st input is a query against .ml-anomalies-* that finds the anomaly and the request_IPAddress value it belongs to. The 2nd input is a query against the raw data index, filtered by that request_IPAddress value and most likely also by the timestamp of the bucket in which the anomaly occurs.
That would give you the list of reply_CustomerIds that made the distinct count anomalous.
An example of a chain input watch can be seen here: https://github.com/elastic/examples/blob/master/Alerting/Sample%20Watches/ml_examples/bucket_record_chain_watch.json
But note that in the above example, the 2nd query also hits the .ml-anomalies-* index. In your case, you'd hit the raw data index.
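To make the idea concrete, here is a rough sketch of what such a watch body might look like, not a tested watch. Several things in it are assumptions rather than anything from your setup: the job id (my_distinct_count_job), the raw index pattern (my-raw-data-*), the raw time field (@timestamp), a 15m bucket span, the record_score threshold of 75, and the terms size of 1000. It also assumes request_IPAddress and reply_CustomerId are mapped so that term queries and terms aggregations work on them. The `||+15m` date math on an epoch_millis value is something I would double-check on 6.5.4 before relying on it.

```json
{
  "metadata": {
    "note": "Sketch only. Job id, raw index, @timestamp field, 15m bucket span and score threshold are assumptions."
  },
  "trigger": { "schedule": { "interval": "15m" } },
  "input": {
    "chain": {
      "inputs": [
        {
          "anomaly": {
            "search": {
              "request": {
                "indices": [".ml-anomalies-*"],
                "body": {
                  "size": 1,
                  "sort": [{ "record_score": { "order": "desc" } }],
                  "query": {
                    "bool": {
                      "filter": [
                        { "term": { "job_id": "my_distinct_count_job" } },
                        { "term": { "result_type": "record" } },
                        { "range": { "record_score": { "gte": 75 } } },
                        { "range": { "timestamp": { "gte": "now-1h" } } }
                      ]
                    }
                  }
                }
              }
            }
          }
        },
        {
          "customers": {
            "search": {
              "request": {
                "indices": ["my-raw-data-*"],
                "body": {
                  "size": 0,
                  "query": {
                    "bool": {
                      "filter": [
                        { "term": { "request_IPAddress": "{{ctx.payload.anomaly.hits.hits.0._source.over_field_value}}" } },
                        {
                          "range": {
                            "@timestamp": {
                              "gte": "{{ctx.payload.anomaly.hits.hits.0._source.timestamp}}",
                              "lt": "{{ctx.payload.anomaly.hits.hits.0._source.timestamp}}||+15m",
                              "format": "epoch_millis"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "customer_ids": {
                      "terms": { "field": "reply_CustomerId", "size": 1000 }
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "compare": { "ctx.payload.anomaly.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log_customer_ids": {
      "logging": {
        "text": "IP {{ctx.payload.anomaly.hits.hits.0._source.over_field_value}} customer ids: {{#ctx.payload.customers.aggregations.customer_ids.buckets}}{{key}} {{/ctx.payload.customers.aggregations.customer_ids.buckets}}"
      }
    }
  }
}
```

The 1st input picks the highest scoring recent record, and the 2nd input reads its over_field_value (the request_IPAddress) and timestamp via ctx.payload.anomaly. Keep in mind that the 2nd input still runs when the 1st finds nothing, so the placeholders would render empty in that case and the raw data search may fail or return nothing. The condition only stops the action from firing.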