Does anyone know whether (and how) random sampling can be used with machine learning anomaly detection?
I would like to monitor the data rates sent by various agents to various indices/data streams, and I'd like to know when a data rate is abnormally low. I know I can use the low_count function for this, but my concern is the high cardinality I'm looking to monitor against. I was thinking that rather than counting all documents, I'd use random sampling to get only a sub-count for each category. My thinking is that whether I look at 10% or 100% of the documents, the relative "rate" of documents per category would be roughly the same, so looking at all of them isn't strictly necessary.
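To make the idea concrete, here is a rough Python sketch of the reasoning. It is only a simulation, not Elasticsearch or ML job configuration. The agent names, document counts, and the `sampled_counts` helper are all made up for illustration. It keeps each document with a fixed probability and then scales the sub-count back up by dividing by that probability.

```python
import random


def sampled_counts(true_counts, probability, seed=0):
    """Keep each document independently with the given probability.

    true_counts maps a (hypothetical) category name to its real document
    count; the return value maps each category to its sampled sub-count.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    result = {}
    for category, n in true_counts.items():
        result[category] = sum(1 for _ in range(n) if rng.random() < probability)
    return result


# Hypothetical per-agent document counts for one time bucket.
true_counts = {"agent-a": 20000, "agent-b": 5000, "agent-c": 50}
probability = 0.1  # look at roughly 10% of the documents

sampled = sampled_counts(true_counts, probability)

# Scale each sub-count back up to estimate the full count.
estimated = {category: count / probability for category, count in sampled.items()}

for category in true_counts:
    print(category, true_counts[category], sampled[category], round(estimated[category]))
```

In a simulation like this, the scaled estimates for the high-volume categories should land close to the true counts. The low-volume category only has a handful of sampled documents per bucket, so its estimate is much noisier. That is the part I'm unsure about when the goal is detecting abnormally low counts.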