I am trying to write a Python script to find anomalies and relay them to our monitoring system.
What I am looking for is to get the same information I can see in the Anomaly Explorer (in the picture):
But after a couple of days of trying, I just cannot get it right.
What I have done is set up a single ML job that uses the "customer" field for partitioning the data. The detector function is "high_count by keywords over username partitionfield=customer".
If I have understood correctly, I should first search for buckets that have an anomaly_score greater than or equal to 75 (critical), which would give me the timeframes when at least one anomaly happened.
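Here is roughly what I have for that first step. It is a minimal sketch using the elasticsearch Python client, assuming the results can be read straight from the .ml-anomalies-* index; "my_job" is a placeholder for my real job ID:

```python
from elasticsearch import Elasticsearch

# Placeholder connection details; adjust for the real cluster.
es = Elasticsearch("http://localhost:9200")

JOB_ID = "my_job"  # placeholder for the actual ML job ID

# Step 1: find buckets whose overall anomaly_score is critical (>= 75).
resp = es.search(
    index=".ml-anomalies-*",
    query={
        "bool": {
            "filter": [
                {"term": {"job_id": JOB_ID}},
                {"term": {"result_type": "bucket"}},
                {"range": {"anomaly_score": {"gte": 75}}},
            ]
        }
    },
    sort=[{"timestamp": "asc"}],
    size=100,
)

for hit in resp["hits"]["hits"]:
    bucket = hit["_source"]
    # Each bucket gives a timeframe: [timestamp, timestamp + bucket_span).
    print(bucket["timestamp"], bucket["bucket_span"], bucket["anomaly_score"])
```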
Then I would query all records and influencers from that timeframe, and I would get the anomalies to be sent forward.
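For a single bucket, that second query looks something like this (continuing from the previous snippet; I am assuming timestamp is stored as epoch milliseconds and bucket_span as seconds in the results documents):

```python
# `es`, JOB_ID and `bucket` come from the previous snippet.

# Step 2: compute the bucket's timeframe and pull the record-level results.
start = bucket["timestamp"]                 # assumed epoch milliseconds
end = start + bucket["bucket_span"] * 1000  # bucket_span assumed to be seconds

resp = es.search(
    index=".ml-anomalies-*",
    query={
        "bool": {
            "filter": [
                {"term": {"job_id": JOB_ID}},
                {"term": {"result_type": "record"}},
                {"range": {"timestamp": {"gte": start, "lt": end}}},
            ]
        }
    },
    size=1000,
)

for hit in resp["hits"]["hits"]:
    record = hit["_source"]
    print(record["timestamp"], record["record_score"])
```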
But my problem is that I don't know how to partition the data properly: the bucket doesn't seem to contain any information about which customer's data caused the anomaly. If I have understood correctly, the bucket only tells me the timeframe. So if I query the influencers from that timeframe, I also get other customers' influencers and the data gets mixed (see the sketch below). It would be trivial if every customer had their own ML job with separate indices, but I would like to have a single job for this.
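To illustrate, this is roughly the influencer query I have now (again continuing from the snippets above), and it returns influencers for every customer active in the bucket's timeframe, not just the one whose data tripped the bucket:

```python
# Step 3 (where I am stuck): influencers in the timeframe come back
# for every customer, so I cannot tell which ones belong together.
resp = es.search(
    index=".ml-anomalies-*",
    query={
        "bool": {
            "filter": [
                {"term": {"job_id": JOB_ID}},
                {"term": {"result_type": "influencer"}},
                {"range": {"timestamp": {"gte": start, "lt": end}}},
            ]
        }
    },
    size=1000,
)

for hit in resp["hits"]["hits"]:
    inf = hit["_source"]
    # Influencers from different customers end up mixed together here.
    print(inf["influencer_field_name"], inf["influencer_field_value"], inf["influencer_score"])
```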
Somehow the Anomaly Explorer gets this right. Can someone explain to me how it is done there?