We are using machine learning to detect anomalies in the request rates on our API. One of our jobs analyses the event rate per IP for a specific service.
It generally works fine, but has been raising false positives for one specific customer. That customer recently shifted their API usage by 30 minutes: instead of calling our API at 08:00AM, they now call it at 08:30AM. Somehow the ML job hasn't adapted to the new schedule yet, even though it now makes up the majority of the dataset.
On the screenshot above, you can see three sections:
- The first part has usage spikes at 8AM, 12PM, 4PM, etc. (every four hours). No anomalies are reported, which is correct.
- The second part, where the anomalies start, is when the customer began calling our API at 8:30AM instead of 8AM.
- The third part, the forecast, still expects the spikes at 8AM rather than 8:30AM.
The job otherwise works fine for all other customers. How can I fix it for this one?
More generally, how can I tell the ML job "yes, this is an anomaly" or "no, this is not an anomaly"? Finally, is there a way to add notes on the anomaly timeline?
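For reference, here is roughly what the job config looks like (a sketch, not our exact config; the job ID, field names, and bucket span below are illustrative placeholders):

```json
PUT _ml/anomaly_detectors/api_request_rate
{
  "description": "Event rate per client IP for one service",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "partition_field_name": "client_ip",
        "detector_description": "Event count partitioned by client IP"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```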