The advanced machine learning job for detecting anomalies in network data was trained on two months of data. But when I test the model with real-time live data, I can see the spike on the graph but no anomalies are detected, whereas anomaly points do show up on the training data.
The most probable explanation is that the datafeed advances past that time interval before the data is available for search. To verify this, use the get-buckets API for the interval where you have the big spike and check the event_count. Could you post the outcome of this?
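As a rough sketch of what to look for in the response: assuming the get-buckets response has been parsed into a dict with a `buckets` list whose entries carry `timestamp` (epoch milliseconds) and `event_count`, the helper below lists the buckets that processed no events. The function name `empty_buckets` and the sample data are made up for illustration and are not taken from your job.

```python
from datetime import datetime, timezone


def empty_buckets(response):
    """Return ISO 8601 timestamps of buckets whose event_count is 0."""
    empty = []
    for bucket in response.get("buckets", []):
        if bucket.get("event_count", 0) == 0:
            # Bucket timestamps are assumed to be epoch milliseconds.
            ts = datetime.fromtimestamp(bucket["timestamp"] / 1000, tz=timezone.utc)
            empty.append(ts.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return empty


# Invented sample shaped like a get-buckets response (illustrative values only).
sample = {
    "count": 2,
    "buckets": [
        {"timestamp": 1537401600000, "event_count": 1250, "anomaly_score": 0.0},
        {"timestamp": 1537402500000, "event_count": 0, "anomaly_score": 0.0},
    ],
}

print(empty_buckets(sample))
```

A bucket with event_count 0 around the spike would be consistent with the datafeed having searched that interval before the documents were indexed.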
Machine Learning results have buckets at the top level. A bucket is written out for each interval that equals the configured bucket_span. Buckets are written out regardless of whether there were any anomalies in them.
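To make the "one bucket per interval" point concrete, here is a small sketch that estimates how many buckets should exist for a given time range. It assumes that bucket start times fall on multiples of bucket_span counted from the epoch. The 15-minute span and the function name `expected_bucket_count` are only examples, not values from your job.

```python
from datetime import datetime, timezone


def expected_bucket_count(start, end, bucket_span_seconds):
    """Count bucket start times in [start, end), assuming epoch-aligned buckets."""
    start_s = int(start.timestamp())
    end_s = int(end.timestamp())
    # First bucket boundary at or after start (ceiling division).
    first = -(-start_s // bucket_span_seconds) * bucket_span_seconds
    if first >= end_s:
        return 0
    return (end_s - 1 - first) // bucket_span_seconds + 1


start = datetime(2018, 9, 20, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 9, 21, 0, 0, tzinfo=timezone.utc)
print(expected_bucket_count(start, end, 15 * 60))  # 96
```

If the count returned by the API for a range the datafeed has covered is far below such an estimate, that points at the datafeed rather than at the anomaly detection itself.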
Based on your screenshot above, the number of buckets should definitely be non-zero.
If you try the request without any additional parameters, e.g.:
GET _xpack/ml/anomaly_detectors/{job_id}/results/buckets
you should get a count that is non-zero.
For debugging the issue you experience, you could use
GET _xpack/ml/anomaly_detectors/{job_id}/results/buckets?start={time_before_big_spike}
where {time_before_big_spike} should be in a format such as "2018-09-20T00:00:00Z".
If that returns 0 buckets, it suggests that the datafeed did not actually run up to that point in time.
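If you prefer to build that request from a script, the following is a minimal sketch of how the path and the start parameter could be put together. The job id `my_job` and the helper name `buckets_path` are placeholders. Sending the request is left out, since that depends on how you connect to your cluster.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def buckets_path(job_id, start):
    """Build the get-buckets request path with an ISO 8601 UTC start time."""
    start_iso = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # safe=":" keeps the colons readable instead of percent-encoding them.
    query = urlencode({"start": start_iso}, safe=":")
    return f"_xpack/ml/anomaly_detectors/{job_id}/results/buckets?{query}"


# "my_job" is a placeholder job id; pick a time shortly before the spike.
before_spike = datetime(2018, 9, 20, 0, 0, tzinfo=timezone.utc)
print(buckets_path("my_job", before_spike))
```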
For completeness, it would help if you also posted the job configuration (paste the JSON you see in the JSON tab for the job in question) and the job messages, as they help to understand the operational timeline.