Hello everybody. I have run into what looks to me like strange behavior in anomaly detection jobs.
The last bucket is not processed when model_plot_config is enabled.
For example, with a bucket_span of "1d", the most recent result I get is always for the day before yesterday.
If I simply disable model_plot_config, the job starts working correctly.
In the Job Management table I see the correct number of processed records in both cases, but when model_plot is enabled the chart does not show the last bucket.
It also does not matter in which mode I run the datafeed (to a specific end time or as a real-time search).
I first noticed this problem on version 7.10.0, and nothing changed after upgrading to 7.12.0.
I have already tried different configurations with frequency, bucket_span, query_delay... but the behavior is always the same.
Below is a very simplified example of a job and data to reproduce:
Job config:
{
  "job_id": "events_job",
  "description": "Events Job",
  "analysis_config": {
    "bucket_span": "1d",
    "detectors": [
      {
        "function": "sum",
        "field_name": "Events",
        "detector_description": "Events Sum"
      }
    ],
    "influencers": []
  },
  "analysis_limits": {
    "model_memory_limit": "11MB"
  },
  "data_description": {
    "time_field": "Date",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1,
  "results_index_name": "custom-events_job",
  "allow_lazy_open": true
}
Datafeed config:
{
  "query_delay": "1h",
  "query": {
    "match_all": {}
  },
  "frequency": "60m",
  "indices": [
    "events"
  ],
  "scroll_size": 1000,
  "delayed_data_check_config": {
    "enabled": true
  },
  "job_id": "events_job",
  "datafeed_id": "datafeed-events_job"
}
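To make it concrete which bucket I expect to be the newest one with results, here is a minimal sketch in plain Python (it does not call Elasticsearch). It assumes that buckets are aligned to multiples of bucket_span counted from the Unix epoch in UTC, and that the datafeed only searches data up to "now minus query_delay". Both are my assumptions about how the job behaves, and the function name and the "now" value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BUCKET_SPAN = timedelta(days=1)   # "bucket_span": "1d" from the job config
QUERY_DELAY = timedelta(hours=1)  # "query_delay": "1h" from the datafeed config
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_complete_bucket_start(now):
    """Start of the newest bucket that ends at or before now - query_delay.

    Assumes epoch-aligned UTC buckets (an assumption, not confirmed behavior).
    """
    search_end = now - QUERY_DELAY
    # Whole bucket spans elapsed since the epoch at the end of the search window.
    whole_spans = (search_end - EPOCH) // BUCKET_SPAN
    # The bucket that contains search_end is still open, so step back by one.
    return EPOCH + (whole_spans - 1) * BUCKET_SPAN


# Hypothetical "now", chosen only for illustration.
now = datetime(2021, 3, 21, 12, 0, tzinfo=timezone.utc)
print(last_complete_bucket_start(now).date())
```

Under these assumptions, at midday on 2021-03-21 the newest complete bucket is the one for 2021-03-20 (yesterday), which is the bucket I expect to see results for, not the one for the day before it.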
Example source data for index:
[
  {
    "Date": "2021-03-16",
    "Events": 5
  },
  {
    "Date": "2021-03-17",
    "Events": 10
  },
  {
    "Date": "2021-03-18",
    "Events": 5
  },
  {
    "Date": "2021-03-19",
    "Events": 10
  },
  {
    "Date": "2021-03-20",
    "Events": 5
  }
]
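As a rough sanity check of how many buckets this data should produce, the sample documents can be grouped into 1d buckets and summed in plain Python. This is only a sketch of the arithmetic behind the "sum" detector, not what the ML job actually runs. It assumes the dates are interpreted as midnight UTC, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SPAN_S = 24 * 60 * 60  # "bucket_span": "1d", in seconds

docs = [
    {"Date": "2021-03-16", "Events": 5},
    {"Date": "2021-03-17", "Events": 10},
    {"Date": "2021-03-18", "Events": 5},
    {"Date": "2021-03-19", "Events": 10},
    {"Date": "2021-03-20", "Events": 5},
]


def bucket_start(date_str):
    """Epoch-second start of the 1d bucket containing this date (UTC assumed)."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ts = int(parsed.timestamp())
    return ts - ts % BUCKET_SPAN_S


def sum_per_bucket(documents):
    """Sum the Events field per bucket, ordered by bucket start time."""
    sums = defaultdict(int)
    for doc in documents:
        sums[bucket_start(doc["Date"])] += doc["Events"]
    return dict(sorted(sums.items()))


buckets = sum_per_bucket(docs)
for start, total in buckets.items():
    print(datetime.fromtimestamp(start, tz=timezone.utc).date(), total)
```

This gives five buckets, one per day from 2021-03-16 to 2021-03-20, which matches the 5 processed buckets I see when model_plot is disabled.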
Result with model_plot enabled (4 processed buckets):
Result with model_plot disabled (5 processed buckets):
I have been looking for similar cases in the community for a long time, but nobody seems to have come across this.
I would be glad for any help or idea. Have a nice day!