Intermittent transient anomaly at data's edge

We just starting to use ML and we are having trouble with it. With a real-time datafeed running it (intermittently) detects anomaly right at the edge of new data. The anomaly is either "unexpected 0" or value that is super low. Few minutes later the anomaly clears up and disappears from anomaly explorer. However, watcher that is setup for this ML job already fired and send out false email.
I verified that ingested data is never more than 1 min behind. In fact, I setup watcher alert to notify if number of ingested documents in this particular index is below threshold. That alert never fired.
I keep pushing "query_delay" parameter to where it is now at 25m and it doesn't seem to help.
Here is the ML job configuration:
{
"count": 1,
"jobs": [
{
"job_id": "convex-programmatic-bids",
"job_type": "anomaly_detector",
"job_version": "6.5.1",
"groups": [
"convex"
],
"description": "Convex received bids from partners",
"create_time": 1548442241884,
"established_model_memory": 97060,
"analysis_config": {
"bucket_span": "15m",
"detectors": [
{
"detector_description": "low_sum(convex.programmatic.impressions_value)",
"function": "low_sum",
"field_name": "convex.programmatic.impressions_value",
"partition_field_name": "partner",
"detector_index": 0
}
],
"influencers": [
"partner",
"dc.keyword"
]
},
"analysis_limits": {
"model_memory_limit": "36mb",
"categorization_examples_limit": 4
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_snapshot_retention_days": 1,
"custom_settings": {
"custom_urls":
},
"model_snapshot_id": "1548459684",
"results_index_name": "custom-convex-programmatic-bids"
}
]
}

And datafeed config:
{
"count": 1,
"datafeeds": [
{
"datafeed_id": "datafeed-convex-programmatic-bids",
"job_id": "convex-programmatic-bids",
"query_delay": "25m",
"frequency": "1m",
"indices": [
"logstash_convex-*"
],
"types": ,
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "convex.programmatic.impressions_value",
"boost": 1
}
},
{
"match": {
"status.keyword": {
"query": "BID",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"scroll_size": 1000,
"chunking_config": {
"mode": "auto"
}
}
]
}

Seems similar to: ML alerts triggering on interim result

See that thread for the workaround and the related bug report at: https://github.com/elastic/ml-cpp/issues/324

Thanks @richcollier, it does indeed seem similar. I didn't see workaround in the other thread, I assume its to add filter to watcher query to ignore interim results?

Yes - that is the workaround!

Verified that was indeed the issue and solved by filtering interim results in the watcher.

Thank you for the help. I'll be monitoring bug also.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.