I created a machine learning job with two detectors, each using a metric defined as an aggregation over a field in my index, and everything worked well (Job_0 and Datafeed_0 below).
What I'd really like to do, though, is compute each metric under its own filter. However, if I wrap the aggregations in filters, I seem to lose the ability to refer to the results as field names in the job itself.
How can I create two metrics, filtered based on different criteria, and then still reference them as detectors in the ML job?
Job_0 (no errors when referencing metric1 and metric2 as detectors):
{
  "description": "My Job",
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "detector_description": "Metric 1",
        "function": "mean",
        "field_name": "metric1"
      },
      {
        "detector_description": "Metric 2",
        "function": "mean",
        "field_name": "metric2"
      }
    ],
    "summary_count_field_name": "doc_count"
  },
  "model_plot_config": {"enabled": "true"},
  "data_description": {"time_field": "ts"}
}
Datafeed_0:
{
  "job_id": "my_job_id",
  "indices": ["my_index"],
  "aggs": {
    "buckets": {
      "date_histogram": {
        "field": "ts",
        "interval": "1h",
        "time_zone": "UTC"
      },
      "aggs": {
        "ts": {"max": {"field": "ts"}},
        "metric1": {"value_count": {"field": "my_field"}},
        "metric2": {"sum": {"field": "my_other_field"}}
      }
    }
  }
}
Result:
{
  "metric1_value": 3509,
  "metric2_value": 58613,
  "ts": 1544425187000,
  "doc_count": 3509
}
I can run the following query directly and it returns the data I want, but I can't seem to use it in the datafeed:
{
  "aggs": {
    "ts": {"max": {"field": "ts"}},
    "metric1": {
      "filter": {"term": {"my_filter_field": "A"}},
      "aggs": {
        "metric1_value": {"value_count": {"field": "my_field"}}
      }
    },
    "metric2": {
      "filter": {"term": {"my_filter_field": "B"}},
      "aggs": {
        "metric2_value": {"value_count": {"field": "my_other_field"}}
      }
    }
  }
}
When I use those filtered aggregations in Datafeed_1 (shown after the error) with Job_0, I get:
{
  "type": "illegal_argument_exception",
  "reason": "Unsupported aggregation type [metric1]"
}
Datafeed_1:
{
  "job_id": "my_job_id",
  "indices": ["my_index"],
  "aggs": {
    "buckets": {
      "date_histogram": {
        "field": "ts",
        "interval": "1h",
        "time_zone": "UTC"
      },
      "aggs": {
        "ts": {"max": {"field": "ts"}},
        "metric1": {
          "filter": {"term": {"my_filter_field": "A"}},
          "aggs": {
            "metric1_value": {"value_count": {"field": "my_field"}}
          }
        },
        "metric2": {
          "filter": {"term": {"my_filter_field": "B"}},
          "aggs": {
            "metric2_value": {"sum": {"field": "my_other_field"}}
          }
        }
      }
    }
  }
}
If I instead point the detector's field_name at the inner aggregation name ("metric1_value"), I don't get an error and the job runs, but the metric value is missing from the results:
{
  "ts": 1544425187000,
  "doc_count": 3509
}
How can I filter the two metrics differently and still pass the inner values of the result document to the detector's field_name? A top-level metric such as "ts" below can simply be referenced as "ts", and that works. How do I reference "metric1_value", which sits one level deeper?
"aggregations": {
  "metric1": {
    "doc_count": 0,
    "metric1_value": {
      "value": 0
    }
  },
  "metric2": {
    "doc_count": 0,
    "metric2_value": {
      "value": 0
    }
  },
  "ts": {
    "value": 1544658335000,
    "value_as_string": "1544658335"
  }
}
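To make the mapping I'm after concrete, here is a minimal Python sketch of the flattening I'd like the datafeed to perform on the response above. It is only an illustration and not part of Elasticsearch: `flatten_aggs` is a hypothetical helper name, and the rule it applies (take every nested object that has a "value" key, skip bare numbers such as the filter's doc_count) is my assumption about the desired behaviour, not a description of how the datafeed actually works.

```python
def flatten_aggs(aggregations):
    """Collect leaf metric values from a (possibly nested) aggregation result.

    Hypothetical helper: any object with a "value" key is treated as a leaf
    metric; other objects (e.g. a filter aggregation) are searched recursively.
    """
    flat = {}
    for name, body in aggregations.items():
        if not isinstance(body, dict):
            continue  # skip bare numbers such as the filter's doc_count
        if "value" in body:
            flat[name] = body["value"]  # leaf metric: keep its own name
        else:
            flat.update(flatten_aggs(body))  # descend into the filter bucket
    return flat


# The "aggregations" object from the response shown above.
response_aggs = {
    "metric1": {"doc_count": 0, "metric1_value": {"value": 0}},
    "metric2": {"doc_count": 0, "metric2_value": {"value": 0}},
    "ts": {"value": 1544658335000, "value_as_string": "1544658335"},
}

flat = flatten_aggs(response_aggs)
print(flat)
```

With that flattening, "metric1_value" and "metric2_value" would sit next to "ts" as top-level fields, which is the shape the detectors in Job_0 can already consume.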