Problems using Sum Aggregations for ML jobs

Aashka · June 28, 2019, 3:46pm

Hey,
I have been trying to use a detector that calculates the high sum of a given field, and I have the following questions:

1. Why can't I see the Single Metric Viewer for this job?
The job is configured in the "advanced job" mode, and although I can see anomalies in the Anomaly Explorer, I cannot open the Single Metric Viewer for any detector with a sum/high_sum function.
2. How is it possible that the model is calculating the "high_sum" of a STRING field?
The field I am using is "bytes_to_server", and the JSON decoder is decoding it as a string instead of as an integer. As I would expect, I cannot apply aggregations to this field in Kibana visualizations, yet I am able to create an ML job that finds the sum of the same field. This is just something I can't wrap my head around.

(Just FYI- I've been making and viewing all my Machine Learning jobs in the Kibana UI)

Thanks!

rashmi · June 28, 2019, 4:05pm

@BenTrent will get to this question.

Cheers
Rashmi

BenTrent · June 28, 2019, 5:12pm

What version of the stack are you running?

Here is a job I created in the advanced job config that has multiple detectors. It also has a partition on a keyword field, but I created another one without the partition and I could see it in the Single Metric Viewer. This is in 7.1.0, but this type of visualization has been supported for a while now.

As for how we are summing a "string" field: what is the mapping of the field "bytes_to_server" in the index? If the mapped type is numeric (e.g. long or double), we end up using doc_values when we gather the data before sending it to the Machine Learning job for processing.

Here is an example of using doc values to get the appropriate type in the JSON payload from the search:

PUT test_string
{
  "mappings": {
    "properties": {
      "long_string" :{
        "type": "long"
      }
    }
  } 
}

POST test_string/_doc
{
  "long_string": "12345"
}

GET test_string/_search
{
  "docvalue_fields": [
    {"field": "long_string"}]
}


{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_string",
        "_type" : "_doc",
        "_id" : "a7kXn2sBb78BhX1qpSZl",
        "_score" : 1.0,
        "_source" : {
          "long_string" : "12345"
        },
        "fields" : {
          "long_string" : [
            12345
          ]
        }
      }
    ]
  }
}
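To make the difference between the two parts of that hit concrete, here is a minimal Python sketch. It is not the datafeed's actual extraction code; `extract_field` is a hypothetical helper that prefers the doc-value form under `fields` and falls back to `_source`, which is the behaviour described above for numerically mapped fields.

```python
def extract_field(hit, field):
    """Return the doc-value form of `field` if the hit has one, else the raw _source value.

    Hypothetical helper for illustration only; not part of Elasticsearch or ML.
    """
    doc_values = hit.get("fields", {}).get(field)
    if doc_values:
        # Doc values come back as a list, typed according to the mapping.
        return doc_values[0]
    return hit.get("_source", {}).get(field)


# Trimmed copy of the hit from the search response above.
hit = {
    "_source": {"long_string": "12345"},
    "fields": {"long_string": [12345]},
}

value = extract_field(hit, "long_string")
print(type(value).__name__, value)
```

With the `fields` entry present the helper returns the integer 12345; if only `_source` were available, it would return the original string "12345".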

Aashka · June 28, 2019, 6:56pm

Hi Ben,
Thanks for your quick reply.

We are on version 6.7.1 of Elasticsearch.

I see it mapped as "keyword":

Also, from what I understand from the documentation (https://www.elastic.co/guide/en/elasticsearch//reference/current/doc-values.html), doc_values just store our data in a column-oriented format, right? They wouldn't change our mappings? If so, can keywords be aggregated?

Thanks again

BenTrent · June 29, 2019, 2:50am

@Aashka, you found a wonderful bug in our visualization.

The datafeed does NOT have to contain aggregations. I am assuming that in your case it is not using aggregations and is simply scrolling through the documents. You can confirm this by looking at the datafeed config and verifying that there are no aggregations referencing the keyword field. Once the datafeed sends the data to the C++ process, that process casts the string into a numeric value.
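As a rough illustration of those two points, the Python sketch below checks a datafeed config for aggregations and then sums string values by casting them to numbers. Everything in it is an assumption made for illustration: the example config and its IDs are made up, `uses_aggregations` and `bucket_sum` are hypothetical helpers, and skipping values that do not parse is a guess rather than documented behaviour of the real C++ process.

```python
def uses_aggregations(datafeed_config):
    """True if the datafeed config defines aggregations (under either key name)."""
    return bool(datafeed_config.get("aggregations") or datafeed_config.get("aggs"))


def bucket_sum(records, field):
    """Sum `field` over the records of one bucket, casting string values to float.

    Assumption: values that cannot be parsed as numbers are skipped.
    """
    total = 0.0
    for record in records:
        raw = record.get(field)
        if raw is None:
            continue
        try:
            total += float(raw)
        except (TypeError, ValueError):
            continue  # not numeric, ignore it in this sketch
    return total


# Hypothetical datafeed config with no aggregations, so documents are scrolled.
datafeed_config = {
    "datafeed_id": "datafeed-example",
    "job_id": "example",
    "indices": ["example-index"],
    "query": {"match_all": {}},
}

# Hypothetical documents in which the keyword field holds numbers as strings.
records = [
    {"bytes_to_server": "1200"},
    {"bytes_to_server": "800"},
    {"bytes_to_server": "n/a"},
]

print(uses_aggregations(datafeed_config))
print(bucket_sum(records, "bytes_to_server"))
```

The point of the sketch is only that a sum can be computed from string values once they are cast, which is why the job can model a field that Kibana visualizations refuse to aggregate.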

The reason the Single Metric Viewer is not working is that the UI is attempting to aggregate on the keyword field and is expecting a numeric value. I attempted this myself and found the following error in my browser console output.

Time series explorer - error getting metric data from elasticsearch: 
Object { statusCode: 400, error: "Bad Request", message: "[illegal_argument_exception] Expected numeric type on field [response.keyword], but got [keyword]" }

I will reach out to the ML UI team to see whether we can surface this error so that users see it when it occurs, or otherwise handle unexpected values better when viewing a job in the Single Metric Viewer.

Peter_Harverson · July 1, 2019, 8:58am

@Aashka as Ben points out, the Single Metric Viewer is expecting the partitioning field to be a numeric type, and currently fails, without an obvious error (except in the browser console), by attempting to aggregate on a keyword field. There is currently an open issue for this.

system · July 29, 2019, 8:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
