Problems using Sum Aggregations for ML jobs

Hey,
I have been trying to use a detector that calculates the high sum of a given field, and I have the following questions:

  1. Why can't I see the single metric viewer for this job?
    The job is configured in the "advanced job" mode, and although I can see anomalies in the Anomaly Explorer, I cannot see the single metric viewer for any detectors with a sum/high-sum function.

  2. How is it possible that the model is calculating the "high_sum" of a STRING field?
    The field I am using is "bytes_to_server", and the JSON decoder is decoding it as a string instead of as an integer. While I cannot apply aggregations on this field in Kibana visualizations (as I would expect), I am able to create an ML job that finds the sum of the same field. This is just something I can't wrap my head around :sweat_smile:

(Just FYI: I've been creating and viewing all my Machine Learning jobs in the Kibana UI.)

Thanks!

@BenTrent will get to this question.

Cheers
Rashmi

What version of the stack are you running?

Here is a job I created in the advanced job config that has multiple detectors. It also has a partition around a keyword field, but I created another one without the partition and I could see it in Single Metric Viewer. This is in 7.1.0, but this type of visualization has been supported for a while now.

As for how we are summing a "string" field: what is the mapping of the field "bytes_to_server" in the index? If the mapped type is numeric (e.g. long, double), we end up using doc_values when we gather the data before sending it to the Machine Learning job for processing.

Here is an example of using doc values to get the appropriate type in the JSON payload from the search:

PUT test_string
{
  "mappings": {
    "properties": {
      "long_string" :{
        "type": "long"
      }
    }
  } 
}

POST test_string/_doc
{
  "long_string": "12345"
}

GET test_string/_search
{
  "docvalue_fields": [
    {"field": "long_string"}]
}


{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_string",
        "_type" : "_doc",
        "_id" : "a7kXn2sBb78BhX1qpSZl",
        "_score" : 1.0,
        "_source" : {
          "long_string" : "12345"
        },
        "fields" : {
          "long_string" : [
            12345
          ]
        }
      }
    ]
  }
}
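To make the difference in that response explicit, here is a minimal Python sketch that parses a trimmed copy of the hit above and compares the two values. It only uses the field name and values from the example; nothing here is specific to how the ML datafeed is implemented.

```python
import json

# A trimmed copy of the hit from the search response above.
hit = json.loads("""
{
  "_source": {"long_string": "12345"},
  "fields": {"long_string": [12345]}
}
""")

# _source keeps the value exactly as it was indexed: a JSON string.
source_value = hit["_source"]["long_string"]

# docvalue_fields returns the value according to the mapped type: a JSON number.
doc_value = hit["fields"]["long_string"][0]

print(type(source_value).__name__, repr(source_value))
print(type(doc_value).__name__, repr(doc_value))
```

So when the mapping is numeric, the value read from doc values is already a number, even though the original document contained a string.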

Hi Ben,
Thanks for your quick reply.

We are on version 6.7.1 of Elasticsearch.

I see it mapped as "keyword":
[image: screenshot of the field mapping]

Also, from what I understand from the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html), doc_values are just used to store our data in a columnar format, right? They wouldn't change our mappings? If so, can keywords be aggregated?

Thanks again :slight_smile:

@Aashka, you found a wonderful bug in our visualization :smiley:.

The datafeed does NOT have to contain aggregations. I am assuming that in your case it is not using aggregations and is simply scrolling through the documents. You can confirm this by looking at the datafeed config definition and verifying that there are no aggregations referencing the keyword field. Once the datafeed sends the data to the C++ process, that process casts the string into a numeric value.
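As a rough illustration of those two points, here is a small Python sketch. It is not the actual datafeed or C++ code: the `datafeed_config` dictionary, the `datafeed-example` id, the index name, and the sample documents are all made up, and it assumes aggregations would appear in the config under an `aggregations` (or `aggs`) key.

```python
from typing import Any, Iterable, Mapping


def uses_aggregations(datafeed_config: Mapping[str, Any]) -> bool:
    """Return True if the datafeed config defines any aggregations."""
    return bool(datafeed_config.get("aggregations") or datafeed_config.get("aggs"))


def sum_as_numbers(values: Iterable[str]) -> float:
    """Cast each string value to a number, then sum the results."""
    return sum(float(value) for value in values)


# Hypothetical datafeed config with no aggregations (it just scrolls documents).
datafeed_config = {
    "datafeed_id": "datafeed-example",
    "indices": ["example-index"],
    "query": {"match_all": {}},
}

# Hypothetical documents where the field arrives as a string.
docs = [
    {"bytes_to_server": "100"},
    {"bytes_to_server": "250"},
    {"bytes_to_server": "50"},
]

print(uses_aggregations(datafeed_config))
print(sum_as_numbers(doc["bytes_to_server"] for doc in docs))
```

The point is only that a sum over string-encoded numbers works fine once each value is cast, which is why the job can produce results even though Elasticsearch itself refuses to run a sum aggregation on a keyword field.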

The reason the Single Metric Viewer is not working is that the UI is attempting to aggregate the keyword field and is expecting a numeric value. I tried this myself and found the following error in my browser console output.

Time series explorer - error getting metric data from elasticsearch: 
Object { statusCode: 400, error: "Bad Request", message: "[illegal_argument_exception] Expected numeric type on field [response.keyword], but got [keyword]" }

I will reach out to the ML UI team to see whether we can get this error to bubble up so that users see it, or how we can better handle unexpected values when viewing results in the Single Metric Viewer.


@Aashka as Ben points out, the Single Metric Viewer is expecting the partitioning field to be a numeric type, and currently fails, without an obvious error (except in the browser console), by attempting to aggregate on a keyword field. There is currently an open issue for this.
