Error for scripted_metric aggregation on auto_date_histogram aggregation

I am working with the sample log dataset. Here is what it looks like:

"hits" : [
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "nfvc3X4BKbUZuD5M4wWJ",
        "_score" : 1.0,
        "_source" : {
          "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
          "bytes" : 4155,
          "clientip" : "57.65.101.133",
          "host" : "artifacts.elastic.co",
          "index" : "kibana_sample_data_logs",
          "ip" : "57.65.101.133", 
          "message" : "57.65.101.133 - - [2018-07-25T14:13:30.450Z] \"GET /beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz HTTP/1.1\" 200 4155 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
          "phpmemory" : null,
          "timestamp" : "2022-02-02T14:13:30.450Z",
          "url" : "https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz",
          "utc_time" : "2022-02-02T14:13:30.450Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },

I want to find the maximum number of bytes transferred (max_bytes) for documents bucketed by a date histogram. In other words, for every document that falls into a bucket created by the date histogram, I want to find the max bytes using a map/combine/reduce scripted_metric aggregation.
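(As a sanity check for the expected numbers, and not as the scripted_metric exercise itself, the same per-bucket maximum can be computed with the built-in max aggregation:

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "doc_buckets_for_date_histogram": {
      "auto_date_histogram": {
        "field": "timestamp",
        "buckets": 10
      },
      "aggs": {
        "max_bytes": {
          "max": { "field": "bytes" }
        }
      }
    }
  }
}

This works fine; the problem described below is specific to scripted_metric.)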

Here is what I tried:

GET kibana_sample_data_logs/_search?size=0
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "doc_buckets_for_date_histogram": {
      "auto_date_histogram": {
        "field": "timestamp", 
        "buckets": 10
      }, 
      "aggs": {
        "max_bytes": {
          "scripted_metric": {
            "init_script": "state.max_bytes = 0L;", 
            "map_script": """ 
              def current_bytes = doc['bytes'].getValue();
              if (current_bytes > state.max_bytes)
                {state.max_bytes = current_bytes;}
            """,
            "combine_script": "return state", 
            "reduce_script": """ 
              def max_bytes = 0L;
              for (s in states) {if (s.max_bytes > (max_bytes))
                {max_bytes = s.max_bytes;}}
              return max_bytes
            """
          }
        }   
      }
    }
  }
}

But it is giving an error:

{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "",
    "phase" : "fetch",
    "grouped" : true,
    "failed_shards" : [ ],
    "caused_by" : {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "if (s.max_bytes > (max_bytes))\n                {",
        "     ^---- HERE"
      ],
      "script" : "  ...",
      "lang" : "painless",
      "position" : {
        "offset" : 74,
        "start" : 69,
        "end" : 117
      },
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "dynamic getter [java.lang.Long, max_bytes] not found"
      }
    }
  },
  "status" : 400
}

It seems to me that the reduce script is not receiving a Map in s, and I don't know why.
Please help.

I'm not sure why, but the same scripted_metric worked under a terms bucket aggregation, yet failed under both the histogram and the date_histogram aggregations.

Something is wrong: s, which is supposed to be a HashMap (or possibly null), has somehow become a Long. Could this be a bug in scripted_metric aggregation on the histogram-family aggregations?

For reference, here is the terms version that works:

GET kibana_sample_data_logs/_search?size=0
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "doc_buckets_for_date_histogram": {
      "terms": {
        "field": "agent.keyword"
      },
      "aggs": {
        "max_bytes": {
          "scripted_metric": {
            "init_script": "state.max_bytes = 0L;", 
            "map_script": """ 
              def current_bytes = doc['bytes'].getValue();
              if (current_bytes > state.max_bytes)
                {state.max_bytes = current_bytes;}
            """,
            "combine_script": "return state", 
            "reduce_script": """ 
              def max_bytes = 0L;
              for (s in states) {if (s.max_bytes > (max_bytes))
                {max_bytes = s.max_bytes;}}
              return max_bytes
            """
          }
        }   
      }
    }
  }
}

I found a similar post from several years ago that was left unsolved...

When I add a null check for the bucket state (inserting if (Objects.isNull(s)) { max_bytes = max_bytes } else), the histogram and date_histogram aggregations work well.
But auto_date_histogram still doesn't work and raises the same error. It could be something specific to scripted_metric aggregation on auto_date_histogram.

The scripted_metric aggregation with a null check on a date_histogram aggregation works well:

GET kibana_sample_data_logs/_search?size=0
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "doc_buckets_for_date_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      }, 
      "aggs": {
        "max_bytes": {
          "scripted_metric": {
            "init_script": "state.max_bytes = 0L;", 
            "map_script": """ 
              def current_bytes = doc['bytes'].getValue();
              if (current_bytes > state.max_bytes)
                {state.max_bytes = current_bytes;}
            """,
            "combine_script": "return state", 
            "reduce_script": """ 
              def max_bytes = 0L;
              for (s in states) {if (Objects.isNull(s)){max_bytes=max_bytes} else if (s.max_bytes > (max_bytes))
                {max_bytes = s.max_bytes;}}
              return max_bytes
            """
          }
        }   
      }
    }
  }
}
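One possible explanation (an assumption on my part, not confirmed in this thread): auto_date_histogram merges buckets repeatedly until it hits the target bucket count, so it may run the reduce phase more than once. On a second pass, states would contain the Long returned by the previous reduce instead of the per-shard Maps, which would explain "dynamic getter [java.lang.Long, max_bytes] not found". If that is the cause, a reduce script that defensively accepts both shapes might avoid the error (untested sketch):

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "doc_buckets_for_date_histogram": {
      "auto_date_histogram": {
        "field": "timestamp",
        "buckets": 10
      },
      "aggs": {
        "max_bytes": {
          "scripted_metric": {
            "init_script": "state.max_bytes = 0L;",
            "map_script": """
              def current_bytes = doc['bytes'].value;
              if (current_bytes > state.max_bytes) { state.max_bytes = current_bytes; }
            """,
            "combine_script": "return state",
            "reduce_script": """
              def max_bytes = 0L;
              for (s in states) {
                if (s == null) { continue; }
                // s is a Map on the first reduce pass, but may already be the
                // Long returned by an earlier pass if reduce runs again
                def v = s instanceof Map ? s.max_bytes : s;
                if (v > max_bytes) { max_bytes = v; }
              }
              return max_bytes
            """
          }
        }
      }
    }
  }
}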

Thank you ever so much for taking the trouble to run the code in your own environment before responding to my message. I appreciate it a lot. I got the same result when running your code: it works with date_histogram but not with auto_date_histogram. Thank you again.

