Elasticsearch 5.3 scripted_metric reduce script running twice?

I have a creeping suspicion that there is something very wrong with me putting a scripted metric inside a filter aggregation inside a terms aggregation.

Here is an example of the type of query that's giving me weird behavior.

{
  "size": 0,
  "aggs": {
    "a": {
      "terms": {
        "field": "id",
        "size": 10
      },
      "aggs": {
        "measure" : {
          "filter" : {
            "bool" : {
              "must" : [
                {
                  "terms" : {
                    "foo.bar" : [
                      "foo"
                    ],
                    "boost" : 1.0
                  }
                },
                {
                  "terms" : {
                    "foo.baz" : [
                      "bar"
                    ],
                    "boost" : 1.0
                  }
                }
              ]
            }
          },
          "aggregations" : {
            "profit": {
              "scripted_metric": {
                "init_script" : "params._agg.transactions = []",
                "map_script" : "params._agg.transactions.add(1)",
                "combine_script" : "double profit = 0; for (t in params._agg.transactions) { profit += t } return profit",
                "reduce_script" : "double profit = 0; for (a in params._aggs) { profit += a } return profit"
              }
            }
          }
        }
      }
    }
  }
}

At first glance there is nothing wrong here. It's not super complicated; it's just the example scripted metric from the documentation inside a filter aggregation inside a terms aggregation.

But the script will throw a null pointer exception in the reduce script!

"caused_by": {
      "type": "script_exception",
      "reason": "runtime error",
      "script_stack": [
        "profit += a } ",
        "^---- HERE"
      ],
      "script": "double profit = 0; for (a in params._aggs) { profit += a } return profit",
      "lang": "painless",
      "caused_by": {
        "type": "null_pointer_exception",
        "reason": null
      }
    }

When debugging this it looks like the reduce script is running TWICE!

If I change the reduce_script to

"reduce_script" : "Debug.explain(params._aggs)" 

I get back

      "to_string": "[null]",
      "java_class": "java.util.ArrayList",
      "script_stack": [
        "Debug.explain(params._aggs); double ",
        "                    ^---- HERE"
      ]

What's even crazier is that if I change the reduce script to

"reduce_script" : "if (params._aggs[0] != null) {Debug.explain(params._aggs)}"

I get

"to_string": "[0.0]",
      "java_class": "java.util.ArrayList",
      "script_stack": [
        "Debug.explain(params._aggs)}",
        "                    ^---- HERE"
      ],
      "script": "if (params._aggs[0] != null) {Debug.explain(params._aggs)}",

What's going on here? Why do I get two different values? And why is the script returning a [null] in the first place?

@Ervin_Puskar Thanks for flagging this, it is indeed very weird. I have managed to reproduce it using a simple script and it definitely looks like a bug, so I have raised https://github.com/elastic/elasticsearch/issues/25020 to track it. Now that I have a reliable way to reproduce it, I'll dig further and try to work out what's going on here. I'll post my findings on that linked issue.

@Ervin_Puskar after looking into this further I realised what's going on (I'll post this update on the bug I raised too):

Firstly, the reduce script runs more than once because the scripted_metric aggregation is a sub-aggregation of the terms aggregation. It is evaluated for each of the terms buckets, so the reduce script needs to run once per bucket. In my recreation script in the issue I raised, the reduce script runs 5 times because the terms aggregation produces five buckets (one for each of my five id terms).

Now to why you get a NPE in your first reduce script:

 "double profit = 0; for (a in params._aggs) { profit += a } return profit"

The reason is that you have terms buckets where none of the documents in the bucket match the filter aggregation. In my example in the issue, the problem is triggered by this document:

PUT test/doc/10
{
  "id": "e",
  "foo.bar": "fooo",
  "foo.baz": "barr"
}

This is the only document that contains the term e in the id field, and it doesn't match the filter aggregation. As a result, an empty aggregation response is created for the aggregation in that bucket when it is returned from the shards, which your reduce script doesn't expect or handle. If you replace your reduce script with the following, it executes correctly:

"double profit = 0; for (a in params._aggs) { if (a != null) { profit += a } } return profit"
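To make the difference between the two reduce scripts concrete, here is a rough Python model of the per-bucket reduce step. It is only a sketch, not Elasticsearch or Painless code: the function names and the sample data are made up for illustration, and None stands in for the empty (null) result described above.

```python
# Toy model of the reduce phase; NOT Elasticsearch code.
# Assumption: each terms bucket has its own list of combine results,
# and an entry is None when no document in the bucket matched the filter.

def naive_reduce(aggs):
    """Mirrors the original reduce_script: no null check."""
    profit = 0.0
    for a in aggs:
        profit += a  # TypeError on None (the Painless script throws an NPE)
    return profit

def null_safe_reduce(aggs):
    """Mirrors the corrected reduce_script: skips null entries."""
    profit = 0.0
    for a in aggs:
        if a is not None:
            profit += a
    return profit

# Hypothetical combine results per bucket; bucket "e" matched nothing.
combine_results_per_bucket = {
    "a": [2.0],
    "e": [None],
}

# The reduce step runs once per bucket, as it does for each terms bucket.
totals = {
    key: null_safe_reduce(aggs)
    for key, aggs in combine_results_per_bucket.items()
}
print(totals)  # {'a': 2.0, 'e': 0.0}
```

Calling naive_reduce on the "e" bucket fails, while the null-safe version treats the empty bucket as contributing 0.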

At the moment there is no mention of the empty aggregation response in the documentation for the scripted_metric aggregation so I think we should clarify it there.

For your second and third reduce scripts, which use Debug.explain: I think the reason it does not show you that the value is null is that Debug.explain works by throwing an exception that reports the value of the variable. It therefore throws on the first bucket it evaluates, which does not have the empty aggregation.

I hope that helps clear this up, let me know if you have more questions

Thank you so much for looking into this. Your explanation makes sense and I feel silly that I missed something that simple.

I do have a follow up question:

Is it possible for a scripted metric not to return anything?
From my testing right now, if a bucket doesn't have any documents, or if the reduce script fails to return anything, the value that comes back in the response is null.

That's fine until I try to do something like run a stats_bucket aggregation against these buckets. It seems to choke on the null value regardless of the gap_policy I set.

Assume in the reduce script above that if there are no documents we return null, or just not return at all.

Then, try adding the following aggregation

"measure stats" : {
      "stats_bucket" : {
        "buckets_path" : [
          "a>measure>profit.value"
        ],
        "gap_policy" : "skip"
      }
    }

I see responses like

"buckets_path must reference either a number value or a single value numeric metric aggregation"

I assume that's because of the null values. Is there any way around this?

So I attempted to solve this by using the bucket selector aggregation. Specifically I have the following:

{
  "size": 0,
  "aggs": {
    "a": {
      "terms": {
        "field": "id",
        "size": 10
      },
      "aggs": {
        "measure" : {
          "filter" : {
            "bool" : {
              "must" : [
                {
                  "terms" : {
                    "foo.bar" : [
                      "foo"
                    ],
                    "boost" : 1.0
                  }
                },
                {
                  "terms" : {
                    "foo.baz" : [
                      "bar"
                    ],
                    "boost" : 1.0
                  }
                }
              ]
            }
          },
          "aggregations" : {
            "profit filter": {
              "bucket_selector": {
                "buckets_path": {
                  "val": "profit.value"
                },
                "script": "params.val != null"
              }
            },
            "profit": {
              "scripted_metric": {
                "init_script" : "params._agg.transactions = []",
                "map_script" : "params._agg.transactions.add(1)",
                "combine_script" : "double profit = 0; for (t in params._agg.transactions) { profit += t } return profit",
                "reduce_script" : "double profit = 0; for (a in params._aggs) { if (a == null) {return null } profit += a } return profit"
              }
            }
          }
        }
      }
    }
  }
}

But this gave me basically the same error

"caused_by": {
      "type": "aggregation_execution_exception",
      "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation"
    }

Help...

This doesn't work because the scripted_metric aggregation is not a "numeric metric aggregation": it can return any valid JSON type (e.g. string, number, object, array), so it cannot be referenced in a pipeline aggregation. Instead, you will need to consume all the buckets of your terms aggregation in your client application and perform the stats operations on the values from your scripted_metric aggregation there.

P.S. I am on vacation for a few weeks as of tomorrow, so I probably won't be able to reply to any other questions until I am back, but I am happy to continue helping you then. Others on this forum may also be able to help you in the meantime. Thanks for your patience.
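As a rough illustration of doing the stats on the client side, here is a minimal Python sketch. It assumes the search response has already been parsed into a dict, that the aggregation names match the query in this thread (a, measure, profit), and that null values should be skipped, similar in spirit to gap_policy skip. The function name profit_stats and the sample response are hypothetical.

```python
# Client-side replacement for stats_bucket; a sketch, not an official API.
# Assumption: `response` is the parsed JSON body of the search response
# for the query above (terms agg "a" > filter "measure" > "profit").

def profit_stats(response):
    buckets = response["aggregations"]["a"]["buckets"]
    values = [b["measure"]["profit"]["value"] for b in buckets]
    # Skip buckets whose scripted metric came back as null.
    values = [v for v in values if v is not None]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }

# Hypothetical response fragment with one null-valued bucket ("e").
sample_response = {
    "aggregations": {
        "a": {
            "buckets": [
                {"key": "a", "doc_count": 2,
                 "measure": {"doc_count": 2, "profit": {"value": 2.0}}},
                {"key": "b", "doc_count": 4,
                 "measure": {"doc_count": 4, "profit": {"value": 4.0}}},
                {"key": "e", "doc_count": 1,
                 "measure": {"doc_count": 0, "profit": {"value": None}}},
            ]
        }
    }
}

stats = profit_stats(sample_response)
print(stats)
```

For the sample data this yields a count of 2, min 2.0, max 4.0, avg 3.0 and sum 6.0, with the null bucket ignored.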
