Transform fails with null_pointer_exception for nested aggregation on adding a filter condition

alerting

(GAUTAM SUDDAPALLI) #1

I am trying to create a Watcher alert over monitoring indices. I want to aggregate over 'N' different clusters and output each node with its cluster name if the disk utilization goes above a specified threshold percentage.

	      "aggs": {
            "clusters": {
              "terms": {
                "field": "cluster_uuid",
                "size": 100
              },
			  "aggs": {
                "nodes": {
                  "terms": {
                    "field": "source_node.name",
                    "size": 100
                  },
                  "aggs": {
                    "total_in_bytes": {
                      "max": {
                        "field": "node_stats.fs.total.total_in_bytes"
                      }
                    },
                   "available_in_bytes": {
                      "max": {
                        "field": "node_stats.fs.total.available_in_bytes"
                      }
                    },
                   "free_ratio": {
                     "bucket_script": {
                       "buckets_path": {
                         "available_in_bytes": "available_in_bytes",
                         "total_in_bytes": "total_in_bytes"
                       },
                       "script": "params.available_in_bytes / params.total_in_bytes"
                     }
                   }
                 }
                }
			  }
			}  
          },

Here is my transform script. It works, but it does not filter out the entries that fail the criteria my condition is set up on:

"condition": {
    "script": {
    "lang": "painless",
    "source": "return ctx.payload.aggregations.clusters.buckets.stream().anyMatch(cluster -> cluster.nodes.buckets.stream().anyMatch(node -> node.free_ratio.value < ctx.metadata.lower_bound));"
	}
  },
  "transform": {
    "script": {
    "lang": "painless",
    "source": "def hosts = []; ctx.payload.aggregations.clusters.buckets.stream().forEach(cluster -> cluster.nodes.buckets.stream().forEach(node -> hosts.add(['cluster':cluster.key,'node':node.key,'available_in_gb':Math.round((node.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((node.total_in_bytes.value/1073741824)* 100)/100])));return ['hosts': hosts]"
    }
  },

I want to filter out the nodes that are below the threshold, so that they are not transformed and sent to the action block. Adding a filter to the above does not work, so I tried something like this in a single transform block:

"source": "ctx.payload.aggregations.clusters.buckets.stream().forEach(cluster -> cluster.nodes.buckets.stream().filter(it -> it.free_ratio.value < ctx.metadata.lower_bound).map(it -> ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())"

but I get this exception:

"script_stack": [

  "it -&gt; ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())",

  " ^---- HERE"

],

"script": "ctx.payload.aggregations.clusters.buckets.stream().forEach(cluster -&gt; cluster.nodes.buckets.stream().filter(it -&gt; it.free_ratio.value &lt; ctx.metadata.lower_bound).map(it -&gt; ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())",

"lang": "painless",

"caused_by": {

"type": "null_pointer_exception",

"reason": null

}

I also tried a chain transform, which didn't work. Is there an alternative way to filter or flatten this sub-aggregation?


(Alexander Reelsen) #2

Having all those snippets makes reading pretty hard. It's much easier if you can provide all the information at once in a gist. That information is:

  • the whole watch
  • the full output of the execute watch API or a watcher history entry
  • any stack traces from the logfiles, if there are any

I am not sure off the top of my head whether .forEach() can actually collect data into a list or whether it just returns void. A full stack trace (which should be part of the execute watch API response or the watch history entry) should reveal more and also make it possible to reproduce this without needing any data.
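In plain Java, whose Streams API Painless exposes, forEach is indeed a void terminal operation, so nothing can be chained after it; results either go into an external list or come out of collect. A minimal sketch (the numbers are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ForEachVsCollect {
    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4);

        // forEach returns void, so nothing can be chained after it;
        // results must be accumulated in an external list.
        List<Integer> viaForEach = new ArrayList<>();
        values.stream().filter(v -> v > 2).forEach(viaForEach::add);

        // collect is the terminal operation that actually produces a value.
        List<Integer> viaCollect = values.stream().filter(v -> v > 2)
            .collect(Collectors.toList());

        System.out.println(viaForEach); // [3, 4]
        System.out.println(viaCollect); // [3, 4]
    }
}
```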

--Alex


(GAUTAM SUDDAPALLI) #3

@spinscale here is the watcher and its execution


(GAUTAM SUDDAPALLI) #4

Using filter instead of forEach gives me this error:

"transform": {
    "script": {
    "lang": "painless",
    "source": "ctx.payload.aggregations.clusters.buckets.stream().filter(cluster -> cluster.nodes.buckets.stream().filter(it -> it.free_ratio.value < ctx.metadata.lower_bound).map(it -> ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())"
    }
  }

Error

"type": "script_exception",

"reason": "runtime error",

"script_stack": [

  "java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)",

  "java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)",

  "java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)",

  "java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)",

  "java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)",

  "java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)",

  "java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)",

  "it -&gt; ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())",

  " ^---- HERE"

],

"script": "ctx.payload.aggregations.clusters.buckets.stream().filter(cluster -&gt; cluster.nodes.buckets.stream().filter(it -&gt; it.free_ratio.value &lt; ctx.metadata.lower_bound).map(it -&gt; ['cluster_name':cluster.key,'node_name':it.key,'available_in_gb':Math.round((it.available_in_bytes.value/1073741824) * 100)/100,'total_in_gb':Math.round((it.total_in_bytes.value/1073741824)* 100)/100])).collect(Collectors.toList())",

"lang": "painless",

"caused_by": {

"type": "class_cast_exception",

"reason": "java.util.stream.ReferencePipeline$3 cannot be cast to java.lang.Number"

}

}

},

"actions": [],

},

"messages": [

  "failed to execute watch transform"

],

(Alexander Reelsen) #5

You can always step away from lambdas and just go with plain for loops for better readability, which is what I did here (I think there is a bracket mistake somewhere in your script, but it is hard to spot).

How about

   "transform": {
      "script": {
        "lang": "painless",
        "source": """
        def clusters = ctx.payload.aggregations.clusters.buckets;
        def data = [];
        for (def i=0 ; i < clusters.length ; i++) {
          def cluster = clusters[i];
          def buckets = cluster.nodes.buckets.stream()
            .filter(b -> b.free_ratio.value < ctx.metadata.lower_bound)
            .map(it -> ['cluster_name':cluster.key,'node_name':it.key])
            .collect(Collectors.toList());
            data.addAll(buckets);
        }
        return data
        """
      }
    },
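For a sanity check outside Watcher, the same loop-plus-stream approach can be run in plain Java. The maps below are hypothetical stand-ins for the aggregation buckets and for ctx.metadata.lower_bound:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedBucketFilter {
    public static void main(String[] args) {
        double lowerBound = 0.2; // stand-in for ctx.metadata.lower_bound

        // Hypothetical stand-ins for ctx.payload.aggregations.clusters.buckets.
        List<Map<String, Object>> clusters = List.of(
            Map.of("key", "cluster-a", "nodes", List.of(
                Map.of("key", "node-1", "free_ratio", 0.1),
                Map.of("key", "node-2", "free_ratio", 0.5))),
            Map.of("key", "cluster-b", "nodes", List.of(
                Map.of("key", "node-3", "free_ratio", 0.05))));

        List<Map<String, Object>> data = new ArrayList<>();
        for (Map<String, Object> cluster : clusters) {
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> nodes =
                (List<Map<String, Object>>) cluster.get("nodes");
            // Inner stream: keep only nodes below the threshold, then map
            // each one to a flat {cluster_name, node_name} entry.
            data.addAll(nodes.stream()
                .filter(n -> (double) n.get("free_ratio") < lowerBound)
                .map(n -> Map.of("cluster_name", cluster.get("key"),
                                 "node_name", n.get("key")))
                .collect(Collectors.toList()));
        }
        System.out.println(data);
    }
}
```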
 

(GAUTAM SUDDAPALLI) #6

I got around it by doing something similar without having to loop through explicitly.

    def hosts = [];
    ctx.payload.aggregations.clusters.buckets.stream().forEach(cluster ->
      cluster.nodes.buckets.stream()
        .filter(node -> node.memory.value > ctx.metadata.jvm_usage_threshold)
        .forEach(node -> hosts.add(['cluster':cluster.key, 'node':node.key, 'avg_jvm_usage':node.memory.value])));
    return ['hosts': hosts]
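An equivalent without the external hosts list is to flatten the nested streams with flatMap and end the pipeline in a single collect. A plain-Java sketch, with hypothetical maps standing in for the aggregation buckets and ctx.metadata.jvm_usage_threshold:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlatMapHosts {
    public static void main(String[] args) {
        double threshold = 0.8; // stand-in for ctx.metadata.jvm_usage_threshold

        // Hypothetical stand-ins for the clusters/nodes buckets.
        List<Map<String, Object>> clusters = List.of(
            Map.of("key", "cluster-a", "nodes", List.of(
                Map.of("key", "node-1", "memory", 0.9),
                Map.of("key", "node-2", "memory", 0.3))));

        // flatMap flattens the per-cluster streams into one stream, so the
        // whole pipeline can end in a single collect - no external list.
        List<Map<String, Object>> hosts = clusters.stream()
            .flatMap(cluster ->
                ((List<Map<String, Object>>) cluster.get("nodes")).stream()
                    .filter(node -> (double) node.get("memory") > threshold)
                    .map(node -> Map.of("cluster", cluster.get("key"),
                                        "node", node.get("key"),
                                        "avg_jvm_usage", node.get("memory"))))
            .collect(Collectors.toList());
        System.out.println(hosts);
    }
}
```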


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.