onQueryPhase listener org.elasticsearch.index.SearchSlowLog fails with NullpointerException

Within the last month we are seeing the following exceptions like the one below more and more often on one of our clusters.

[2021-10-12T12:38:45,944 UTC][WARN ][o.e.i.s.IndexShard       ] [10.180.253.11] [visit-global-longer1-w2021.41][1] onQueryPhase listener [org.elasticsearch.index.SearchSlowLog@857d0b9] failed
java.lang.NullPointerException: null
	at org.elasticsearch.search.aggregations.metrics.AbstractPercentilesAggregationBuilder.doXContentBody(AbstractPercentilesAggregationBuilder.java:340) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.internalXContent(ValuesSourceAggregationBuilder.java:414) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:155) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.toXContent(AbstractAggregationBuilder.java:159) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.toXContent(AggregatorFactories.java:458) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.xcontent.XContentBuilder.value(XContentBuilder.java:859) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.xcontent.XContentBuilder.value(XContentBuilder.java:852) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:844) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.builder.SearchSourceBuilder.innerToXContent(SearchSourceBuilder.java:1389) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.builder.SearchSourceBuilder.toXContent(SearchSourceBuilder.java:1432) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.xcontent.XContentHelper.toXContent(XContentHelper.java:408) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.builder.SearchSourceBuilder.toString(SearchSourceBuilder.java:1689) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.index.SearchSlowLog$SearchSlowLogMessage.prepareMap(SearchSlowLog.java:182) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.index.SearchSlowLog$SearchSlowLogMessage.<init>(SearchSlowLog.java:161) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.index.SearchSlowLog.onQueryPhase(SearchSlowLog.java:141) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onQueryPhase(SearchOperationListener.java:156) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.SearchService$SearchOperationListenerExecutor.close(SearchService.java:1336) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:436) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.SearchService.access$500(SearchService.java:141) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:401) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) ~[elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-7.10.0.jar:7.10.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
	Suppressed: java.lang.IllegalStateException: Failed to close the XContentBuilder
		at org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1017) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.xcontent.XContentHelper.toXContent(XContentHelper.java:403) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.search.builder.SearchSourceBuilder.toString(SearchSourceBuilder.java:1689) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.index.SearchSlowLog$SearchSlowLogMessage.prepareMap(SearchSlowLog.java:182) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.index.SearchSlowLog$SearchSlowLogMessage.<init>(SearchSlowLog.java:161) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.index.SearchSlowLog.onQueryPhase(SearchSlowLog.java:141) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onQueryPhase(SearchOperationListener.java:156) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.search.SearchService$SearchOperationListenerExecutor.close(SearchService.java:1336) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:436) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.search.SearchService.access$500(SearchService.java:141) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:401) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) ~[elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.10.0.jar:7.10.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
	Caused by: java.io.IOException: Unclosed object or array found
		at org.elasticsearch.common.xcontent.json.JsonXContentGenerator.close(JsonXContentGenerator.java:456) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
		at org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1015) ~[elasticsearch-x-content-7.10.0.jar:7.10.0]
		... 19 more

It looks for me as if something is going wrong on Elasticsearch when it is trying to produce a log message.

Can somebody explain what is going on here. Since this is a warning we are not concerned yet. Should we?

The first thing to do is to upgrade to the latest version, which is 7.15.0.
This NPE might have been fixed in the meantime.

This exception is happening on only one deployment of our 100+ customers, but there we see it repeatedly. We were also unable to reproduce this exception locally. Our upgrade policy dictates that Elasticsearch upgrades are done for all customers at once.

A story for upgrading ES is already in our backlog but priority is quite low. Of course, if this could cause severe issues that can be fixed by upgrading we are open to reprioritize. But if it is just about a missed log line due to a NPE while logging, it is not worth the hassle.

For the moment I am just trying to find out if this might cause problems in the long run, or if it can wait.

I think that the end user who is running the query on visit-global-longer1-w2021.41 is seeing this NPE.
I don't think it causes any damage on the cluster.

As long as you accept that the end user have this kind of error, you can stay on that version IMHO.