Null_pointer_exception: Cannot invoke "String.equals(Object)" because the return value of "org.apache.lucene.search.SortField.getField()" is null

This is on Elasticsearch 8.6.2. I have a pretty mundane query that I want to paginate via search_after. The initial query looks like this:

{
  "_source": true, 
  "collapse": {
    "field": "collapse_col"
  }, 
  "query": {
    "bool": {
      "filter": [
        {"term": {"array_col.attr_1": 12345}}
      ]
    }
  }, 
  "size": 10000, 
  "sort": ["_doc"]
}

The first request, or the first few requests, in the pagination return OK. But at some point a request like this one fails:

{
  "_source": true, 
  "collapse": {
    "field": "collapse_col"
  }, 
  "query": {
    "bool": {
      "filter": [
        {"term": {"array_col.attr_1": 12345}}
      ]
    }
  },
  "search_after": [1311485], 
  "size": 10000, 
  "sort": ["_doc"]
}

The result looks like:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "null_pointer_exception",
        "reason" : "Cannot invoke \"String.equals(Object)\" because the return value of \"org.apache.lucene.search.SortField.getField()\" is null"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "indexname_2023-03-07-14-12-54",
        "node" : "node_id",
        "reason" : {
          "type" : "null_pointer_exception",
          "reason" : "Cannot invoke \"String.equals(Object)\" because the return value of \"org.apache.lucene.search.SortField.getField()\" is null"
        }
      }
    ],
    "caused_by" : {
      "type" : "null_pointer_exception",
      "reason" : "Cannot invoke \"String.equals(Object)\" because the return value of \"org.apache.lucene.search.SortField.getField()\" is null",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : "Cannot invoke \"String.equals(Object)\" because the return value of \"org.apache.lucene.search.SortField.getField()\" is null"
      }
    }
  },
  "status" : 500
}

From one of the coordinating nodes, we had a log like this:

[2023-03-07T20:10:12,643][WARN ][r.suppressed             ] [node-name] path: /index_name/_search, params: {index=index_name}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:728) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:418) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:760) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:512) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:349) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:92) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:48) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:642) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1646) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1372) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundHandler.doHandleException(InboundHandler.java:410) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:397) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:388) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:141) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:95) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:808) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:149) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: org.elasticsearch.ElasticsearchException$1: Cannot invoke "String.equals(Object)" because the return value of "org.apache.lucene.search.SortField.getField()" is null
	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:640) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:416) ~[elasticsearch-8.6.2.jar:?]
	... 41 more
Caused by: java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because the return value of "org.apache.lucene.search.SortField.getField()" is null
	at org.elasticsearch.search.searchafter.SearchAfterBuilder.buildFieldDoc(SearchAfterBuilder.java:104) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.search.SearchService.parseSource(SearchService.java:1335) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.search.SearchService.createContext(SearchService.java:986) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:630) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:495) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:50) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:47) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917) ~[elasticsearch-8.6.2.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.6.2.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1589) ~[?:?]

I think this is related to collapsing on one field while using sort / search_after on a different field, especially when that second field is _doc.

If I sort/search_after on collapse_col, things work fine. If I try to sort on ["collapse_col", "_doc"], I get "Cannot use [collapse] in conjunction with [search_after] unless the search is sorted on the same field. Multiple sort fields are not allowed.", which is expected. So maybe this is a bug where the check that the collapse and sort fields match does not catch the mismatch when the sort field is _doc?
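To make the rule from that error message concrete, here is a minimal client-side sketch in Python of the check I would have expected to reject my request. It is only my reading of the error text, not Elasticsearch's actual validation code, and the helper names sort_field_name and check_collapse_search_after are made up for illustration.

```python
from typing import Any, Optional


def sort_field_name(entry: Any) -> str:
    """Return the field name of one sort entry: "f", {"f": "asc"} or {"f": {...}}."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and len(entry) == 1:
        return next(iter(entry))
    raise ValueError(f"unsupported sort entry: {entry!r}")


def check_collapse_search_after(body: dict) -> Optional[str]:
    """Return a problem description, or None if the body passes this check.

    Assumption: the rule is the one stated in the error message, i.e. with
    collapse + search_after the search must be sorted on exactly one field,
    and that field must be the collapse field.
    """
    if "collapse" not in body or "search_after" not in body:
        return None  # the rule only applies when both are present
    sort = body.get("sort", [])
    if not isinstance(sort, list):
        sort = [sort]
    fields = [sort_field_name(entry) for entry in sort]
    collapse_field = body["collapse"]["field"]
    if len(fields) != 1:
        return "expected exactly one sort field, got %d" % len(fields)
    if fields[0] != collapse_field:
        return "sort field %r differs from collapse field %r" % (fields[0], collapse_field)
    return None


# The shape of the request that hits the null pointer exception above.
failing = {
    "collapse": {"field": "collapse_col"},
    "search_after": [1311485],
    "sort": ["_doc"],
}
print(check_collapse_search_after(failing))
```

Under this reading, the ["_doc"] request is flagged for the same reason as the ["collapse_col", "_doc"] one, so I would expect a similar validation error rather than a 500.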

This doesn't appear to be data corruption, since I can trigger the null pointer exception by shrinking size down to 1 and using search_after to fetch the second document. Adding search_after causes the exception; the initial search without search_after works fine.
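For anyone who wants to reproduce this, the two request bodies can be built roughly like this. It is a sketch only: it sends nothing to a cluster, next_page_body is a hypothetical helper, and the first_hit dict (including its sort value 0) is a placeholder standing in for the real hit returned by the first request.

```python
import copy
import json

# First request: same query as above, but with size shrunk to 1.
BASE_BODY = {
    "_source": True,
    "collapse": {"field": "collapse_col"},
    "query": {"bool": {"filter": [{"term": {"array_col.attr_1": 12345}}]}},
    "size": 1,
    "sort": ["_doc"],
}


def next_page_body(base: dict, last_hit: dict) -> dict:
    """Copy the base body and set search_after from the last hit's sort values."""
    body = copy.deepcopy(base)
    body["search_after"] = list(last_hit["sort"])
    return body


# Placeholder for the single hit of the first response (values are made up).
first_hit = {"_id": "example-id", "sort": [0]}

# Second request: identical except for search_after; this is the one that fails.
second = next_page_body(BASE_BODY, first_hit)
print(json.dumps(second, indent=2))
```

The only difference between the two bodies is the search_after key, which matches what I see: the first body works and the second one returns the 500.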
