Update_by_query crashing w/large scroll_size

I have the following issue with update_by_query: it crashes when my scroll_size is large.

  1. Using Elasticsearch 5.5.3
  2. My index has 1 shard
  3. I use this _update_by_query to add a username to _licensed_by:
    POST /my_index/_update_by_query?conflicts=proceed 
    {
      "script": {
        "inline": "ctx._source._licensed_by.add(params.username)",
        "params": {
          "username": "test1"
        },
        "lang": "painless"
      }
    }
    
    

If I run this with scroll_size=100, it works.
If I run this with scroll_size=500, it usually works.
If I run this with scroll_size=1000 (the default), it fails.
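
For reference, this is roughly how I set scroll_size; it's the same request as above with the standard scroll_size URL parameter added:

    POST /my_index/_update_by_query?conflicts=proceed&scroll_size=100
    {
      "script": {
        "inline": "ctx._source._licensed_by.add(params.username)",
        "params": { "username": "test1" },
        "lang": "painless"
      }
    }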

The error that is returned is:

{
    "error": {
        "root_cause": [
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [81]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [81]"
                }
            }
        ],
        "caused_by": {
            "type": "search_context_missing_exception",
            "reason": "No search context found for id [81]"
        }
    },
    "status": 404
}

Here are the errors in the Elasticsearch logs:

2019-05-21T18:27:52.109840745Z org.elasticsearch.search.SearchContextMissingException: No search context found for id [82]
2019-05-21T18:27:52.109847142Z  at org.elasticsearch.search.SearchService.findContext(SearchService.java:443) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109859304Z  at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:379) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109864878Z  at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:373) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109869594Z  at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:370) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109900496Z  at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109906483Z  at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:644) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109918010Z  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109924004Z  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.109928824Z  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
2019-05-21T18:27:52.109933573Z  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
2019-05-21T18:27:52.109938089Z  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2019-05-21T18:27:52.110191968Z [2019-05-21T18:27:52,110][TRACE][o.e.t.T.tracer           ] [dev-app-server] [17064][indices:data/read/search[phase/query+fetch/scroll]] received response from [{dev-app-server}{O1ElqbrtR2OcFlaygS3ziw}{xl5lK9a4TC6ofSZ1-dzkkg}{172.25.0.2}{172.25.0.2:9300}]
2019-05-21T18:27:52.111560699Z [2019-05-21T18:27:52,110][WARN ][o.e.i.r.TransportUpdateByQueryAction] [dev-app-server] giving up on search because it failed with a non-retryable exception
2019-05-21T18:27:52.111574089Z org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
2019-05-21T18:27:52.111579537Z  at org.elasticsearch.action.search.SearchScrollAsyncAction.onShardFailure(SearchScrollAsyncAction.java:213) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111584561Z  at org.elasticsearch.action.search.SearchScrollAsyncAction$1.onFailure(SearchScrollAsyncAction.java:141) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111589409Z  at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:51) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111594349Z  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1067) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111599361Z  at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1171) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111604231Z  at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1149) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111617980Z  at org.elasticsearch.transport.TransportService$7.onFailure(TransportService.java:655) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111623361Z  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:623) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111628210Z  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) [elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111632949Z  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
2019-05-21T18:27:52.111637431Z  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
2019-05-21T18:27:52.111641975Z  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2019-05-21T18:27:52.111652031Z Caused by: org.elasticsearch.transport.RemoteTransportException: [dev-app-server][172.25.0.2:9300][indices:data/read/search[phase/query+fetch/scroll]]
2019-05-21T18:27:52.111657336Z Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [82]
2019-05-21T18:27:52.111662827Z  at org.elasticsearch.search.SearchService.findContext(SearchService.java:443) ~[elasticsearch-5.5.3.jar:5.5.3]
2019-05-21T18:27:52.111667565Z  at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:379) ~[elasticsearch-5.5.3.jar:5.5.3]
Truncated due to size limitation

Any thoughts on this? What might be causing update_by_query to fail? Why are all shards failing? What more can I do to troubleshoot?

I'm having a similar issue with update_by_query. Has anyone figured out what's going on here?

Hi @Bobby_Buten,

one possible reason for this could be that writing the updates into the index is slow. If writing 1000 documents takes more than 5 minutes (which would be very slow), the scroll query times out. If you have DEBUG logging enabled, you should see a message like "freeing search context ..." in the log file on the node holding the shard copy that serves the query.
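
If you want to verify this, here is a rough sketch of what I would try; the logger name and the 30m keep-alive are just suggestions, adjust them to your setup:

    # Enable DEBUG logging for the search service so the
    # "freeing search context" messages show up in the node log.
    PUT /_cluster/settings
    {
      "transient": {
        "logger.org.elasticsearch.search": "DEBUG"
      }
    }

    # Keep each batch's search context alive longer than the 5m default
    # (e.g. 30m) while you investigate the slow writes.
    POST /my_index/_update_by_query?conflicts=proceed&scroll_size=1000&scroll=30m
    {
      "script": {
        "inline": "ctx._source._licensed_by.add(params.username)",
        "params": { "username": "test1" },
        "lang": "painless"
      }
    }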

If this is the case, I think your investigation should center on why writing is slow.
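
A couple of stock stats requests that could help narrow that down (nothing here is specific to your setup apart from the index name):

    # Per-index indexing stats: compare index_total with index_time_in_millis
    # to get a feel for how long each indexing operation takes.
    GET /my_index/_stats/indexing

    # Per-node thread pool stats: a growing queue or rejections on the
    # bulk/index pools would point at the write side being the bottleneck.
    GET /_nodes/stats/thread_pool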
