Reading scroll request IDs leads to OOB access

Hello,

I'm running a cluster of 3 ELK nodes with version 7.10.1 installed. I know it's a bit old, but upgrades usually bring a lot of pain with my data, so I don't touch it while it works fine.

Today I noticed the following error in the ES logs:

[2024-11-05T00:03:26,681][WARN ][r.suppressed             ] [elk2] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:568) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:324) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:603) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:400) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:70) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:258) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:408) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.TransportService$6.handleException(TransportService.java:640) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1181) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:277) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:275) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:267) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:131) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-client-7.10.1.jar:7.10.1]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.ElasticsearchException: Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.
        at org.elasticsearch.search.SearchService.createAndPutReaderContext(SearchService.java:649) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:633) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:426) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.search.SearchService.access$500(SearchService.java:141) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:401) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]
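
As far as I understand, the limit mentioned here is the dynamic cluster setting search.max_open_scroll_context (default 500, which matches the number in the error). I believe its effective value can be checked with something like:

GET /_cluster/settings?include_defaults=true&flat_settings=true

and then looking for search.max_open_scroll_context in the "defaults" section of the response; in my case I assume it's simply the default 500.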

To understand what generated so many scroll requests, I tried to view them with a GET /_search/scroll/_all request. However, this request produced a weird error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "action [indices:data/read/scroll] is unauthorized for user [<redacted>]"
      }
    ],
    "type" : "security_exception",
    "reason" : "action [indices:data/read/scroll] is unauthorized for user [<redacted>]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Cannot parse scroll id",
      "caused_by" : {
        "type" : "array_index_out_of_bounds_exception",
        "reason" : "arraycopy: last source index 1660160 out of bounds for byte[3]"
      }
    }
  },
  "status" : 403
}
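
My guess is that GET /_search/scroll/<id> only accepts an actual base64-encoded scroll ID, so _all gets decoded as if it were one, which would explain the out-of-bounds error; I couldn't find an API that lists the open scroll IDs themselves. The closest thing I know of is the per-node counters in the search stats (I'm assuming open_contexts / scroll_current are the relevant fields here):

GET /_nodes/stats/indices/search?filter_path=nodes.*.name,nodes.*.indices.search.open_contexts,nodes.*.indices.search.scroll_current

That at least shows how many scroll contexts each node is holding, even if it doesn't say what opened them.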

I tried the following to solve the problem:

  1. Restarting all ES instances in the stack (didn't help)
  2. Executing DELETE _search/scroll/_all - it reported that 3 (?!) scroll requests had been freed, and nothing really changed. I'm confused why it reported only 3 when the ES logs say I have 500 open scroll contexts (didn't help either)
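
As a stop-gap I'm also considering raising the limit via the dynamic cluster setting from the error message (I'm not sure this is a good idea long-term, and it obviously doesn't explain what keeps opening the scrolls), roughly:

PUT /_cluster/settings
{
  "persistent": {
    "search.max_open_scroll_context": 1000
  }
}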

Does anyone have any idea how I could fix the current state of the ELK stack so that it continues operating normally?

Regards

Are you using the default distribution with Elastic security, or are you using the OSS distribution with a third-party plugin for security?

I'm using the default ES distribution with a basic license for X-Pack (monitoring + basic auth + index-level access).