Seeing performance issues and timeouts in Elasticsearch but not finding much useful in the logs

We have been having issues with our production cluster lately. All of our Grafana alerts trigger with an error like this:

Error = [sse.dataQueryError] failed to execute query [A]: Post "https://elastic.prod2-obsv-eastus.domain.com:9200/_msearch?max_concurrent_shard_requests=5": context deadline exceeded
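For reference, this is roughly the request Grafana is issuing, and it can be replayed by hand with curl to see whether the timeout is coming from Elasticsearch itself or from the Grafana data source timeout. The credentials, index pattern, and query body below are placeholders, not the actual alert query:

# replay a minimal _msearch against the same endpoint (body is NDJSON: header line + query line)
curl -sk -u "$ES_USER:$ES_PASS" \
  -H 'Content-Type: application/x-ndjson' \
  "https://elastic.prod2-obsv-eastus.domain.com:9200/_msearch?max_concurrent_shard_requests=5" \
  --data-binary $'{"index":"metrics-*"}\n{"query":{"match_all":{}},"size":0}\n'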

During these windows, queries against Elasticsearch stop working, and we also see blocks of time in Kibana where there are gaps in the metrics.

Looking through the logs, I haven't found anything particularly helpful so far. I did find these entries on one of our master-eligible nodes that is not currently the elected master:

{"@timestamp":"2024-06-03T15:00:54.430Z", "log.level": "WARN", "message":"health check of [/usr/share/elasticsearch/data] took [5402ms] which is above the warn threshold of [5s]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[prod2-elasticsearch-data-0][generic][T#73]","log.logger":"org.elasticsearch.monitor.fs.FsHealthService","elasticsearch.cluster.uuid":"owBcUi5vSDORDcxfpHp3xw","elasticsearch.node.id":"-jdGcGXhSzeJUhtKgIhwzQ","elasticsearch.node.name":"prod2-elasticsearch-data-0","elasticsearch.cluster.name":"elasticsearch"}
{"@timestamp":"2024-06-03T13:54:42.658Z", "log.level": "WARN", "message":"caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.19.56.147:9200, remoteAddress=/172.19.0.33:41099}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[prod2-elasticsearch-data-0][transport_worker][T#4]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"owBcUi5vSDORDcxfpHp3xw","elasticsearch.node.id":"-jdGcGXhSzeJUhtKgIhwzQ","elasticsearch.node.name":"prod2-elasticsearch-data-0","elasticsearch.cluster.name":"elasticsearch","error.type":"io.netty.handler.codec.PrematureChannelClosureException","error.message":"Channel closed while still aggregating message","error.stack_trace":"io.netty.handler.codec.PrematureChannelClosureException: Channel closed while still aggregating message\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.MessageAggregator.channelInactive(MessageAggregator.java:436)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)\n\tat io.netty.codec.http@4.1.94.Final/io.netty.handler.codec.http.HttpContentDecoder.channelInactive(HttpContentDecoder.java:235)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpHeaderValidator.channelInactive(Netty4HttpHeaderValidator.java:186)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat 
io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.transport.netty4.Netty4WriteThrottlingHandler.channelInactive(Netty4WriteThrottlingHandler.java:109)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411)\n\tat io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376)\n\tat io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1085)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\n"}
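My reading of the first entry is that the FsHealthService probe (a small write plus fsync on the data path) took over 5 seconds, which would suggest slow disk I/O on that node around that time. In case it helps narrow things down, this is roughly what I can capture during the next incident; the node name is copied from the log above and auth is omitted:

# filesystem and thread pool stats for that node (I/O totals, queue sizes, rejections)
curl -sk "https://elastic.prod2-obsv-eastus.domain.com:9200/_nodes/prod2-elasticsearch-data-0/stats/fs,thread_pool?pretty"

# what that node is actually busy doing while searches hang
curl -sk "https://elastic.prod2-obsv-eastus.domain.com:9200/_nodes/prod2-elasticsearch-data-0/hot_threads"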

Could either of these be related to what we are seeing? Is there anything else I should be looking for that might point to the cause?

As far as I can tell, the cluster stays green the whole time.
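For completeness, this is roughly how I've been checking cluster status (same endpoint as in the error above, auth omitted):

curl -sk "https://elastic.prod2-obsv-eastus.domain.com:9200/_cluster/health?pretty"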