org.elasticsearch.action.get.MultiGetShardResponse extremely high "[took]" value

We have a problem: something in our environment is causing our nodes to run out of heap memory (30 GB per node).

While looking at this I found these messages:
server es_server[1635]: {"type": "server", "timestamp": "2022-01-14T14:09:25,148Z", "level": "WARN", "component": "o.e.t.OutboundHandler", "cluster.name": "cluster", "node.name": "cluster_nodes", "message": "sending transport message [MessageSerializer{Response{137295044}{false}{false}{false}{class org.elasticsearch.action.get.MultiGetShardResponse}}] of size [-1] on [Netty4TcpChannel{localAddress=/x.x.x.x:9300, remoteAddress=/x.x.x.x:45408, profile=default}] took [3965304145ms] which is above the warn threshold of [5000ms]", "cluster.uuid": "CLUSTERUUID", "node.id": "NODEUUID" }

That value is almost 46 days, but the cluster has an uptime of only 23 days.
This is Elasticsearch 7.13.1 with a Basic license.
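The milliseconds-to-days conversion can be checked with a short script. This is a minimal sketch, assuming Python 3.9+. It parses an abridged copy of the JSON log entry (the `[...]` parts are elisions, not real log content), pulls out the `took` and warn-threshold values with a regular expression, and converts milliseconds to days. The helper name `took_and_threshold_ms` is hypothetical.

```python
import json
import re

# Abridged copy of the warning above; "[...]" marks elided text.
LOG_LINE = (
    '{"type": "server", "level": "WARN", "component": "o.e.t.OutboundHandler", '
    '"message": "sending transport message [...] of size [-1] on [...] '
    'took [3965304145ms] which is above the warn threshold of [5000ms]"}'
)

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day
CLUSTER_UPTIME_DAYS = 23          # uptime reported in the post


def took_and_threshold_ms(line: str) -> tuple[int, int]:
    """Hypothetical helper: extract 'took' and the warn threshold (both in ms)."""
    message = json.loads(line)["message"]
    match = re.search(
        r"took \[(\d+)ms\] which is above the warn threshold of \[(\d+)ms\]", message
    )
    if match is None:
        raise ValueError("not a slow transport message warning")
    return int(match.group(1)), int(match.group(2))


took_ms, threshold_ms = took_and_threshold_ms(LOG_LINE)
took_days = took_ms / MS_PER_DAY
print(f"took {took_ms} ms = {took_days:.1f} days (threshold {threshold_ms} ms)")
print("longer than cluster uptime:", took_days > CLUSTER_UPTIME_DAYS)
```

The result, about 45.9 days, is roughly twice the reported uptime, which suggests the value is not a genuine send duration.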

I know it is not a lot to go on, but has anyone seen this behaviour?

This seems to mean that the message was released because of a failure before it was sent, so its startTime and size fields contained junk. That looks possible in this version if the node was shutting down at about the same time. In newer versions, I think #72442 should prevent this.
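The explanation can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the actual OutboundHandler code: `SendContext`, `start_send`, `release`, and `FakeClock` are hypothetical names. The model assumes the start time stays at 0 and the size at -1 until sending begins, and that the slow-send check also runs when the message is released after an early failure. In that case `took` becomes the raw reading of a relative clock whose origin is arbitrary, so it can exceed the cluster uptime.

```python
import time

UNSET_SIZE = -1  # hypothetical sentinel: size not known until the send starts


def monotonic_ms() -> int:
    """Relative clock in ms; its origin is arbitrary, not the process start."""
    return time.monotonic_ns() // 1_000_000


class SendContext:
    """Hypothetical per-message bookkeeping; NOT the real Elasticsearch class."""

    def __init__(self, clock=monotonic_ms):
        self.clock = clock
        self.start_time_ms = 0   # assumption: left at 0 until the send starts
        self.size = UNSET_SIZE   # assumption: left at -1 until the send starts

    def start_send(self, size: int) -> None:
        self.start_time_ms = self.clock()
        self.size = size

    def release(self, warn_threshold_ms: int = 5000):
        """Runs on success AND on early failure; returns a warning string or None."""
        took_ms = self.clock() - self.start_time_ms  # bogus if start_send never ran
        if took_ms > warn_threshold_ms:
            return (f"sending transport message of size [{self.size}] took [{took_ms}ms] "
                    f"which is above the warn threshold of [{warn_threshold_ms}ms]")
        return None


class FakeClock:
    """Deterministic clock for the demo; returns a fixed reading in ms."""

    def __init__(self, now_ms: int):
        self.now_ms = now_ms

    def __call__(self) -> int:
        return self.now_ms


clock = FakeClock(3_965_304_145)

# Failure path: released before start_send() ever ran.
failed_warning = SendContext(clock).release()

# Normal path: a send that takes 12 ms, well under the threshold.
ok = SendContext(clock)
ok.start_send(1024)
clock.now_ms += 12
normal_warning = ok.release()

print(failed_warning)
print(normal_warning)
```

In this toy model, skipping the slow-send check when `start_send` was never called would remove the bogus warning while leaving real slow sends reported.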

Thank you David!

I will look into whether we can upgrade. From my understanding, this is mainly a cosmetic issue in our case.

That's my understanding too, yes.
