Failed to retrieve shard stats from node

I am running an Elasticsearch cluster on Kubernetes in secure mode (basic security). I am getting the warning below in the master node's logs. Why do I get this warning while benchmarking with esrally?

I have one master node and one data node, running on different hosts.

{"type": "server", "timestamp": "2022-05-09T06:11:20,824Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "elasticsearch", "node.name": "es-master", "message": "failed to retrieve shard stats from node [9_L5bY1kQA6s04k23DXPfQ]: [es-data][10.244.172.223:9300][indices:monitor/stats[n]] request_id [65720] timed out after [15006ms]", "cluster.uuid": "wBHxBmZ2SbClTiJ5dykWjQ", "node.id": "Sq7N0x61T2W6o5ZDr8KkVg"  }

As the message shows, the node es-data (IP address 10.244.172.223) was very slow to process a shard stats request from the master, and the request timed out after 15 seconds.
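If you want to pull these details out of many such log lines, here is a minimal Python sketch. It assumes the JSON log format and the message wording shown in the warning above; the function name and the regular expression are my own and not part of Elasticsearch.

```python
import json
import re

# Matches the message text of the warning quoted above (assumed wording).
TIMEOUT_RE = re.compile(
    r"failed to retrieve shard stats from node \[(?P<node_id>[^\]]+)\]: "
    r"\[(?P<node_name>[^\]]+)\]\[(?P<address>[^\]]+)\]"
    r"\[(?P<action>.+?)\] request_id \[(?P<request_id>\d+)\] "
    r"timed out after \[(?P<timeout_ms>\d+)ms\]"
)

# The log line from the question, split across string literals for readability.
SAMPLE = (
    '{"type": "server", "timestamp": "2022-05-09T06:11:20,824Z", '
    '"level": "WARN", "component": "o.e.c.InternalClusterInfoService", '
    '"cluster.name": "elasticsearch", "node.name": "es-master", '
    '"message": "failed to retrieve shard stats from node '
    '[9_L5bY1kQA6s04k23DXPfQ]: [es-data][10.244.172.223:9300]'
    '[indices:monitor/stats[n]] request_id [65720] timed out after [15006ms]", '
    '"cluster.uuid": "wBHxBmZ2SbClTiJ5dykWjQ", '
    '"node.id": "Sq7N0x61T2W6o5ZDr8KkVg"  }'
)


def parse_shard_stats_timeout(log_line):
    """Return details of a shard stats timeout, or None for other log lines."""
    record = json.loads(log_line)
    match = TIMEOUT_RE.search(record.get("message", ""))
    if match is None:
        return None
    info = match.groupdict()
    info["request_id"] = int(info["request_id"])
    info["timeout_ms"] = int(info["timeout_ms"])
    # The node that logged the warning (here the master), not the slow node.
    info["reporting_node"] = record.get("node.name")
    return info


if __name__ == "__main__":
    print(parse_shard_stats_timeout(SAMPLE))
```

The slow node is the one named inside the message (es-data), while node.name in the JSON record is the node that wrote the warning (es-master).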

Are you monitoring the resource usage of your cluster (and of Rally :) ) while running the benchmark? This warning indicates that the es-data node is under high load. You should also check the load on your master node(s). Hitting the nodes too hard is one of the deadly sins of benchmarking; I strongly recommend watching the recording (link to slides here).
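As a rough illustration of what to look for, the sketch below scans an already parsed response of GET _nodes/stats/os,jvm,thread_pool and lists nodes that look overloaded. The function name, the thresholds, and the sample numbers are assumptions made up for this example; they are not measurements from your cluster, and fetching the response (with your credentials, since security is enabled) is left out.

```python
def find_stressed_nodes(node_stats, cpu_limit=90, heap_limit=85):
    """Map node name to a list of reasons why the node looks overloaded.

    node_stats is the parsed JSON body of GET _nodes/stats/os,jvm,thread_pool.
    cpu_limit and heap_limit are arbitrary example thresholds in percent.
    """
    stressed = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        reasons = []
        cpu = node.get("os", {}).get("cpu", {}).get("percent")
        if cpu is not None and cpu >= cpu_limit:
            reasons.append(f"cpu {cpu}%")
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if heap is not None and heap >= heap_limit:
            reasons.append(f"heap {heap}%")
        # Rejections mean a thread pool queue filled up at some point.
        for pool, stats in node.get("thread_pool", {}).items():
            rejected = stats.get("rejected", 0)
            if rejected > 0:
                reasons.append(f"{pool} pool rejected {rejected} tasks")
        if reasons:
            stressed[node.get("name", node_id)] = reasons
    return stressed


# Hypothetical, heavily trimmed response used only to exercise the function.
SAMPLE_STATS = {
    "nodes": {
        "9_L5bY1kQA6s04k23DXPfQ": {
            "name": "es-data",
            "os": {"cpu": {"percent": 97}},
            "jvm": {"mem": {"heap_used_percent": 91}},
            "thread_pool": {
                "write": {"queue": 200, "rejected": 12},
                "search": {"queue": 0, "rejected": 0},
            },
        },
        "Sq7N0x61T2W6o5ZDr8KkVg": {
            "name": "es-master",
            "os": {"cpu": {"percent": 20}},
            "jvm": {"mem": {"heap_used_percent": 40}},
            "thread_pool": {},
        },
    }
}

if __name__ == "__main__":
    for name, reasons in find_stressed_nodes(SAMPLE_STATS).items():
        print(name, "->", ", ".join(reasons))
```

Running a check like this at intervals during the benchmark, alongside host-level monitoring of the Rally machine, helps show whether the load generator or the cluster is the bottleneck.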

