Timeout in Elastic cluster:monitor/nodes/stats

Hello - we are getting a timeout calling the /_cat/indices API, though we're able to call all other APIs fine. The exception we're seeing is a cluster:monitor/nodes/stats timeout in the logs. How do we debug this to see where the problem could lie?

[2018-05-23T18:29:47,964][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-111-111-1] failed to execute on node [avcd]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [ip-10-211-211-1][10.200.187.133:9300][cluster:monitor/nodes/stats[n]] request_id [4088993] timed out after [15000ms]
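
Since all other APIs respond fine, one way to narrow this down is to hit the stats endpoint node by node and see which node hangs. A sketch, assuming curl against a node on localhost:9200 (adjust the host for your environment):

# Per-node health overview: heap, CPU and load for every node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,cpu,load_1m'

# Query node stats one node at a time; the node whose request hangs
# or times out is the likely culprit (node name taken from the log above)
curl -s 'http://localhost:9200/_nodes/ip-10-211-211-1/stats?timeout=10s'

# Dump hot threads on the suspect node to see what it is busy doing
curl -s 'http://localhost:9200/_nodes/ip-10-211-211-1/hot_threads'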

PS: Would individually restarting each node in the AWS cluster, including the coordinating nodes, help?
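
If a rolling restart is attempted, the usual safe sequence per node is roughly the following (a sketch of the standard disable-allocation / restart / re-enable procedure; the service name and host are assumptions for this setup):

# 1. Disable shard allocation so the cluster doesn't rebalance while a node is down
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

# 2. Restart the node (service name is an assumption; adjust to your install)
sudo systemctl restart elasticsearch

# 3. Once the node has rejoined, re-enable allocation
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'

# 4. Wait for green before moving on to the next node
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=60s'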

This is our cluster health information:

{
    "cluster_name": "prod-application-cluster",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 7,
    "number_of_data_nodes": 5,
    "active_primary_shards": 281,
    "active_shards": 562,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100
}

What is the specification of the hardware the cluster is using?

5 m5.4xlarge data/master nodes
2 m5.large coordinating nodes

PS: We've been running this cluster successfully for the past 6 months - this is the first time this has happened.

Are there any reports of long GC pauses in the Elasticsearch logs? Any other errors reported?
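
For anyone checking: long GC pauses are logged by JvmGcMonitorService as [gc] overhead lines, so a quick grep over each node's log will surface them (a sketch; the path and file name follow the defaults, ${cluster_name}.log, and are assumptions for this setup):

# Surface GC overhead warnings on this node
grep -i '\[gc\]' /var/log/elasticsearch/prod-application-cluster.log | tail -n 50

# A long pause typically looks like:
# [WARN ][o.e.m.j.JvmGcMonitorService] [node-name] [gc][12345] overhead, spent [3.2s] collecting in the last [3.5s]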

Just the errors I posted in the OP. I didn't see any GC-related errors.

Then I am not sure what could be going on...

Would restarting all the nodes individually help? Also, is the number of shards excessively high?
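
For a rough sanity check on the shard count: 562 active shards across 5 data nodes is about 112 shards per node. Against the commonly cited guideline of at most ~20 shards per GB of heap (an m5.4xlarge has 64 GB of RAM, so up to the ~30 GB heap ceiling), that is not excessive. To spot oversized or unevenly distributed shards (a sketch; the host is an assumption, run against any node):

# List shards sorted by on-disk size, largest first
curl -s 'http://localhost:9200/_cat/shards?v&s=store:desc' | head -n 20

# Show disk use and shard counts per data node
curl -s 'http://localhost:9200/_cat/allocation?v'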
