Cluster recovery failed and data node is not reachable

Hi,
I am running an Elasticsearch cluster in production. It is on version 2.1.1 and has 3 master nodes and 4 data nodes.

It has one large index, about 2.5 TB, with only 5 shards. During index recovery the index stays in the translog stage for a long time. JVM memory usage gets very high and GC runs repeatedly, which causes the data node to leave the cluster:

{data=false, master=true}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]

I tried raising discovery.zen.fd.ping_timeout to 600s, but got the same error:

{data=false, master=true}], reason [failed to ping, tried [3] times, each with  maximum [10m] timeout
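For reference, the timeout change corresponds to roughly the following fragment in elasticsearch.yml. This is only a sketch: it assumes the setting was applied in the config file on each node. The ping_retries line is shown at its default of 3, which matches the "tried [3] times" in the log, and was not changed.

```yaml
# elasticsearch.yml (sketch; assumes the setting is applied on every node)

# Zen fault detection: how long to wait for each ping response.
# Default is 30s; raised to 600s, which shows up as [10m] in the log.
discovery.zen.fd.ping_timeout: 600s

# Number of failed pings before the node is considered gone.
# Left at the default; shown only for context.
discovery.zen.fd.ping_retries: 3
```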

Is there a way to recover?
