Hi everyone,
I am trying to reindex my data, and the ES cluster turned RED.
The errors in the master node's log:
[2019-10-23T16:24:00,179][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node2] collector [cluster_stats] timed out when collecting data
[2019-10-23T16:24:12,903][WARN ][o.e.c.InternalClusterInfoService] [node2] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2019-10-23T16:24:40,153][INFO ][o.e.c.s.MasterService ] [node2] zen-disco-node-failed({node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}), reason(failed to ping, tried [3] times, each with maximum [30s] timeout)[{node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} failed to ping, tried [3] times, each with maximum [30s] timeout], reason: removed {{node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}
[2019-10-23T16:24:46,960][INFO ][o.e.c.s.ClusterApplierService] [node2] removed {{node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node2}{Gy_SbbWKTSS213NYmAshsQ}{wJwenmHLTme7fH9fGrSJdA}{172.16.30.92}{172.16.30.92:9300}{ml.machine_memory=33437806592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [3320] source [zen-disco-node-failed({node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}), reason(failed to ping, tried [3] times, each with maximum [30s] timeout)[{node5}{SPKWfjBDS3OZx89CCGIMWA}{JMzdy2BNSAGugxDuGIMX8A}{172.16.3.84}{172.16.3.84:9300}{ml.machine_memory=67436204032, disk=normal, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} failed to ping, tried [3] times, each with maximum [30s] timeout]]])
[2019-10-23T16:24:57,903][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node2] failed to execute on node [SPKWfjBDS3OZx89CCGIMWA]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node5][172.16.3.84:9300][cluster:monitor/nodes/stats[n]] request_id [1114453] timed out after [15006ms]
My Elasticsearch version is 6.8.0.
I have 5 nodes:
node2 has 31G of memory in total, with 15G for Elasticsearch.
node3 and node4 each have 30G of memory in total, with 15G for Elasticsearch. Both have SSDs.
node5 and node6 each have 64G of memory in total, with 30G for Elasticsearch.
node2 is the master node.
The cluster has 1431 indices, 3340 primary shards and 1763 replica shards.
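
To put the figures above in one place, here is a rough sketch of the arithmetic in Python. It rests on two assumptions that are not confirmed above: that the memory given to Elasticsearch on each node is the JVM heap size, and that shards are spread evenly across the five nodes.

```python
# Memory given to Elasticsearch per node, in GB, taken from the node list above.
# Assumption: these values are the JVM heap sizes.
es_memory_gb = {
    "node2": 15,
    "node3": 15,
    "node4": 15,
    "node5": 30,
    "node6": 30,
}

primary_shards = 3340
replica_shards = 1763

total_shards = primary_shards + replica_shards
total_memory_gb = sum(es_memory_gb.values())

# Assumption: shards are distributed evenly over all nodes.
shards_per_node = total_shards / len(es_memory_gb)
shards_per_gb = total_shards / total_memory_gb

print(f"total shards:         {total_shards}")
print(f"total ES memory (GB): {total_memory_gb}")
print(f"shards per node:      {shards_per_node:.1f}")
print(f"shards per GB:        {shards_per_gb:.1f}")
```

Under those assumptions this works out to 5103 shards in total, about 1020 shards per node, and about 48.6 shards per GB of memory given to Elasticsearch.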