ES Cluster Health in yellow due to some Replica shards in Unassigned state

We have an Elasticsearch production cluster with 4 nodes:

Node 1:
node.master: true
Node 2:
node.master: true
node.data: true
Node 3:
node.master: true
node.data: true
Node 4: Coordinating node
node.master: false
node.data: false
node.ingest: false

We are facing an issue with replica shards not getting allocated on Node 2.
There are no issues with the primary shards on Node 3, which is the current master node.

The logs from Node 2 are below:

[2018-11-22T16:42:48,849][INFO ][o.e.d.z.ZenDiscovery ] [node-plmspapgs0g] master_left [{node-plmspapgs0h}{ADnQ6UDZRoOFFvnkcdh3Sw}{R17y6k_ZTyOTzIby9xlTIw}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-11-22T16:42:48,850][WARN ][o.e.d.z.ZenDiscovery ] [node-plmspapgs0g] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node-plmspapgs0i}{31N0LIJiRtG3kfWsSoD2Iw}{RVg25n_5TomjJpi307wBVQ}{X.X.X.X}{X.X.X.X}:9300}{ml.machine_memory=16649068544, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{node-plmspapgs0e}{z49EVuIYQpOERYIYC2ZELA}{5WJEVl-NRtGH7A7qYUBxDg}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{node-plmspapgs0h}{ADnQ6UDZRoOFFvnkcdh3Sw}{R17y6k_ZTyOTzIby9xlTIw}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, master
{node-plmspapgs0g}{ffFONm7QQRqTkB8cH2DlVg}{wB9thnAdSE256v41YipqTA}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local

[2018-11-22T16:42:50,007][WARN ][r.suppressed ] path: /_xpack/monitoring/_bulk, params: {system_id=beats, system_api_version=6, interval=10s}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:166) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:152) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.xpack.monitoring.action.TransportMonitoringBulkAction.doExecute(TransportMonitoringBulkAction.java:56) ~[?:?]
at org.elasticsearch.xpack.monitoring.action.TransportMonitoringBulkAction.doExecute(TransportMonitoringBulkAction.java:36) ~[?:?]
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:143) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) ~[?:?]

As a workaround, we deleted all the indices. After that, the cluster health stays green for the first 4-5 days.
Then the cluster turns yellow again because replicas are not being allocated to Node 2.

Please help, as we have been facing this issue for a very long time.

What is your minimum master nodes setting (discovery.zen.minimum_master_nodes)?
Is it the same on all nodes?
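
For reference, with Zen discovery (the logs show 6.4.0) the usual recommendation is a quorum of the master-eligible nodes, i.e. (master_eligible / 2) + 1 with integer division. Below is a minimal sketch of that arithmetic, assuming Nodes 1 to 3 are the only master-eligible nodes, as listed in the post. The function name is just for illustration.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes (Node 1, Node 2 and Node 3 in the post above).
print(minimum_master_nodes(3))  # prints 2
```

If that assumption holds, the matching line in elasticsearch.yml would be discovery.zen.minimum_master_nodes: 2, and it should be identical on every master-eligible node.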

The logs indicate that Node 2 lost contact with the master node (the pings timed out). Do you have any networking issues between Node 2 and the other nodes, or in general?

Do you have any shard allocation awareness in play?
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html

When the cluster goes yellow, can you provide the output of the Allocation Explain API?
GET /_cluster/allocation/explain
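
If the full response is too long to post, it can be boiled down to the per-node decider messages. Here is a rough sketch in Python, assuming the response has the usual shape for this API (a node_allocation_decisions list with a deciders list per node). The sample document is made up for illustration and is not output from this cluster.

```python
import json


def summarize_allocation_explain(response: dict) -> list:
    """Return one line per (node, decider) that explains an allocation decision."""
    lines = []
    for node in response.get("node_allocation_decisions", []):
        name = node.get("node_name", "?")
        decision = node.get("node_decision", "?")
        for decider in node.get("deciders", []):
            lines.append(
                f"{name} [{decision}] {decider.get('decider')}: "
                f"{decider.get('explanation')}"
            )
    return lines


# Hypothetical sample, NOT taken from the cluster in this thread.
sample = json.loads("""
{
  "index": "example-index",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "node_allocation_decisions": [
    {
      "node_name": "node-a",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "example explanation text"
        }
      ]
    }
  ]
}
""")

for line in summarize_allocation_explain(sample):
    print(line)
```

The explanation text from each decider is usually the most useful part to paste into the thread.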

What replica settings do you have on the indices that go yellow when losing 1 of 3 data nodes in the cluster?
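
The reason for asking: Elasticsearch never places two copies of the same shard on one node, so an index needs at least number_of_replicas + 1 data nodes for every copy to be assigned. Below is a small sketch of that check. The node counts and replica values are example inputs, not settings read from this cluster.

```python
def unassigned_replicas_per_shard(data_nodes: int, number_of_replicas: int) -> int:
    """Replica copies per shard that cannot be placed.

    Each copy of a shard (1 primary + its replicas) must live on a different
    data node, so at most `data_nodes` copies can be assigned.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    copies_wanted = 1 + number_of_replicas
    return max(0, copies_wanted - data_nodes)


# Example inputs: 2 replicas fit on 3 data nodes, but not once one node is lost.
print(unassigned_replicas_per_shard(3, 2))  # prints 0
print(unassigned_replicas_per_shard(2, 2))  # prints 1
```

Any non-zero result means the index stays yellow until a data node rejoins or number_of_replicas is lowered.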
