ES Cluster Health in yellow due to some Replica shards in Unassigned state


(Damvinod) #1

We have an Elasticsearch production cluster with 4 nodes:

Node 1:
node.master: true
Node 2:
node.master: true
node.data: true
Node 3:
node.master: true
node.data: true
Node 4: Coordinating node
node.master: false
node.data: false
node.ingest: false

We are facing an issue with replica shards not getting allocated on Node 2.
There are no issues with the primary shards on Node 3, which is the current master node.

Logs from Node 2 are as below:

[2018-11-22T16:42:48,849][INFO ][o.e.d.z.ZenDiscovery ] [node-plmspapgs0g] master_left [{node-plmspapgs0h}{ADnQ6UDZRoOFFvnkcdh3Sw}{R17y6k_ZTyOTzIby9xlTIw}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-11-22T16:42:48,850][WARN ][o.e.d.z.ZenDiscovery ] [node-plmspapgs0g] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node-plmspapgs0i}{31N0LIJiRtG3kfWsSoD2Iw}{RVg25n_5TomjJpi307wBVQ}{X.X.X.X}{X.X.X.X}:9300}{ml.machine_memory=16649068544, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{node-plmspapgs0e}{z49EVuIYQpOERYIYC2ZELA}{5WJEVl-NRtGH7A7qYUBxDg}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{node-plmspapgs0h}{ADnQ6UDZRoOFFvnkcdh3Sw}{R17y6k_ZTyOTzIby9xlTIw}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, master
{node-plmspapgs0g}{ffFONm7QQRqTkB8cH2DlVg}{wB9thnAdSE256v41YipqTA}{X.X.X.X}}{X.X.X.X}:9300}{ml.machine_memory=33559379968, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local

[2018-11-22T16:42:50,007][WARN ][r.suppressed ] path: /_xpack/monitoring/_bulk, params: {system_id=beats, system_api_version=6, interval=10s}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:166) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:152) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.xpack.monitoring.action.TransportMonitoringBulkAction.doExecute(TransportMonitoringBulkAction.java:56) ~[?:?]
at org.elasticsearch.xpack.monitoring.action.TransportMonitoringBulkAction.doExecute(TransportMonitoringBulkAction.java:36) ~[?:?]
at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:143) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) ~[?:?]

As a workaround:

After deleting all the indices, the cluster health was green for the first 4-5 days.
The cluster then turned yellow again because replicas were not being allocated to Node 2.

Please help, as we have been facing this issue for a very long time.


(Peter Dyson) #2

What is your minimum master nodes setting (discovery.zen.minimum_master_nodes)?
Is it the same on all nodes?
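
For reference, the usual guidance for Elasticsearch 6.x is to set discovery.zen.minimum_master_nodes to a quorum of the master-eligible nodes, i.e. (N / 2) + 1 rounded down. A minimal sketch of that arithmetic, assuming Nodes 1, 2 and 3 from the original post are the only master-eligible nodes (the function name is just for illustration):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Nodes 1, 2 and 3 have node.master: true; Node 4 does not.
print(minimum_master_nodes(3))  # 2
```

If the value configured on any node differs from this, that would be worth mentioning in your reply.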

The logs indicate that Node 2 cannot find a master node. Do you have any networking issues between Node 2 and the other nodes, or in general?
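
One quick way to rule out basic reachability problems is to try opening a TCP connection from Node 2 to the transport port (9300 in the logs) of each other node. A rough sketch, assuming plain TCP reachability is what you want to test; the helper name and the timeout value are my own choices, not anything from Elasticsearch:

```python
import socket


def can_reach(host: str, port: int = 9300, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example (run on Node 2, replacing the placeholder with a real address):
# print(can_reach("<address-of-node-3>"))
```

A successful connection only shows that the port is reachable at that moment. It does not prove that the master answers pings within the 30s timeout seen in the log, so intermittent failures may still need to be checked over time.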

Do you have any shard allocation awareness in play?
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html

When you see the cluster go yellow, can you provide the allocation explain output?
GET /_cluster/allocation/explain
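
Called without a request body, this explains the first unassigned shard it finds. The response can be long, so here is a small sketch that pulls out only the per-node deciders that did not answer YES. It assumes the response has been saved as JSON and loaded into a dict, and that it uses the node_allocation_decisions / deciders layout; the function name is hypothetical:

```python
def summarize_allocation_explain(explain: dict) -> list:
    """Return one line per node decider whose decision is not YES."""
    lines = []
    for node in explain.get("node_allocation_decisions", []):
        node_name = node.get("node_name", "?")
        for decider in node.get("deciders", []):
            if decider.get("decision") != "YES":
                lines.append(
                    "{}: {} -> {} ({})".format(
                        node_name,
                        decider.get("decider", "?"),
                        decider.get("decision", "?"),
                        decider.get("explanation", ""),
                    )
                )
    return lines
```

The line for Node 2 is the interesting one here, since it should say which decider is refusing the replica.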

What replica settings do you have on the indices that go yellow when losing 1 of 3 data nodes in the cluster?
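
The reason for asking: two copies of the same shard are never placed on the same node, so an index with number_of_replicas = r needs at least r + 1 data nodes for every copy to be assigned. A small sketch of that check, assuming nothing else (disk watermarks, allocation filtering, awareness) is blocking allocation; the function name is made up for illustration:

```python
def unassignable_replicas(number_of_replicas: int, data_nodes: int) -> int:
    """Replica copies per shard that cannot be placed on the available data nodes."""
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must be >= 0")
    if data_nodes < 1:
        raise ValueError("need at least one data node to hold the primary")
    # One node holds the primary; each replica needs a different node.
    return max(0, number_of_replicas + 1 - data_nodes)


# With 2 replicas per shard and only 2 data nodes left, one copy stays unassigned.
print(unassignable_replicas(2, 2))  # 1
```

With 1 replica and two data nodes still available the result is 0, so in that case something other than the node count would have to be keeping the replicas unassigned.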