Cluster Yellow with master not discovered error and other errors after 7.4 upgrade

angelESZig · December 18, 2019, 5:46pm

Elasticsearch version ( bin/elasticsearch --version ): 7.4

Plugins installed : none

JVM version ( java -version ): 11.0.5

OS version ( uname -a if on a Unix-like system): AWS EC2 Linux

Description of the problem including expected versus actual behavior : Our cluster keeps going into a yellow state, with nodes reporting that the master has not been discovered. This started happening after we moved to 7.4. We have tried everything we can think of, including increasing capacity. We have around 180 data nodes and 3 master nodes.

Provide logs (if relevant) :

We see these logs when the cluster goes yellow:

[2019-12-17T18:05:23,358][WARN ][o.e.a.s.TransportClearScrollAction] [query-0-17x.xx.xx.x]Clear SC failed on node[{data-0-172.30.201.95}{IWIzDcbLSIu0JSBBrdM9lw}{WGxWMOG2S86b2ltx2ANgeQ}{}{host=17x.xx.xx rack_id=us-east-1a, ml.machine_memory=64385785856, ml.max_open_jobs=20, xpack.installed=true}]
org.elasticsearch.transport.RemoteTransportException: [data-0-172.xx.xx][[indices:data/read/search[free_context/scroll]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [31961865140/29.7gb], which is larger than the limit of [30386474188/28.2gb], real usage: [31961864696/29.7gb], new bytes reserved: [444/444b], usages [request=0/0b, fielddata=21490892/20.4mb, in_flight_requests=444/444b, accounting=4658750179/4.3gb]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.4.1.jar:7.4.1]

[2019-12-17T18:07:54,493][WARN ][o.e.c.NodeConnectionsService] [master-0-xx.xx.xx]failed to connect to {data-0-1xx.xx.xx}{lfhAg0FlRae3DVGeavemyQ}{TRxRnfcZTIaxGvXS6NLxFg}}{host=17x.xx.xx rack_id=us-east-1b, ml.machine_memory=73758015488, ml.max_open_jobs=20, xpack.installed=true} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [data-0-172..xxxx] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:976) ~[elasticsearch-7.4.1.jar:7.4.1]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.4.1.jar:7.4.1]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.4.1.jar:7.4.1]

[2019-12-17T18:05:24,237][WARN ][o.e.c.r.a.AllocationService] [master-0-172.xx.xx.xx]failing shard [failed shard, shard [complete-tagged-2019-12-v2][131], node[IWIzDcbLSIu0JSBBrdM9lw], [R], s[STARTED], a[id=XhQMwYoCTPuqZ5MlXSHRdg], message [failed to perform indices:data/write/bulk[s] on replica [complete-tagged-2019-12-v2][131], node[IWIzDcbLSIu0JSBBrdM9lw], [R], s[STARTED], a[id=XhQMwYoCTPuqZ5MlXSHRdg]], failure [RemoteTransportException[[data-0-172.30.201.95][172.30.201.95:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30519071200/28.4gb], which is larger than the limit of [30386474188/28.2gb], real usage: [30519024120/28.4gb], new bytes reserved: [47080/45.9kb], usages [request=0/0b, fielddata=21490892/20.4mb, in_flight_requests=47520/46.4kb, accounting=4658686279/4.3gb]]; ], markAsStale [true]]

| Dec 17 12:43:44.236 | i-0a8b8e974fc8db3b6 | elasticsearch | | [2019-12-17T17:43:44,236][WARN ][o.e.c.c.ClusterFormationFailureHelper] [query-0-1xx.xx.xx]master not discovered yet: have discovered [{query-0-172.30.201.187}{y_8CAEETRHmGnl-Vptim-A}{dgnRXCGkRbiTNsWVQyGpaA}{17xx.xx.xx}{1xx.xx.xx:xxx}{il}{rack_id=us-east-1a, ml.machine_memory=133658669056, xpack.installed=true, host=17xx.xx.xx, ml.max_open_jobs=20}, {master-0-1xx.xx.xx}{UJpnhA0oQLWyxeT2B8NiDA}{oBTXtCl0RNuKGIHAFhPMcA}
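
To make the CircuitBreakingException lines above easier to compare, here is a minimal Python sketch that pulls the byte counts out of a "Data too large" message and reports how far over the parent breaker limit the request would have gone. The function name `parse_breaker_message` and the regular expression are illustrative assumptions based only on the message format seen in these logs; they are not part of Elasticsearch.

```python
import re

# Assumed pattern, derived from the "[parent] Data too large" messages in the
# logs above. It captures only the raw byte counts, not the human-readable sizes.
BREAKER_RE = re.compile(
    r"would be \[(?P<would_be>\d+)/[^\]]+\], "
    r"which is larger than the limit of \[(?P<limit>\d+)/[^\]]+\], "
    r"real usage: \[(?P<real_usage>\d+)/[^\]]+\], "
    r"new bytes reserved: \[(?P<reserved>\d+)/[^\]]+\]"
)


def parse_breaker_message(line):
    """Return the byte counts from a breaker message, or None if no match."""
    match = BREAKER_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # How many bytes past the parent limit the request would have pushed usage.
    fields["over_limit"] = fields["would_be"] - fields["limit"]
    return fields


# Message text copied from the first log entry above (usages list omitted).
sample = (
    "CircuitBreakingException: [parent] Data too large, data for "
    "[<transport_request>] would be [31961865140/29.7gb], which is larger "
    "than the limit of [30386474188/28.2gb], real usage: "
    "[31961864696/29.7gb], new bytes reserved: [444/444b]"
)

info = parse_breaker_message(sample)
print(info)
```

For the first log entry this shows that real usage on the data node was already above the 28.2gb parent limit before the request arrived, so even a 444-byte clear-scroll request was rejected.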

system · January 15, 2020, 5:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
