Memory usage of dedicated master nodes is too high

We have 3 dedicated master nodes, 3 data nodes, and 2 ingest nodes.

version: 7.3
shards: 3
replica: 1

master nodes - 2 vCPUs, 2 GB RAM (for all 3 nodes),
data nodes - 4 vCPUs, 16 GB RAM (for all 3 nodes),
ingest nodes - 2 vCPUs, 4 GB RAM (for all 2 nodes)

elasticsearch.yml for the dedicated master nodes (tell me if it is not configured properly):

node.master: true
node.data: false
node.ingest: false

Do I need to set anything else for a dedicated master node?

Heap memory for the master nodes is 50% of RAM, i.e.:

-Xms1g
-Xmx1g

But I read in one article that you can increase it to 75-80% for dedicated master nodes:

https://discuss.elastic.co/t/master-node-gradually-moves-to-out-of-memory-exception/102344

Also, in another article I found that setting the heap to more than 50% of memory is not a good practice:

https://discuss.elastic.co/t/elasticsearch-master-node-having-high-memory-pressure/205079
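
Just to be explicit about what the 75% suggestion would mean on these 2 GB masters, it works out to roughly 1.5 GB, i.e. jvm.options values like the following (only an illustration of the arithmetic, I have not applied this):

-Xms1536m
-Xmx1536m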

Memory usage on all 3 master nodes is nearly 80-85%. How can I reduce this?
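
For reference, one way to compare the JVM heap usage against overall RAM usage per node is a _cat/nodes call like the one below (replace <any-node> with one of the node addresses; add credentials if security is enabled):

curl -s 'http://<any-node>:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max,ram.percent,ram.max'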

Thanks.

Which Elasticsearch version are you using? How many indices and shards do you have in the cluster? Do you have any non-default cluster settings?
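
You can pull those numbers with something like the following (standard cat and cluster-settings APIs; <any-node> is whichever HTTP endpoint you normally use):

curl -s 'http://<any-node>:9200/_cat/indices?v'
curl -s 'http://<any-node>:9200/_cat/shards?v'
curl -s 'http://<any-node>:9200/_cluster/settings?flat_settings=true'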

Thanks for the reply.

version: 7.3
shards: 3
replica: 1

Do you only have a single index? Are you making sure the dedicated master nodes are not serving requests?

No, we have 4 indices. About serving requests, I am not completely sure, but I did not follow this configuration:

node.master: true 
node.voting_only: false 
node.data: false 
node.ingest: false 
node.ml: false 
xpack.ml.enabled: true 
cluster.remote.connect: false

I came to know that this is the recommended configuration for a dedicated master node, but my elasticsearch.yml looks like this:

node.master: true
node.data: false
node.ingest: false

So can we say that my master nodes are serving requests?

Any node can serve requests, so it will depend on whether you have configured your clients to connect to them or not.
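
One way to verify is to compare the HTTP connection counters per node; if the counters on the master nodes keep climbing, something is connecting to them directly. A sketch of the call (standard node stats API, <any-node> is any HTTP endpoint):

curl -s 'http://<any-node>:9200/_nodes/stats/http?filter_path=nodes.*.name,nodes.*.http'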

No, I am sure only the client/ingest nodes are used for client connections, not the master nodes.

Is there anything in the logs indicating frequent or long GC on the master node?

There is no new log; this is the log I got yesterday:

[2020-04-20T18:24:25,913][WARN ][o.e.c.c.ClusterFormationFailureHelper] [master-node-3] master not discovered or elected yet, an election requires at least 2 nodes with ids from [sAvrgzVUTK2mM94CG_Q6Kg, GXALaMm0Sa6KIyUsmlqcgA, mByexEgYTZOFb7HhQ20oTw], have discovered [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [10.66.0.36:9300, 10.66.0.35:9300, 10.66.0.37:9300] from hosts providers and [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}, {master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{OB7_yzCTRtWKy4MljuqF5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}] from last-known cluster state; node term 9, last-accepted version 394 in term 9
[2020-04-20T18:24:29,209][WARN ][o.e.c.NodeConnectionsService] [master-node-3] failed to connect to {master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{OB7_yzCTRtWKy4MljuqF5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true} (tried [235] times)
org.elasticsearch.transport.ConnectTransportException: [master-node-2][10.66.0.46:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:957) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:495) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.66.0.46/10.66.0.46:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 7 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 7 more
[2020-04-20T18:24:35,914][WARN ][o.e.c.c.ClusterFormationFailureHelper] [master-node-3] master not discovered or elected yet, an election requires at least 2 nodes with ids from [sAvrgzVUTK2mM94CG_Q6Kg, GXALaMm0Sa6KIyUsmlqcgA, mByexEgYTZOFb7HhQ20oTw], have discovered [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [10.66.0.36:9300, 10.66.0.35:9300, 10.66.0.37:9300] from hosts providers and [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}, {master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{OB7_yzCTRtWKy4MljuqF5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}] from last-known cluster state; node term 9, last-accepted version 394 in term 9
[2020-04-20T18:24:38,912][INFO ][o.e.c.c.JoinHelper       ] [master-node-3] failed to join {master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20} with JoinRequest{sourceNode={master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=10, lastAcceptedTerm=9, lastAcceptedVersion=394, sourceNode={master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}}]}
org.elasticsearch.transport.RemoteTransportException: [master-node-3][10.66.0.48:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: became follower
        at org.elasticsearch.cluster.coordination.JoinHelper$CandidateJoinAccumulator.lambda$close$3(JoinHelper.java:477) [elasticsearch-7.3.2.jar:7.3.2]
        at java.util.HashMap$Values.forEach(HashMap.java:976) [?:?]
        at org.elasticsearch.cluster.coordination.JoinHelper$CandidateJoinAccumulator.close(JoinHelper.java:477) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:606) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:243) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.coordination.FollowersChecker$2.doRun(FollowersChecker.java:187) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2020-04-20T18:24:39,175][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] master node changed {previous [], current [{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}]}, removed {{data-node-3}{YZ1yUnEORAyN4nEcJNEOFA}{r9OZuTQmRWqYQOcU6MiwwQ}{10.66.0.39}{10.66.0.39:9300}{d}{ml.machine_memory=16816304128, ml.max_open_jobs=20, xpack.installed=true},{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{OB7_yzCTRtWKy4MljuqF5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},{client-node-1}{QNq6hjIEQa6AP2i-V43i-Q}{ZZbHWZzmSM6lJHgKCmqB5g}{10.66.0.37}{10.66.0.37:9300}{i}{ml.machine_memory=4133089280, ml.max_open_jobs=20, xpack.installed=true},{data-node-1}{Qeiq9OCXQNa8D7-uksFrXg}{P59LeLoeT6eAmaN9TmmtvA}{10.66.0.35}{10.66.0.35:9300}{d}{ml.machine_memory=16816304128, ml.max_open_jobs=20, xpack.installed=true},{data-node-2}{5t-AfS3SSUiYcQfY9ISgOw}{zHcU6N7YStiZm08ssyFdEQ}{10.66.0.38}{10.66.0.38:9300}{d}{ml.machine_memory=16816304128, ml.max_open_jobs=20, xpack.installed=true},{client-node-2}{uX_Do6fLTfeg_ZIw3DrutA}{wHu0sWifSouV6GLgmvj8XQ}{10.66.0.49}{10.66.0.49:9300}{i}{ml.machine_memory=4133105664, ml.max_open_jobs=20, xpack.installed=true},}, added {{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}, term: 11, version: 395, reason: ApplyCommitRequest{term=11, version=395, sourceNode={master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}}

Maybe this will be helpful for you:

Apr 20 18:24:39 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:24:39,926][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] added {{data-node-2}{5t-AfS3SSUiYcQfY9ISgOw}{zHcU6N7YStiZm08ssyFdEQ}{10.66.0.38}{10.66.0.38:9300}{d}{ml.machine_memory=16816304128, ml.max_open_jobs=20, xpack.installed=true},}, term: 11, version: 398, reason: ApplyCommitRequest{term=11, version=398, sourceNode={master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}}
Apr 20 18:24:40 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:24:40,686][INFO ][o.e.x.s.a.TokenService   ] [master-node-3] refresh keys
Apr 20 18:24:40 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:24:40,849][INFO ][o.e.x.s.a.TokenService   ] [master-node-3] refreshed keys
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:11,728][INFO ][o.e.c.c.Coordinator      ] [master-node-3] master node [{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}] failed, restarting discovery
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]: org.elasticsearch.ElasticsearchException: node [{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}] failed [3] consecutive checks
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:278) ~[elasticsearch-7.3.2.jar:7.3.2]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1111) ~[elasticsearch-7.3.2.jar:7.3.2]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1012) ~[elasticsearch-7.3.2.jar:7.3.2]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) [elasticsearch-7.3.2.jar:7.3.2]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at java.lang.Thread.run(Thread.java:835) [?:?]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]: Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [master-node-2][10.66.0.46:9300][internal:coordination/fault_detection/leader_check] request_id [1758236] timed out after [10006ms]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1013) ~[elasticsearch-7.3.2.jar:7.3.2]
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]:         ... 4 more
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:11,737][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] master node changed {previous [{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true}], current []}, term: 11, version: 414, reason: becoming candidate: onLeaderFailure
Apr 20 18:37:11 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:11,841][INFO ][o.e.c.s.MasterService    ] [master-node-3] elected-as-master ([2] nodes joined)[{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20} elect leader, {master-node-1}{mByexEgYTZOFb7HhQ20oTw}{U5Y66SeXR2u5UvKeRIFJIA}{10.66.0.36}{10.66.0.36:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 12, version: 415, reason: master node changed {previous [], current [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}]}, added {{master-node-1}{mByexEgYTZOFb7HhQ20oTw}{U5Y66SeXR2u5UvKeRIFJIA}{10.66.0.36}{10.66.0.36:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}
Apr 20 18:37:41 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:41,887][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] master node changed {previous [], current [{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20}]}, added {{master-node-1}{mByexEgYTZOFb7HhQ20oTw}{U5Y66SeXR2u5UvKeRIFJIA}{10.66.0.36}{10.66.0.36:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}, term: 12, version: 415, reason: Publication{term=12, version=415}
Apr 20 18:37:41 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:41,971][WARN ][o.e.c.s.MasterService    ] [master-node-3] cluster state update task [elected-as-master ([2] nodes joined)[{master-node-3}{GXALaMm0Sa6KIyUsmlqcgA}{6ITwaE7bTgiYED5iC975qA}{10.66.0.48}{10.66.0.48:9300}{m}{ml.machine_memory=2085462016, xpack.installed=true, ml.max_open_jobs=20} elect leader, {master-node-1}{mByexEgYTZOFb7HhQ20oTw}{U5Y66SeXR2u5UvKeRIFJIA}{10.66.0.36}{10.66.0.36:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_]] took [30.1s] which is above the warn threshold of 30s
Apr 20 18:37:41 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:37:41,974][INFO ][o.e.c.s.MasterService    ] [master-node-3] node-join[{client-node-1}{QNq6hjIEQa6AP2i-V43i-Q}{j1l32lenQ8-oWgkGsUuq0g}{10.66.0.37}{10.66.0.37:9300}{i}{ml.machine_memory=4133089280, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 12, version: 416, reason: added {{client-node-1}{QNq6hjIEQa6AP2i-V43i-Q}{j1l32lenQ8-oWgkGsUuq0g}{10.66.0.37}{10.66.0.37:9300}{i}{ml.machine_memory=4133089280, ml.max_open_jobs=20, xpack.installed=true},}
Apr 20 18:38:11 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:38:11,978][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] added {{client-node-1}{QNq6hjIEQa6AP2i-V43i-Q}{j1l32lenQ8-oWgkGsUuq0g}{10.66.0.37}{10.66.0.37:9300}{i}{ml.machine_memory=4133089280, ml.max_open_jobs=20, xpack.installed=true},}, term: 12, version: 416, reason: Publication{term=12, version=416}
Apr 20 18:38:11 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:38:11,985][WARN ][o.e.c.s.MasterService    ] [master-node-3] cluster state update task [node-join[{client-node-1}{QNq6hjIEQa6AP2i-V43i-Q}{j1l32lenQ8-oWgkGsUuq0g}{10.66.0.37}{10.66.0.37:9300}{i}{ml.machine_memory=4133089280, ml.max_open_jobs=20, xpack.installed=true} join existing leader]] took [30s] which is above the warn threshold of 30s
Apr 20 18:38:12 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:38:12,006][INFO ][o.e.c.s.MasterService    ] [master-node-3] node-left[{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true} followers check retry count exceeded], term: 12, version: 417, reason: removed {{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}
Apr 20 18:38:12 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:38:12,768][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] removed {{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{VgwJvlD_Tx6kred7ijM1UQ}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}, term: 12, version: 417, reason: Publication{term=12, version=417}
Apr 20 18:54:33 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:54:33,714][INFO ][o.e.c.s.MasterService    ] [master-node-3] node-join[{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{GT03Gdy8QPWF3Q82QPHG5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 12, version: 418, reason: added {{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{GT03Gdy8QPWF3Q82QPHG5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}
Apr 20 18:54:34 sl-stage-masternode3 elasticsearch[959]: [2020-04-20T18:54:34,641][INFO ][o.e.c.s.ClusterApplierService] [master-node-3] added {{master-node-2}{sAvrgzVUTK2mM94CG_Q6Kg}{GT03Gdy8QPWF3Q82QPHG5A}{10.66.0.46}{10.66.0.46:9300}{m}{ml.machine_memory=2085462016, ml.max_open_jobs=20, xpack.installed=true},}, term: 12, version: 418, reason: Publication{term=12, version=418}

I noticed there is a GC log; please check it once:

[2020-05-05T13:03:08.861+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0000972 seconds, Stopping threads took: 0.0000261 seconds
[2020-05-05T13:03:09.861+0000][23363][safepoint    ] Application time: 1.0001103 seconds
[2020-05-05T13:03:09.861+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:09.861+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:09.861+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0000998 seconds, Stopping threads took: 0.0000268 seconds
[2020-05-05T13:03:10.862+0000][23363][safepoint    ] Application time: 1.0001082 seconds
[2020-05-05T13:03:10.862+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:10.863+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:10.863+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0012213 seconds, Stopping threads took: 0.0010631 seconds
[2020-05-05T13:03:11.863+0000][23363][safepoint    ] Application time: 1.0001340 seconds
[2020-05-05T13:03:11.863+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:11.863+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:11.863+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001018 seconds, Stopping threads took: 0.0000280 seconds
[2020-05-05T13:03:12.863+0000][23363][safepoint    ] Application time: 1.0001070 seconds
[2020-05-05T13:03:12.863+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:12.863+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:12.863+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0000998 seconds, Stopping threads took: 0.0000288 seconds
[2020-05-05T13:03:13.864+0000][23363][safepoint    ] Application time: 1.0001126 seconds
[2020-05-05T13:03:13.864+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:13.864+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:13.864+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0000971 seconds, Stopping threads took: 0.0000262 seconds
[2020-05-05T13:03:14.864+0000][23363][safepoint    ] Application time: 1.0001104 seconds
[2020-05-05T13:03:14.864+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:14.864+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:14.864+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001755 seconds, Stopping threads took: 0.0000338 seconds
[2020-05-05T13:03:15.864+0000][23363][safepoint    ] Application time: 1.0001178 seconds
[2020-05-05T13:03:15.864+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:15.865+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:15.866+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0012783 seconds, Stopping threads took: 0.0011306 seconds
[2020-05-05T13:03:16.866+0000][23363][safepoint    ] Application time: 1.0001562 seconds
[2020-05-05T13:03:16.866+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:16.866+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:16.866+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001008 seconds, Stopping threads took: 0.0000285 seconds
[2020-05-05T13:03:17.866+0000][23363][safepoint    ] Application time: 1.0001337 seconds
[2020-05-05T13:03:17.866+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:17.866+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:17.866+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001433 seconds, Stopping threads took: 0.0000260 seconds
[2020-05-05T13:03:18.866+0000][23363][safepoint    ] Application time: 1.0001432 seconds
[2020-05-05T13:03:18.866+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:18.867+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:18.867+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0000982 seconds, Stopping threads took: 0.0000262 seconds
[2020-05-05T13:03:19.867+0000][23363][safepoint    ] Application time: 1.0001417 seconds
[2020-05-05T13:03:19.867+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:19.867+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:19.867+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001023 seconds, Stopping threads took: 0.0000287 seconds
[2020-05-05T13:03:20.867+0000][23363][safepoint    ] Application time: 1.0001315 seconds
[2020-05-05T13:03:20.867+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:20.867+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:20.867+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001397 seconds, Stopping threads took: 0.0000473 seconds
[2020-05-05T13:03:21.867+0000][23363][safepoint    ] Application time: 1.0001522 seconds
[2020-05-05T13:03:21.867+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:21.867+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:21.868+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001067 seconds, Stopping threads took: 0.0000296 seconds
[2020-05-05T13:03:22.868+0000][23363][safepoint    ] Application time: 1.0001307 seconds
[2020-05-05T13:03:22.868+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:22.868+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:22.868+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001250 seconds, Stopping threads took: 0.0000271 seconds
[2020-05-05T13:03:23.868+0000][23363][safepoint    ] Application time: 1.0001431 seconds
[2020-05-05T13:03:23.868+0000][23363][safepoint    ] Entering safepoint region: Cleanup
[2020-05-05T13:03:23.868+0000][23363][safepoint    ] Leaving safepoint region
[2020-05-05T13:03:23.868+0000][23363][safepoint    ] Total time for which application threads were stopped: 0.0001057 seconds, Stopping threads took: 0.0000308 seconds
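
For context, these safepoint entries come from the GC logging that Elasticsearch enables by default in jvm.options; on 7.x the flag is roughly the following, so the file records safepoint pauses as well as actual GC events:

9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m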
