We have a cluster of 4 nodes: 2 nodes are master-eligible and data nodes, and the other 2 are data-only nodes. The configuration had been working fine for 2 years, but today we had to restart the cluster, and since then we have been getting a master-not-discovered exception.
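For anyone hitting the same symptom after a full restart, these are the discovery settings worth double-checking in each node's `elasticsearch.yml`. This is a minimal, hypothetical sketch: the node names and addresses are illustrative, not taken from this cluster.

```yaml
# Hypothetical example – adjust names and addresses to your own cluster.
cluster.name: my-cluster
node.name: ES-Master-1

# Every master-eligible node must be reachable at these transport addresses.
discovery.seed_hosts:
  - es-master-1.example.internal:9300
  - es-master-2.example.internal:9300

# Only used when bootstrapping a brand-new cluster; a cluster that already
# has a cluster state on disk ignores it, and the docs recommend removing
# it after the first successful bootstrap.
# cluster.initial_master_nodes:
#   - ES-Master-1
#   - ES-Master-2
```

Note also that with only two master-eligible nodes the cluster cannot safely tolerate losing either of them; the Elasticsearch docs recommend three master-eligible nodes for resilience.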

@DavidTurner If you look at these logs, which I got without any change to the config.yml, can you tell from them where storage performance might be affecting the cluster?

[2023-08-05T22:02:07,792][WARN ][r.suppressed             ] [ES-Master-1] path: /_license, params: {human=false}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:297) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-08-05T22:02:58,576][WARN ][r.suppressed             ] [ES-Master-2] path: /_license, params: {human=false}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:297) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.16.2.jar:7.16.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]

These are the logs that get printed with my old config; I don't see any storage performance issue in them. Sorry for asking so many questions, I am new to Elasticsearch.

Those logs are not informative. The relevant log message is the one I highlighted above.

@DavidTurner We are currently using bursting throughput for EFS. Can you recommend a throughput mode that would resolve the issue?

I can't help with your EFS config, sorry. My best recommendation aligns with the docs I shared above: move off EFS.

You could change the storage, or provision another, faster volume. Point your configuration at that path, remember to change the path in both configuration files, then restart each node and try again.
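The suggestion above amounts to repointing `path.data` in each node's `elasticsearch.yml`. A hypothetical sketch (the mount path is illustrative):

```yaml
# Hypothetical example – point the data path at the new, faster storage.
# This must be updated in the elasticsearch.yml of every node, and each
# node restarted afterwards.
path.data: /mnt/fast-disk/elasticsearch/data
```

If you move existing nodes this way, you would also need to copy the contents of the old data directory to the new path (or restore from a snapshot) before restarting, otherwise the nodes come up empty.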

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.