Hi All,
I have two nodes in my cluster. After restarting both nodes, they can't discover each other.
At first I thought it was an issue with the 'elasticsearch-head' plugin. However, when I use Java code to query the cluster, I get the same result.
Java code:
for (int i = 0; i < 100; i++) {
    // Enable sniffing so the client discovers the other nodes in the cluster.
    Builder builder = ImmutableSettings.settingsBuilder();
    builder.put("client.transport.sniff", true);
    Settings s = builder.build();
    // Note: a new TransportClient is created on every iteration and is never closed.
    TransportClient tmp = new TransportClient(s);
    Client client = tmp
        .addTransportAddress(new InetSocketTransportAddress("192.168.1.115", 9300))
        .addTransportAddress(new InetSocketTransportAddress("192.168.1.116", 9300));
    // Ask the cluster which nodes it currently knows about.
    NodesInfoResponse rsp = client.admin().cluster()
        .nodesInfo(new NodesInfoRequest()).actionGet();
    String str = "Cluster:" + rsp.getClusterName() + ". Active nodes:";
    str += rsp.getNodesMap().keySet();
    System.out.println(str);
    Thread.sleep(1000);
}
Output:
Scenario 1: (the output alternates between the two nodes, but this rarely happens)
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
Cluster:bseproductioncluster. Active nodes:[m__sMnEgRFGuHCDb-H6iiQ]
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
....
Scenario 2: (only one node is ever returned)
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ]
....
OR
Cluster:bseproductioncluster. Active nodes:[m__sMnEgRFGuHCDb-H6iiQ]
Cluster:bseproductioncluster. Active nodes:[m__sMnEgRFGuHCDb-H6iiQ]
Cluster:bseproductioncluster. Active nodes:[m__sMnEgRFGuHCDb-H6iiQ]
....
OR it throws:
Exception in thread "main" org.elasticsearch.transport.NodeDisconnectedException: [115-es-11][inet[/192.168.1.115:9300]][cluster/nodes/info] disconnected
OR it throws the exception below when I comment out either of the lines '.addTransportAddress(new InetSocketTransportAddress("192.168.1.115", 9300))' or '.addTransportAddress(new InetSocketTransportAddress("192.168.1.116", 9300))':
Exception in thread "main" org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.openSelector(AbstractNioWorker.java:198)
The expected result should be something like:
Cluster:bseproductioncluster. Active nodes:[VJurJ8rNSnCsgMu-fQPLrQ, m__sMnEgRFGuHCDb-H6iiQ]
Right?
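The comparison I am making by eye between the actual and expected output can be sketched as below. This is only a minimal sketch using plain Java collections; `NodeViewCheck`, `union` and `everyPollSees` are hypothetical names and not part of the Elasticsearch API. Each poll is the set of node IDs printed by one iteration of the loop above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing polled node views; not an Elasticsearch class.
class NodeViewCheck {

    // Union of all node IDs seen across every poll.
    static Set<String> union(List<Set<String>> polls) {
        Set<String> all = new LinkedHashSet<String>();
        for (Set<String> poll : polls) {
            all.addAll(poll);
        }
        return all;
    }

    // True only if every single poll reported at least expectedNodes node IDs.
    static boolean everyPollSees(List<Set<String>> polls, int expectedNodes) {
        for (Set<String> poll : polls) {
            if (poll.size() < expectedNodes) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Scenario 1 from above: the two nodes show up in alternating polls.
        List<Set<String>> polls = Arrays.<Set<String>>asList(
            new LinkedHashSet<String>(Arrays.asList("VJurJ8rNSnCsgMu-fQPLrQ")),
            new LinkedHashSet<String>(Arrays.asList("m__sMnEgRFGuHCDb-H6iiQ")));

        System.out.println("Nodes seen overall: " + union(polls));
        System.out.println("Every poll sees 2 nodes: " + everyPollSees(polls, 2));
    }
}
```

For Scenario 1 the union contains both node IDs, yet no single poll reports both, which is what I mean by the nodes not seeing each other.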
Status in head plugin:
[Image: 115 node] [Image: 116 node]
This kind of cluster status is not normal, right? However, no matter how many times I restart the nodes, they can't get back to normal.
What's weird is that when I start an ES node on Windows first and then start the 115 and 116 Linux nodes, all three nodes join the cluster successfully.
[Image: 226 node]
Rerunning the Java code then gives the result below:
Cluster:bseproductioncluster. Active nodes:[nxJ9D5g6SGSdTNyPImXgJA, 7QgigKJwSFmPkrC0unDqfg, XnAQbXWCT-uP6bd4AfKm3Q]
.....
Cluster:bseproductioncluster. Active nodes:[nxJ9D5g6SGSdTNyPImXgJA, 7QgigKJwSFmPkrC0unDqfg, XnAQbXWCT-uP6bd4AfKm3Q]
Can anyone explain how/why this happens?
My ES version is 0.19.11.
ES configuration changes:
discovery.zen.ping.timeout: 5s
discovery.zen.minimum_master_nodes: 1
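
For context on the minimum_master_nodes value above: the usual recommendation for zen discovery is a majority of the master-eligible nodes, i.e. (N / 2) + 1. The sketch below only works out that arithmetic; `ZenQuorum` and `majority` are hypothetical names, and whether this setting is related to the behaviour described here is not confirmed.

```java
// Hypothetical helper illustrating the majority rule; not an Elasticsearch class.
class ZenQuorum {

    // Majority of master-eligible nodes: (n / 2) + 1, using integer division.
    static int majority(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        // Two Linux nodes (115 and 116).
        System.out.println("2 nodes -> minimum_master_nodes = " + majority(2));
        // Three nodes (Windows node plus 115 and 116).
        System.out.println("3 nodes -> minimum_master_nodes = " + majority(3));
    }
}
```

With a value of 1, a single node is allowed to elect itself master without seeing any other node.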
Thanks,
Spancer