Elasticsearch 3 node cluster failing if master is down second time


(Sam) #1

I have a 3 node cluster and want to know the best possible configuration for Elasticsearch. Below is the configuration I have tried.

Server 1,2,3

node.name : node1 [ node2, node3 ]
node.master: true
node.data: true

It works fine; however, while testing, it fails in one scenario. Below are the tests I have performed.

Test 1

  • Bring all 3 nodes up and shut down the master node (in this case node1 is master). Cluster state goes to yellow and back to green after a while with 2 nodes running.

  • Now bring back the node that was shut down. Now all 3 nodes are up,
    node2 is elected as master and everything is working fine.

Test2

  • After Test 1 is run and all 3 nodes are up, with node2
    elected as master, I shut down node2. Now I am not able to connect
    and get the failed to connect error below:

curl: (7) Failed to connect to localhost port 9200: Connection refused

  • However, node1, which is then elected as master, shows the message below, with cluster health status changing from [YELLOW] to [GREEN].

Message


[2017-05-26T22:52:23,996][INFO ][o.e.c.r.a.AllocationService] [node-1]
Cluster health status changed from [GREEN] to [YELLOW] (reason: [{node-2}{6THtjHP4SuKsrAZYcOv6Sw}{wyC6CjSESOOisEMTOE_wCw}{127.0.0.1}{127.0.0.1:9301} transport disconnected, {node-2}{6THtjHP4SuKsrAZYcOv6Sw}{wyC6CjSESOOisEMTOE_wCw}{127.0.0.1}{127.0.0.1:9301} transport disconnected]).
[2017-05-26T22:52:23,997][INFO ][o.e.c.s.ClusterService   ] [node-1] removed  {{node-2}{6THtjHP4SuKsrAZYcOv6Sw}{wyC6CjSESOOisEMTOE_wCw}{127.0.0.1}{127.0.0.1:9301},}, reason: zen-disco-node-failed({node-2}{6THtjHP4SuKsrAZYcOv6Sw} {wyC6CjSESOOisEMTOE_wCw}{127.0.0.1}{127.0.0.1:9301}), reason(transport  disconnected)[{node-2}{6THtjHP4SuKsrAZYcOv6Sw}{wyC6CjSESOOisEMTOE_wCw}{127.0.0.1} {127.0.0.1:9301} transport disconnected, {node-2}{6THtjHP4SuKsrAZYcOv6Sw}{wyC6CjSESOOisEMTOE_wCw}{127.0.0.1}{127.0.0.1:9301} transport disconnected]
[2017-05-26T22:52:24,099][INFO ][o.e.c.r.DelayedAllocationService] [node-1] scheduling reroute for delayed shards in [59.7s] (3 delayed shards)
[2017-05-26T22:53:26,115][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[shakespeare][3], [shakespeare][4]] ...]).

Test3


After Test 1 is run and all 3 nodes are up, with **node2** elected as
master, I shut down **node3**. Now everything is working fine,
since I did not shut down the master the second time.


(Mark Walkom) #2

This is expected behaviour given your configuration.

However, is there something you'd like to ask here? It's not clear :slight_smile:


(Sam) #3

Thanks for the response, warkolm. So there is no solution? I am randomly bringing down the master node. The first time I bring it down it works fine; however, when I bring it down the second time, it fails.
All 3 of my nodes are master eligible:
node.master : true
node.data : true

Not sure why the cluster does not respond the second time. Let me know what the best configuration for a 3 node cluster is.


(Mark Walkom) #4

Are you talking about when you curl the node you stopped? That's entirely expected.


(Sam) #5

When I run the curl command for cluster health, it does not work.
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl: (7) Failed to connect to localhost port 9200: Connection refused

My elastic head does not show any nodes, but only when I shut down the master the second time.


(Mark Walkom) #6

If you are on that node and then stop ES then of course it won't respond. You need to contact a different node.


(Sam) #7

Thanks. Currently I am testing the 3 node cluster on my Mac (a single machine); maybe that is the reason.
Anyhow, sometimes I am able to connect to the cluster and sometimes not. Will debug more.


(Mark Walkom) #8

Right, well that makes more sense!

See how the transport port in your log is 9301, not 9300? When you start another node on the same host, ES picks 9300+1; if you start another it's 9300+2, and so on. It's the same for the HTTP port on 9200. First node gets 9200, second 9201, third 9202.
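
A minimal sketch of probing those HTTP ports in turn, assuming three nodes on one host that each took the next free port starting at 9200. The function names `node_ports` and `first_healthy` are made up for illustration; only `curl` and the `_cluster/health` endpoint come from this thread.

```shell
# Print the HTTP ports that nodes on one host would bind by default:
# base, base+1, base+2, ... (assumes each port was free at startup).
node_ports() {
  base=${1:-9200}
  count=${2:-3}
  i=0
  while [ "$i" -lt "$count" ]; do
    echo $((base + i))
    i=$((i + 1))
  done
}

# Ask each candidate port for cluster health; stop at the first answer.
first_healthy() {
  for port in $(node_ports "$@"); do
    if curl -s --max-time 2 "http://localhost:${port}/_cluster/health?pretty"; then
      return 0
    fi
  done
  echo "no node answered on any candidate port" >&2
  return 1
}
```

Running `first_healthy` with no arguments tries 9200, 9201 and 9202 and prints the health response from the first node that answers, so it keeps working whichever node you stopped.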

So if you stop the first node you need to curl localhost:9201.
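
If guessing ports gets tedious, one option is to pin them per node in `elasticsearch.yml` so they no longer depend on start order. A sketch for the second node, assuming a 5.x release (which the log format above suggests); `http.port` and `transport.tcp.port` are the 5.x setting names, and the port values are only examples:

```yaml
# elasticsearch.yml for node2 (example values, adjust per node)
node.name: node2
node.master: true
node.data: true
http.port: 9201           # HTTP/REST port; this is the one to curl
transport.tcp.port: 9301  # node-to-node transport port
```

With the ports fixed like this, node2 is always reachable on localhost:9201 for as long as it is running, regardless of which node was started first.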


(Sam) #9

Oh! Thanks a lot, it worked like you mentioned. I appreciate it a lot.

