I've been working with an Elasticsearch cluster for a couple of months now.
I'm finally getting it into production, but after a soft launch I realized
that I needed to allocate more RAM to each instance. I'm running 3 boxes
with 3 instances of Elasticsearch each. I took the first box down, added
the RAM, and brought it back up into the cluster. All was well. On the
second box, the routing/load balancing node now won't come back
into the cluster. The other 2 instances joined fine. I tried restarting
the failed instance several times with no luck. I keep getting a "no masterNode
returned" error.
Relevant Info
3 Machines, 3 Instances each
Instance 1 + 2: Data + eligible master
Instance 3: No data, no master (load balancing/routing only)
OS: CentOS 6.4
Elasticsearch version 0.90.3 (I know this is somewhat dated now; we must
extensively test new releases in dev/test before moving to production)
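For reference, the layout above works out to six master-eligible nodes (two per machine across three machines). Below is a minimal sketch of the commonly cited majority rule for discovery.zen.minimum_master_nodes, floor(n/2) + 1. The function name is made up for illustration and is not part of Elasticsearch.

```python
def majority_quorum(master_eligible_nodes):
    """Smallest node count that is a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Topology from this post: 3 machines, 2 master-eligible instances on each.
machines = 3
master_eligible_per_machine = 2
total = machines * master_eligible_per_machine

print("%d master-eligible nodes -> quorum of %d" % (total, majority_quorum(total)))
# prints: 6 master-eligible nodes -> quorum of 4
```

By this rule, six master-eligible nodes would call for a value of 4, which is higher than the 2 in the config further down.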
IPTables:
# Elasticsearch REST API (HTTP)
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9200 -j ACCEPT
# Elasticsearch Transport Service
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9300 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9301 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9302 -j ACCEPT
# Allow multicast for Elasticsearch auto-discovery
-A INPUT -m pkttype --pkt-type multicast -j ACCEPT
Trace discovery logs:
https://gist.github.com/jumpinjoeadams/7008972
Relevant ES config:
cluster.name: NightRunnerProd
http.enabled: false (on instances 1+2 only)
gateway.recover_after_nodes: 4
gateway.recover_after_time: 20s
gateway.expected_nodes: 6 (these recovery options were lowered to
resolve this issue previously, but apparently that just prolonged it)
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.11.253.173[9300-9305]",
"10.11.253.174[9300-9305]", "10.11.253.175[9300-9305]"]
node.master: false (instance 3 only)
node.data: false (instance 3 only)
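To rule out basic reachability separately from iptables, here is a rough sketch that expands the unicast host entries above (the host[low-high] form) into individual host/port pairs and tries a plain TCP connect to each. It assumes Python 3 on a box that should be able to reach the others. The helper names are made up for illustration, and a successful connect only shows that something is listening, not that zen discovery itself works.

```python
import re
import socket

# The discovery.zen.ping.unicast.hosts entries from the config above.
UNICAST_HOSTS = [
    "10.11.253.173[9300-9305]",
    "10.11.253.174[9300-9305]",
    "10.11.253.175[9300-9305]",
]

# Matches entries such as "10.11.253.173[9300-9305]" (host plus a port range).
HOST_RANGE = re.compile(r"^(?P<host>[^\[\]:]+)\[(?P<lo>\d+)-(?P<hi>\d+)\]$")


def expand_unicast_host(entry, default_port=9300):
    """Expand one unicast host entry into a list of (host, port) pairs."""
    entry = entry.strip()
    match = HOST_RANGE.match(entry)
    if match:
        lo, hi = int(match.group("lo")), int(match.group("hi"))
        return [(match.group("host"), port) for port in range(lo, hi + 1)]
    if ":" in entry:
        host, port = entry.rsplit(":", 1)
        return [(host, int(port))]
    return [(entry, default_port)]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_all(entries, timeout=2.0):
    """Map every (host, port) pair from the entries to its reachability."""
    results = {}
    for entry in entries:
        for host, port in expand_unicast_host(entry):
            results[(host, port)] = port_is_open(host, port, timeout)
    return results


# Example use, run from the box hosting the node that fails to join:
#   for (host, port), ok in sorted(probe_all(UNICAST_HOSTS).items()):
#       print(host, port, "open" if ok else "closed/unreachable")
```

With three instances per box, ports 9300-9302 on each machine would be expected to answer; the remaining ports in the range have nothing listening, so they should show as closed.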
I have tried disabling iptables.
SELinux reports no errors.
Google provides no help.
Every time the node comes up, it doesn't join the cluster; it just returns a
503. The failed node is the balancing node on machine 2.
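For anyone wanting to reproduce the check, this is roughly how the 503 can be observed: a small sketch that requests _cluster/health over HTTP from the routing node (the only instance type here with HTTP enabled) and summarizes the answer. It assumes Python 3 and the default HTTP port 9200. The function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def summarize_health(status_code, body):
    """Turn an HTTP status and response body into a one-line summary."""
    if status_code != 200:
        return "HTTP %d: node is not serving cluster health" % status_code
    health = json.loads(body)
    return "cluster=%s status=%s nodes=%s" % (
        health.get("cluster_name"),
        health.get("status"),
        health.get("number_of_nodes"),
    )


def fetch_health(host, port=9200, timeout=5.0):
    """Request /_cluster/health and return (status_code, body_text).

    Connection-level failures (refused, timed out) are not caught here
    and propagate as urllib.error.URLError.
    """
    url = "http://%s:%d/_cluster/health" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode(), resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx answers such as the 503 arrive as HTTPError.
        return err.code, err.read().decode("utf-8", "replace")


# Example use (replace the address with that of the routing node):
#   code, body = fetch_health("10.11.253.174")
#   print(summarize_health(code, body))
```

Running the same check against a routing node that did join gives a baseline to compare with.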
I'm going nuts trying to figure out why this happens from time to time.
Thanks in advance!
Joe