Multicast discovery sometimes fails


(Urs) #1

Hello,

I have two nodes running Elasticsearch 0.90.3 on CentOS 6.4 on vSphere
5.1U1. Both machines have two networks: an access VLAN (172.31.120.0/24)
and a separate VLAN for all replication/internal ES traffic (10.105.1.0/24).

At the beginning it worked fine. But then, sometimes, if I restart one
node, it no longer discovers the other node. I have had
"network.publish_host: 10.105.1.x" set on both nodes from the beginning,
because I wanted all of this traffic on the separate VLAN. Once it fails, I
can restart the service or the machine as many times as I want and it still
doesn't work. After some time it can happen that it starts working again
when restarting ES. I haven't found a way to reproduce this.

Then I set "discovery.zen.ping.unicast.hosts: ["10.105.1.x"]" on the node
that wasn't able to rejoin the cluster, and then it worked again. After
changing it back, it stops working again...
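
For reference, the unicast workaround described above might look roughly
like this in elasticsearch.yml. This is only a sketch: the addresses are
left elided exactly as in the post, and the last line (disabling multicast)
is an assumption that is not part of the original setup.

```yaml
# elasticsearch.yml on the node that could not rejoin the cluster
# (addresses elided as in the post; substitute the real 10.105.1.0/24 hosts)
network.publish_host: 10.105.1.x

# Ping the other node directly instead of relying on multicast
discovery.zen.ping.unicast.hosts: ["10.105.1.x"]

# Assumption, not from the post: turn multicast off so that only the
# unicast list is used for discovery
discovery.zen.ping.multicast.enabled: false
```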

Very strange.
I thought ES uses only the network defined in publish_host for discovery,
if it is defined. Am I wrong about that?
I would like to use multicast instead of unicast if possible, but I have to
be sure it works correctly.
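
One setting that may be worth trying here is the bind address for multicast
pings. The sketch below assumes the zen multicast settings as I recall them
from the 0.90 documentation; whether pinning the address actually fixes
this case is not established anywhere in this thread.

```yaml
# elasticsearch.yml (sketch; address elided as in the post)
network.publish_host: 10.105.1.x

# Assumption: bind multicast discovery to the internal VLAN interface
# explicitly instead of relying on the default
discovery.zen.ping.multicast.address: 10.105.1.x
```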

Thank you
Urs

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

Can you increase the log level and check the log files to see if that
gives you more information? I would be interested to know more about your
case...
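
A minimal sketch of raising the log level for discovery, assuming the stock
config/logging.yml layout shipped with 0.90.x, where individual loggers are
listed under the "logger:" key:

```yaml
# config/logging.yml (fragment; only the discovery line is added)
logger:
  # trace-level output for the discovery module, including ping activity
  discovery: TRACE
```

After restarting the node, the additional discovery messages should appear
in the node's log file.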

--Alex

On Wed, Oct 2, 2013 at 12:18 PM, Urs uweiss@icrcom.com wrote:




(system) #3