Problem with Discovery / forming mini-clusters

We are having some problems with discovery when restarting nodes. Sometimes, when nodes are restarted, most join one large cluster, but a few form their own mini-clusters of one or two nodes. Whenever this happens we get eight or so Zen ping warnings. If we get two or fewer of these warnings, the node usually joins the cluster correctly.

Example:

[2011-02-25 09:06:35,377][WARN ][discovery.zen.ping.multicast] [Piaget, Jean] received ping response with no matching id [1]

The mini-clusters get "State not recoverable" and "disable_persistence_state: true" under blocks.

If somebody can point us to some common Zen discovery "gotchas" it would be much appreciated.

Are you using multicast discovery or unicast discovery? One option is to increase the time a node waits for responses from other nodes during the discovery process. This is controlled by the discovery.zen.initial_ping_timeout setting (defaults to 3s).
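
For anyone finding this thread later, here is a rough sketch of what raising that timeout could look like. It assumes the setting goes in the node's elasticsearch.yml, and the 10s value is only an illustrative starting point, not a recommendation from this thread; tune it for your own network.

```yaml
# elasticsearch.yml (assumed location of node settings)
# How long a node waits for ping responses before deciding who is in the cluster.
# The default is 3s; 10s here is an illustrative value only.
discovery.zen.initial_ping_timeout: 10s
```

The setting only affects a node's own discovery when it starts, so each node would presumably need to be restarted with the new value.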

-shay.banon
On Friday, February 25, 2011 at 6:17 PM, Sorostaran wrote:

View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Problem-with-Discovery-forming-mini-clusters-tp2576266p2576266.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

We were using multicast. After fiddling with the initial ping timeout, the cluster now forms reliably at a setting of 30 seconds. Thank you, Shay.

Cool. In this case, I would suggest you use unicast discovery, since multicast messages should not take this long...
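
A minimal sketch of a unicast setup, for reference. The setting names below are the zen discovery settings as I understand them for this era of Elasticsearch, so check them against the docs for your version; the host names are hypothetical placeholders, and 9300 is assumed to be the transport port.

```yaml
# elasticsearch.yml (assumed location of node settings)
# Turn off multicast pings so nodes only contact the hosts listed below.
discovery.zen.ping.multicast.enabled: false
# Hypothetical host names; list a few nodes that are expected to be up.
discovery.zen.ping.unicast.hosts: ["node1.example.com:9300", "node2.example.com:9300"]
```

The list does not have to name every node in the cluster; a node that reaches any listed host should learn about the rest from it.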
On Monday, February 28, 2011 at 7:34 PM, Sorostaran wrote:

