Master election issue?


(jacque74) #1

After a full cluster restart, my master is unable to elect itself. Here
are some logs:

[2012-02-24 10:55:21,549][INFO ][discovery.zen ] [img700] failed to send join request to master [[img700][UO6rSvnqSl-eyoygRj-99g][inet[/208.94.2.146:9300]]{master=true}], reason [org.elasticsearch.transport.RemoteTransportException: [img700][inet[/208.94.2.146:9300]][discovery/zen/join]; org.elasticsearch.ElasticSearchIllegalStateException: Node [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}] not master for join request from [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}]]
[2012-02-24 10:55:24,561][INFO ][discovery.zen ] [img700] failed to send join request to master [[img700][UO6rSvnqSl-eyoygRj-99g][inet[/208.94.2.146:9300]]{master=true}], reason [org.elasticsearch.transport.RemoteTransportException: [img700][inet[/208.94.2.146:9300]][discovery/zen/join]; org.elasticsearch.ElasticSearchIllegalStateException: Node [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}] not master for join request from [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}]]
[2012-02-24 10:55:27,568][INFO ][discovery.zen ] [img700] failed to send join request to master [[img700][UO6rSvnqSl-eyoygRj-99g][inet[/208.94.2.146:9300]]{master=true}], reason [org.elasticsearch.transport.RemoteTransportException: [img700][inet[/208.94.2.146:9300]][discovery/zen/join]; org.elasticsearch.ElasticSearchIllegalStateException: Node [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}] not master for join request from [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}]]
[2012-02-24 10:55:30,778][INFO ][discovery.zen ] [img700] failed to send join request to master [[img700][UO6rSvnqSl-eyoygRj-99g][inet[/208.94.2.146:9300]]{master=true}], reason [org.elasticsearch.transport.RemoteTransportException: [img700][inet[/208.94.2.146:9300]][discovery/zen/join]; org.elasticsearch.ElasticSearchIllegalStateException: Node [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}] not master for join request from [[img700][rk3NK_UyTpusT4FnXt3_MQ][inet[/208.94.2.146:9300]]{master=true}]]

This has been going on for 10 minutes or so... Any ideas what might
be wrong?

-Jack


(jacque74) #2

I was able to resolve this by shutting down all nodes in the cluster
and then starting the masters first.

-Jack


(jacque74) #3

Speaking of which, I am also using the master as a data node, and when
the first master started it immediately began recovery, which made
discovery of the other data nodes very slow. I am also using the local
gateway, so I was wondering whether there is a way to require an
initial number (quorum) of data nodes to be up in the cluster before
recovery starts?

-Jack


(Shay Banon) #4

I am not really sure what happened in your case. Regarding the initial mail: you did a full cluster restart and then got into this state? Do you have special settings per node?

Regarding your last question, you can configure the gateway.recover_after_nodes setting, which delays the initial recovery process until that many nodes have joined the cluster. See https://github.com/elasticsearch/elasticsearch/blob/master/config/elasticsearch.yml#L253.
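
As a rough sketch of what that might look like in elasticsearch.yml: the node counts and the timeout below are made-up values for illustration, not taken from this cluster, so adjust them to your own topology. gateway.recover_after_time and gateway.expected_nodes are companion settings from the same section of the sample config file.

```yaml
# Illustrative values only (assumed, not from this thread's cluster).

# Do not start the initial recovery until at least this many nodes
# have joined the cluster.
gateway.recover_after_nodes: 3

# Once recover_after_nodes is met, wait this long before recovering,
# to give the remaining nodes a chance to join.
gateway.recover_after_time: 5m

# If this many nodes are already present, start recovery immediately
# without waiting for recover_after_time.
gateway.expected_nodes: 4
```

With settings along these lines, a master that is also a data node should hold off on local gateway recovery after a full restart until enough of the other nodes are up, instead of starting as soon as it is elected.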

