I have 2 EC2 instances in an AWS account, and it appears that the master
keeps forgetting about the slave node.
Here are the slave node logs (I removed the IPs and timestamps for
simplicity; the master is "Cordelia Frost" and the slave is "Chronos"):
[discovery.zen.fd] [Chronos] [master] pinging a master [Cordelia Frost] but
we do not exists on it, act as if its master failure
[discovery.zen.fd] [Chronos] [master] stopping fault detection against
master [Cordelia Frost], reason [master failure, do not exists on master,
act as master failure]
[discovery.ec2] [Chronos] master_left [Cordelia Frost], reason [do not
exists on master, act as master failure]
[discovery.ec2] [Chronos] master left (reason = do not exists on master,
act as master failure), current nodes: {[Chronos]}
[cluster.service] [Chronos] removed {[Cordelia Frost]}, reason:
zen-disco-master_failed ([Cordelia Frost])
[discovery.ec2] [Chronos] using dynamic discovery nodes
[discovery.ec2] [Chronos] using dynamic discovery nodes
[discovery.ec2] [Chronos] using dynamic discovery nodes
[discovery.ec2] [Chronos] filtered ping responses: (filter_client[true],
filter_data[false])
--> ping_response{node [Cordelia Frost], id[353], master [Cordelia
Frost], hasJoinedOnce [true], cluster_name[cluster]}
[discovery.zen.publish] [Chronos] received cluster state version 232374
[discovery.zen.fd] [Chronos] [master] restarting fault detection against
master [Cordelia Frost], reason [new cluster state received and we are
monitoring the wrong master [null]]
[discovery.ec2] [Chronos] got first state from fresh master
[cluster.service] [Chronos] detected_master [Cordelia Frost], added
{[Cordelia Frost]}, reason: zen-disco-receive(from master [Cordelia Frost])
"Chronos" then receives the cluster state and everything goes back to
normal.
This happens at fairly regular intervals (usually once per hour, although
sometimes the gap is longer). Any idea what could be causing this?
I have a ping timeout of 15s on discovery.ec2, so I think that ping latency
should not be the problem. I also do hourly snapshots with curator, in case
that's relevant.
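One rough way to check whether the drops line up with the hourly snapshots is to pull the master_left timestamps out of the slave log and look at the gaps between them. The following is only a minimal sketch: it assumes each log line starts with a timestamp like [2015-03-10 12:00:05,101] (the timestamps were stripped from the excerpt above, so that format is an assumption), and the function names and sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Assumed line prefix: [2015-03-10 12:00:05,101][INFO ][discovery.ec2] ...
# The timestamps were removed from the excerpt above, so this format is a guess.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def master_left_times(lines):
    """Return the timestamp of every master_left event in the given log lines."""
    times = []
    for line in lines:
        if "master_left" not in line:
            continue
        match = TS_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the gap, in minutes, between consecutive events."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Made-up sample lines; replace with the lines of the real slave log.
    sample = [
        "[2015-03-10 12:00:05,101][INFO ][discovery.ec2] [Chronos] master_left [Cordelia Frost], reason [do not exists on master, act as master failure]",
        "[2015-03-10 12:00:09,500][INFO ][cluster.service] [Chronos] detected_master [Cordelia Frost]",
        "[2015-03-10 13:00:07,230][INFO ][discovery.ec2] [Chronos] master_left [Cordelia Frost], reason [do not exists on master, act as master failure]",
    ]
    print(gaps_in_minutes(master_left_times(sample)))
```

If the gaps cluster around 60 minutes and the event times sit close to the curator snapshot schedule, that would point at the snapshots rather than at ping latency.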
Finally, I also have another Elasticsearch cluster with the same
configuration on a different AWS account (used for testing purposes), and
that problem has never occurred there. Could this be related to the AWS region?