Elasticsearch nodes randomly removed from cluster


(Jason000zhang) #1

I have an Elasticsearch cluster with one master node, three hot nodes, and nine stale nodes.
The three hot nodes are used for bulk indexing, but they keep getting removed from the cluster and then rejoin on their own.
I changed the fault detection (fd) settings as follows:
discovery.zen.fd.ping_timeout: 600s
discovery.zen.fd.ping_retries: 6
discovery.zen.fd.ping_interval: 60s
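
As a rough sanity check on what these values imply, the sketch below estimates how long a node could go without answering pings before fault detection gives up on it. It assumes a node is only declared failed after ping_retries consecutive pings each time out after ping_timeout; that is a simplification of the real behaviour, and the helper name parse_seconds is made up for this example.

```python
def parse_seconds(value: str) -> int:
    """Parse a duration such as '600s' into seconds (only the 's' suffix is handled)."""
    number = value[:-1]
    if not value.endswith("s") or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(number)


# Values copied from the settings quoted above.
settings = {
    "discovery.zen.fd.ping_timeout": "600s",
    "discovery.zen.fd.ping_retries": "6",
    "discovery.zen.fd.ping_interval": "60s",
}

timeout = parse_seconds(settings["discovery.zen.fd.ping_timeout"])
retries = int(settings["discovery.zen.fd.ping_retries"])

# Assumption: removal happens only after `retries` consecutive timed-out pings.
worst_case_seconds = timeout * retries
print(f"about {worst_case_seconds} s ({worst_case_seconds // 60} min) of unanswered pings before removal")
```

If that estimate is roughly right, nodes that drop out much sooner than this are probably not being removed because of ping timeouts alone.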

But the hot nodes are still being removed. I found the following trace log on the master node:

[2017-01-25T13:43:02,408][TRACE][o.e.t.TaskManager ] [oa-master-01] register 12456 [netty] [internal:discovery/zen/fd/master_ping] []
[2017-01-25T13:43:02,408][TRACE][o.e.d.z.MasterFaultDetection] [oa-master-01] checking ping from {oa-hot-02}{lpWsrreLTAeDPaAqPghmRw}{ZjGgLGT0TpKtZI2r4aCGMg}{10.255.246.55}{10.255.246.55:9300}{group=hot} under a cluster state thread
[2017-01-25T13:43:02,408][TRACE][o.e.c.s.ClusterService ] [oa-master-01] will process [master ping (from: {oa-hot-02}{lpWsrreLTAeDPaAqPghmRw}{ZjGgLGT0TpKtZI2r4aCGMg}{10.255.246.55}{10.255.246.55:9300}{group=hot})]
[2017-01-25T13:43:02,408][DEBUG][o.e.c.s.ClusterService ] [oa-master-01] processing [master ping (from: {oa-hot-02}{lpWsrreLTAeDPaAqPghmRw}{ZjGgLGT0TpKtZI2r4aCGMg}{10.255.246.55}{10.255.246.55:9300}{group=hot})]: execute
[2017-01-25T13:43:02,423][TRACE][o.e.c.s.ClusterService ] [oa-master-01] failed to execute cluster state update in [0s], state:
[2017-01-25T13:43:02,425][DEBUG][o.e.c.s.ClusterService ] [oa-master-01] cluster state update task [master ping (from: {oa-hot-02}{lpWsrreLTAeDPaAqPghmRw}{ZjGgLGT0TpKtZI2r4aCGMg}{10.255.246.55}{10.255.246.55:9300}{group=hot})] failed
[2017-01-25T13:43:02,425][TRACE][o.e.t.TaskManager ] [oa-master-01] unregister task for id: 12456

But I cannot find the reason. Please help, thank you.
My Elasticsearch cluster version is 5.1.1.


(Mark Walkom) #2

Is it GC? What happens in the logs on these hot nodes?
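
One way to go through the hot node logs is to filter them with a small script. The sketch below parses lines in the same layout as the master log quoted above (timestamp, level, logger, node name, message) and keeps WARN/ERROR entries plus anything matching a few keywords. The keyword list, including "[gc]", is only a guess at what GC or connectivity trouble would look like in the log, and the names LogEntry, parse_entries and suspicious are made up for this example. The sample lines are taken from the master log in this thread.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Layout as seen in the quoted log: [timestamp][LEVEL][logger ] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s*\[(?P<node>[^\]]+)\]\s*"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    node: str
    message: str


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield a LogEntry for every line that matches the expected layout."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield LogEntry(**match.groupdict())


def suspicious(
    entries: Iterable[LogEntry],
    keywords: tuple = ("[gc]", "failed", "timed out", "disconnect"),  # assumed keywords
) -> Iterator[LogEntry]:
    """Keep WARN/ERROR entries and entries whose logger or message mentions a keyword."""
    for entry in entries:
        text = f"{entry.logger} {entry.message}".lower()
        if entry.level in ("WARN", "ERROR") or any(k in text for k in keywords):
            yield entry


sample = [
    "[2017-01-25T13:43:02,408][TRACE][o.e.t.TaskManager ] [oa-master-01] register 12456 [netty] [internal:discovery/zen/fd/master_ping] []",
    "[2017-01-25T13:43:02,423][TRACE][o.e.c.s.ClusterService ] [oa-master-01] failed to execute cluster state update in [0s], state:",
    "[2017-01-25T13:43:02,425][TRACE][o.e.t.TaskManager ] [oa-master-01] unregister task for id: 12456",
]

hits = list(suspicious(parse_entries(sample)))
for entry in hits:
    print(entry.timestamp, entry.level, entry.logger, entry.message)
```

To run it against a real log, replace sample with an open file handle, for example open(path) for the log file of one of the hot nodes, and compare the timestamps of the hits with the times the node left the cluster.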

