Can you explain your topology a little more?
Are all nodes in the same location?
@warkolm Sure! Here is the elasticsearch.yml from each machine:
ES-01
cluster.name: india_farmers
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
node.name: elastic-01
node.attr.zone: zone1
node.master: true
node.data: true
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.0.87.207
discovery.seed_hosts:
- 10.0.87.207:9300
- 10.0.87.208:9300
- 10.0.87.209:9300
- 10.0.87.210:9300
- 10.0.225.232:9300
cluster.initial_master_nodes:
- 10.0.87.207:9300
- 10.0.87.209:9300
- 10.0.225.232:9300
xpack.security.enabled: false
path.repo: ["/opt/elasticsearch-backup"]
ES-02
cluster.name: india_farmers
node.name: elastic-02
node.attr.zone: zone1
node.master: false
node.data: true
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.0.87.208
discovery.seed_hosts:
- 10.0.87.207:9300
- 10.0.87.208:9300
- 10.0.87.209:9300
- 10.0.87.210:9300
- 10.0.225.232:9300
cluster.initial_master_nodes:
- 10.0.87.207:9300
- 10.0.87.209:9300
- 10.0.225.232:9300
xpack.security.enabled: false
path.repo: ["/opt/elasticsearch-backup"]
ES-03
cluster.name: india_farmers
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
node.name: elastic-03
node.attr.zone: zone2
node.master: true
node.data: true
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.0.87.209
discovery.seed_hosts:
- 10.0.87.207:9300
- 10.0.87.208:9300
- 10.0.87.209:9300
- 10.0.87.210:9300
- 10.0.225.232:9300
cluster.initial_master_nodes:
- 10.0.87.207:9300
- 10.0.87.209:9300
- 10.0.225.232:9300
xpack.security.enabled: false
path.repo: ["/opt/elasticsearch-backup"]
ES-04
cluster.name: india_farmers
node.name: elastic-04
node.attr.zone: zone2
node.master: false
node.data: true
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.0.87.210
discovery.seed_hosts:
- 10.0.87.207:9300
- 10.0.87.208:9300
- 10.0.87.209:9300
- 10.0.87.210:9300
- 10.0.225.232:9300
cluster.initial_master_nodes:
- 10.0.87.207:9300
- 10.0.87.209:9300
- 10.0.225.232:9300
xpack.security.enabled: false
path.repo: ["/opt/elasticsearch-backup"]
ES-05
cluster.name: india_farmers
node.name: elastic-05
node.master: true
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
node.data: false
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.0.225.232
discovery.seed_hosts:
- 10.0.87.207:9300
- 10.0.87.208:9300
- 10.0.87.209:9300
- 10.0.87.210:9300
- 10.0.225.232:9300
cluster.initial_master_nodes:
- 10.0.87.207:9300
- 10.0.87.209:9300
- 10.0.225.232:9300
xpack.security.enabled: false
path.repo: ["/opt/elasticsearch-backup"]
And my physical layout
Datacenter1
Elastic-01
Elastic-02
Datacenter2
Elastic-03
Elastic-04
Datacenter3
Elastic-05
All of these datacenters are in the same city. Here are ping results between nodes in different datacenters.
Ping from Elastic-01 (Datacenter1) to Elastic-03 (Datacenter2):
PING 10.0.87.209 (10.0.87.209) 56(84) bytes of data.
64 bytes from 10.0.87.209: icmp_seq=1 ttl=64 time=0.235 ms
64 bytes from 10.0.87.209: icmp_seq=2 ttl=64 time=0.217 ms
64 bytes from 10.0.87.209: icmp_seq=3 ttl=64 time=0.255 ms
64 bytes from 10.0.87.209: icmp_seq=4 ttl=64 time=0.251 ms
64 bytes from 10.0.87.209: icmp_seq=5 ttl=64 time=0.262 ms
64 bytes from 10.0.87.209: icmp_seq=6 ttl=64 time=0.261 ms
64 bytes from 10.0.87.209: icmp_seq=7 ttl=64 time=0.218 ms
64 bytes from 10.0.87.209: icmp_seq=8 ttl=64 time=0.229 ms
64 bytes from 10.0.87.209: icmp_seq=9 ttl=64 time=0.264 ms
64 bytes from 10.0.87.209: icmp_seq=10 ttl=64 time=0.218 ms
Ping from Elastic-01 (Datacenter1) to Elastic-05 (Datacenter3):
ping -c 10 10.0.225.232
PING 10.0.225.232 (10.0.225.232) 56(84) bytes of data.
64 bytes from 10.0.225.232: icmp_seq=1 ttl=61 time=1.59 ms
64 bytes from 10.0.225.232: icmp_seq=2 ttl=61 time=1.31 ms
64 bytes from 10.0.225.232: icmp_seq=3 ttl=61 time=1.27 ms
64 bytes from 10.0.225.232: icmp_seq=4 ttl=61 time=1.45 ms
64 bytes from 10.0.225.232: icmp_seq=5 ttl=61 time=1.29 ms
64 bytes from 10.0.225.232: icmp_seq=6 ttl=61 time=1.24 ms
64 bytes from 10.0.225.232: icmp_seq=7 ttl=61 time=1.18 ms
64 bytes from 10.0.225.232: icmp_seq=8 ttl=61 time=1.54 ms
64 bytes from 10.0.225.232: icmp_seq=9 ttl=61 time=1.19 ms
64 bytes from 10.0.225.232: icmp_seq=10 ttl=61 time=1.13 ms
The elected master will be logging a message containing the string node-left when the node leaves. Can you share all copies of this message, so we can see the pattern over time?
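One way to collect those messages is a small filter over the master's log. The following is a minimal sketch, not an official tool: the function name `node_left_events` and the regular expressions are assumptions based on the log line format posted in this thread, and other Elasticsearch versions may format the line differently.

```python
import re

# Patterns are assumptions derived from the log lines in this thread.
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")            # leading [2021-02-03T01:50:34,174]
NODE = re.compile(r"node-left\[\{([^}]+)\}")        # first {...} after node-left[ is the node name
REASON = re.compile(r"\}\s+([^\]{}]+)\], term:")    # text between the node description and "], term:"


def node_left_events(lines):
    """Return (timestamp, node name, reason) for every node-left line."""
    events = []
    for line in lines:
        if "node-left[" not in line:
            continue
        ts = TIMESTAMP.search(line)
        node = NODE.search(line)
        reason = REASON.search(line)
        events.append((
            ts.group(1) if ts else None,
            node.group(1) if node else None,
            reason.group(1).strip() if reason else None,
        ))
    return events


# One node-left line from the master log in this thread, truncated after "version".
SAMPLE = (
    "[2021-02-03T01:50:34,174][INFO ][o.e.c.s.MasterService ] [elastic-03] "
    "node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}"
    "{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, "
    "ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 12996"
)

for event in node_left_events([SAMPLE]):
    print(event)
```

To run it against a real log, pass an open file object instead of the sample list, for example the master's log file under the `path.logs` directory shown in the configs above (the exact file name depends on the installation).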
You want to see logs from the current master node, right?
Yes.
@DavidTurner Here is the log from the master node:
[2021-02-03T01:45:31,310][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 12993, reason: Publication{term=3, version=12993}
[2021-02-03T01:45:33,538][INFO ][o.e.c.s.MasterService ] [elastic-03] node-join[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 3, version: 12994, reason: added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T01:45:33,563][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 12994, reason: Publication{term=3, version=12994}
[2021-02-03T01:50:34,174][INFO ][o.e.c.s.MasterService ] [elastic-03] node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 12996, reason: removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T01:50:34,184][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 12996, reason: Publication{term=3, version=12996}
[2021-02-03T01:50:36,626][INFO ][o.e.c.s.MasterService ] [elastic-03] node-join[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 3, version: 12997, reason: added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T01:50:36,648][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 12997, reason: Publication{term=3, version=12997}
[2021-02-03T01:54:06,640][INFO ][o.e.c.r.a.DiskThresholdMonitor] [elastic-03] skipping monitor as a check is already in progress
[2021-02-03T01:55:36,540][INFO ][o.e.c.s.MasterService ] [elastic-03] node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 12999, reason: removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T01:55:36,550][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 12999, reason: Publication{term=3, version=12999}
[2021-02-03T01:55:38,793][INFO ][o.e.c.s.MasterService ] [elastic-03] node-join[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 3, version: 13000, reason: added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T01:55:38,815][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13000, reason: Publication{term=3, version=13000}
[2021-02-03T02:00:38,161][INFO ][o.e.c.s.MasterService ] [elastic-03] node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 13002, reason: removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T02:00:38,174][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13002, reason: Publication{term=3, version=13002}
[2021-02-03T02:00:40,270][INFO ][o.e.c.s.MasterService ] [elastic-03] node-join[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 3, version: 13003, reason: added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T02:00:40,297][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13003, reason: Publication{term=3, version=13003}
[2021-02-03T02:05:40,147][INFO ][o.e.c.s.MasterService ] [elastic-03] node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 13005, reason: removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T02:05:40,158][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13005, reason: Publication{term=3, version=13005}
[2021-02-03T02:05:42,839][INFO ][o.e.c.s.MasterService ] [elastic-03] node-join[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 3, version: 13006, reason: added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T02:05:42,864][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] added {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13006, reason: Publication{term=3, version=13006}
[2021-02-03T02:10:42,103][INFO ][o.e.c.s.MasterService ] [elastic-03] node-left[{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true} disconnected], term: 3, version: 13008, reason: removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}
[2021-02-03T02:10:42,115][INFO ][o.e.c.s.ClusterApplierService] [elastic-03] removed {{elastic-05}{Uhj07HVbQQ2IEno38VzI9A}{GRcJlXQiTq6tqJMZ9E_REw}{10.0.225.232}{10.0.225.232:9300}{ilm}{ml.machine_memory=16656920576, ml.max_open_jobs=20, xpack.installed=true},}, term: 3, version: 13008, reason: Publication{term=3, version=13008}
"disconnected" means a TCP connection was dropped. Since it happens every 5 minutes, it is probably caused by something on the network between the nodes that has a 5-minute timeout. See these docs for information:
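The 5-minute pattern can be checked directly from the node-left timestamps in the master log above. This is a minimal sketch: the timestamps are copied from that log, while the helper name `intervals_seconds` is made up for illustration.

```python
from datetime import datetime

# node-left timestamps for elastic-05, copied from the master log in this thread.
NODE_LEFT_TIMES = [
    "2021-02-03T01:50:34,174",
    "2021-02-03T01:55:36,540",
    "2021-02-03T02:00:38,161",
    "2021-02-03T02:05:40,147",
    "2021-02-03T02:10:42,103",
]


def intervals_seconds(timestamps):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S,%f") for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(parsed, parsed[1:])]


for gap in intervals_seconds(NODE_LEFT_TIMES):
    print(f"{gap:.1f} s")
```

Every gap comes out at roughly 302 seconds. In the log, each node-join follows the preceding node-left by about 2 seconds, so the connection is being dropped almost exactly 300 seconds after elastic-05 rejoins, which is what a fixed 5-minute timeout somewhere on the path to 10.0.225.232 would look like.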