Hi there,
We are seeing a few nodes disconnect from the cluster and reconnect within a few seconds. The relevant log entries are below.
Because of this, shards become unassigned and have to be reassigned, which takes a while.
Could you please help us understand why these disconnections happen? Is the cause in Elasticsearch or in the network?
Also, we observe this mainly on 2 of our 4 data nodes.
[2017-02-01 05:58:20,196][INFO ][discovery.zen ] [ITTESPROD-DATA3] master_left [{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-02-01 05:58:20,196][WARN ][discovery.zen ] [ITTESPROD-DATA3] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{ITTESPROD-MSTR2}{WtSucYmHRUCUX_Ld7R7fBA}{10.158.36.202}{10.158.36.202:9300}{data=false, master=true},{ITTESPROD-CLIENT2}{0fDvB6DMSgCRDfCXxE0TBg}{10.158.36.209}{10.158.36.209:9300}{data=false, master=false},{ITTESPROD-DATA2}{vJzgp0a0Q-WXnAyFKdHcKw}{10.158.36.212}{10.158.36.212:9300}{master=false},{ITTESPROD-DATA1}{XNRK5gWBR2SnyIvD8Wnz6w}{10.158.36.211}{10.158.36.211:9300}{master=false},{ITTESPROD-DATA3}{7m7OdCyORaKSsGEXduh55g}{10.158.36.204}{10.158.36.204:9300}{master=false},{ITTESPROD-MSTR1}{fnPHHE1xREaNnz4kA6rrSA}{10.158.36.201}{10.158.36.201:9300}{data=false, master=true},{ITTESPROD-CLIENT1}{lQ5OG-qySBmnLpshqTxxfQ}{10.158.36.199}{10.158.36.199:9300}{data=false, master=false},{ITTESPROD-CLIENT0}{cGF5_yyrRZOh9D0IyqgD3Q}{10.158.36.220}{10.158.36.220:9300}{data=false, master=false},{ITTESPROD-DATA4}{3IaCVr0JROqQT40o81Oyjw}{10.158.36.208}{10.158.36.208:9300}{master=false},}
[2017-02-01 05:58:20,196][INFO ][cluster.service ] [ITTESPROD-DATA3] removed {{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true},}, reason: zen-disco-master_failed ({ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true})
[2017-02-01 05:58:24,823][INFO ][cluster.service ] [ITTESPROD-DATA3] detected_master {ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}, added {{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true},}, reason: zen-disco-receive(from master [{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}])
[2017-02-01 05:58:25,793][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [4404297]
.....
[2017-02-01 06:30:01,062][INFO ][discovery.zen ] [ITTESPROD-DATA3] master_left [{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2017-02-01 06:30:01,062][WARN ][discovery.zen ] [ITTESPROD-DATA3] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{ITTESPROD-DATA1}{XNRK5gWBR2SnyIvD8Wnz6w}{10.158.36.211}{10.158.36.211:9300}{master=false},{ITTESPROD-CLIENT1}{lQ5OG-qySBmnLpshqTxxfQ}{10.158.36.199}{10.158.36.199:9300}{data=false, master=false},{ITTESPROD-CLIENT2}{0fDvB6DMSgCRDfCXxE0TBg}{10.158.36.209}{10.158.36.209:9300}{data=false, master=false},{ITTESPROD-DATA4}{3IaCVr0JROqQT40o81Oyjw}{10.158.36.208}{10.158.36.208:9300}{master=false},{ITTESPROD-MSTR2}{WtSucYmHRUCUX_Ld7R7fBA}{10.158.36.202}{10.158.36.202:9300}{data=false, master=true},{ITTESPROD-MSTR1}{fnPHHE1xREaNnz4kA6rrSA}{10.158.36.201}{10.158.36.201:9300}{data=false, master=true},{ITTESPROD-DATA3}{7m7OdCyORaKSsGEXduh55g}{10.158.36.204}{10.158.36.204:9300}{master=false},{ITTESPROD-DATA2}{vJzgp0a0Q-WXnAyFKdHcKw}{10.158.36.212}{10.158.36.212:9300}{master=false},{ITTESPROD-CLIENT0}{cGF5_yyrRZOh9D0IyqgD3Q}{10.158.36.220}{10.158.36.220:9300}{data=false, master=false},}
[2017-02-01 06:30:01,062][INFO ][cluster.service ] [ITTESPROD-DATA3] removed {{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true},}, reason: zen-disco-master_failed ({ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true})
[2017-02-01 06:30:05,709][INFO ][cluster.service ] [ITTESPROD-DATA3] detected_master {ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}, added {{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true},}, reason: zen-disco-receive(from master [{ITTESPROD-MSTR0}{E5JCBHhrQnKBs99HSZOH8Q}{10.158.36.200}{10.158.36.200:9300}{data=false, master=true}])
[2017-02-01 06:30:06,824][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [4406570]
...
[2017-02-01 06:37:23,661][WARN ][monitor.jvm ] [ITTESPROD-DATA3] [gc][young][168550][2873] duration [1.4s], collections [1]/[2.3s], total [1.4s]/[28m], memory [25.6gb]->[18.9gb]/[27gb], all_pools {[young] [7.1gb]->[72.6mb]/[7.4gb]}{[survivor] [897.2mb]->[955.6mb]/[955.6mb]}{[old] [17.6gb]->[17.9gb]/[18.6gb]}
[2017-02-01 06:38:15,755][WARN ][monitor.jvm ] [ITTESPROD-DATA3] [gc][young][168578][2874] duration [6.6s], collections [1]/[24.6s], total [6.6s]/[28.1m], memory [26.1gb]->[3.1gb]/[27gb], all_pools {[young] [7.3gb]->[77.2mb]/[7.4gb]}{[survivor] [955.6mb]->[0b]/[955.6mb]}{[old] [17.9gb]->[3gb]/[18.6gb]}
[2017-02-01 06:38:15,755][WARN ][monitor.jvm ] [ITTESPROD-DATA3] [gc][old][168578][102] duration [17.2s], collections [1]/[24.6s], total [17.2s]/[2.1h], memory [26.1gb]->[3.1gb]/[27gb], all_pools {[young] [7.3gb]->[77.2mb]/[7.4gb]}{[survivor] [955.6mb]->[0b]/[955.6mb]}{[old] [17.9gb]->[3gb]/[18.6gb]}
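
In case it helps to line up the GC warnings with the disconnections, here is a minimal sketch that pulls the pause duration out of the monitor.jvm lines above. It assumes the exact log layout shown in this post. The function names (parse_gc_line, long_pauses) and the 5 second threshold are made up for illustration and are not part of Elasticsearch.

```python
import re
from datetime import datetime

# Matches the start of lines such as:
# [2017-02-01 06:38:15,755][WARN ][monitor.jvm ] [NODE] [gc][old][168578][102] duration [17.2s], ...
# Assumption: the layout is the one shown in the logs above.
GC_LINE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+)\]\[WARN \]\[monitor\.jvm\s*\] "
    r"\[(?P<node>[^\]]+)\] \[gc\]\[(?P<gen>\w+)\]\[\d+\]\[\d+\] "
    r"duration \[(?P<value>[\d.]+)(?P<unit>ms|s|m)\]"
)

# Conversion factors from the unit suffix to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_gc_line(line):
    """Return (timestamp, node, generation, seconds) or None if the line is not a GC warning."""
    match = GC_LINE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    seconds = float(match.group("value")) * UNIT_SECONDS[match.group("unit")]
    return timestamp, match.group("node"), match.group("gen"), seconds


def long_pauses(lines, threshold_seconds=5.0):
    """Keep only GC warnings whose reported duration is at least the (hypothetical) threshold."""
    parsed = (parse_gc_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry[3] >= threshold_seconds]


if __name__ == "__main__":
    sample = [
        "[2017-02-01 06:37:23,661][WARN ][monitor.jvm ] [ITTESPROD-DATA3] [gc][young][168550][2873] duration [1.4s], collections [1]/[2.3s]",
        "[2017-02-01 06:38:15,755][WARN ][monitor.jvm ] [ITTESPROD-DATA3] [gc][old][168578][102] duration [17.2s], collections [1]/[24.6s]",
    ]
    for timestamp, node, generation, seconds in long_pauses(sample):
        print(timestamp, node, generation, seconds)
```

Running it over a node's full log file and comparing the reported timestamps with the master_left entries might show whether the long pauses and the disconnections coincide.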