Hi Experts,
I encountered master_ping issues with our 3-node ES cluster. The cluster runs ES 5.3.2 on Ubuntu servers on AWS EC2:
es-1 log:
[2018-02-05T07:47:59,899][WARN ][o.e.t.TransportService ] [xxx.xxx.xxx.71] Received response for a request that has timed out, sent [87457ms] ago, timed out [57457ms] ago, action [internal:discovery/zen/fd/master_ping], node [{xxx.xxx.xxx.165}{LzrWTDLYRr6f2Hh-voT1Dg}{zfCxxx4qRF6AR86co1tciQ}{xxx.xxx.xxx.165}{xxx.xxx.xxx.165:9300}], id [1738506]
[2018-02-05T07:48:02,448][INFO ][o.e.d.z.ZenDiscovery ] [xxx.xxx.xxx.71] master_left [{xxx.xxx.xxx.165}{LzrWTDLYRr6f2Hh-voT1Dg}{zfCxxx4qRF6AR86co1tciQ}{xxx.xxx.xxx.165}{xxx.xxx.xxx.165:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
es-2 log:
[2018-02-05T07:51:37,119][INFO ][o.e.d.z.ZenDiscovery ] [xxx.xxx.xxx.148] master_left [{xxx.xxx.xxx.165}{LzrWTDLYRr6f2Hh-voT1Dg}{zfCxxx4qRF6AR86co1tciQ}{xxx.xxx.xxx.165}{xxx.xxx.xxx.165:9300}], reason [no longer master]
org.elasticsearch.transport.RemoteTransportException: [xxx.xxx.xxx.165][xxx.xxx.xxx.165:9300][internal:discovery/zen/fd/master_ping]
[2018-02-05T07:55:16,633][INFO ][o.e.d.z.ZenDiscovery ] [xxx.xxx.xxx.148] master_left [{xxx.xxx.xxx.71}{P70F55GvTIGIXOdj-ZOFOg}{FhmDCEK6RkiQf9lI3zFO-w}{xxx.xxx.xxx.71}{xxx.xxx.xxx.71:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-02-05T07:55:23,262][WARN ][o.e.t.TransportService ] [xxx.xxx.xxx.148] Received response for a request that has timed out, sent [96630ms] ago, timed out [66630ms] ago, action [internal:discovery/zen/fd/master_ping], node [{xxx.xxx.xxx.71}{P70F55GvTIGIXOdj-ZOFOg}{FhmDCEK6RkiQf9lI3zFO-w}{xxx.xxx.xxx.71}{xxx.xxx.xxx.71:9300}], id [4946066]
es-3 log:
[2018-02-05T08:14:45,600][INFO ][o.e.d.z.ZenDiscovery ] [xxx.xxx.xxx.165] master_left [{xxx.xxx.xxx.71}{P70F55GvTIGIXOdj-ZOFOg}{FhmDCEK6RkiQf9lI3zFO-w}{xxx.xxx.xxx.71}{xxx.xxx.xxx.71:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-02-05T08:15:21,007][WARN ][o.e.t.TransportService ] [xxx.xxx.xxx.165] Received response for a request that has timed out, sent [125408ms] ago, timed out [95408ms] ago, action [internal:discovery/zen/fd/master_ping], node [{xxx.xxx.xxx.71}{P70F55GvTIGIXOdj-ZOFOg}{FhmDCEK6RkiQf9lI3zFO-w}{xxx.xxx.xxx.71}{xxx.xxx.xxx.71:9300}], id [4946657]
After that, two of the five shards went into UNASSIGNED status.
Does anyone know the reason?
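In case it helps with diagnosis, here is a rough Python sketch for pulling the timings out of the TransportService WARN lines above. The names `PATTERN` and `parse_late_response` are made up for illustration and are not part of Elasticsearch. The sketch assumes the line format shown in the logs, and it assumes that "sent" minus "timed out" gives the ping timeout that was in effect.

```python
import re

# Hypothetical helper: matches the timing fields of the
# "Received response for a request that has timed out" WARN lines.
PATTERN = re.compile(
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<timed_out>\d+)ms\] ago, "
    r"action \[(?P<action>[^\]]+)\]"
)


def parse_late_response(line):
    """Return timing info for a late-response WARN line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sent = int(m.group("sent"))
    timed_out = int(m.group("timed_out"))
    return {
        "action": m.group("action"),
        # Time between sending the ping and finally receiving the response.
        "round_trip_ms": sent,
        # Assumption: the gap between the two values is the configured timeout.
        "timeout_ms": sent - timed_out,
    }


# Message portion of the first WARN line from the es-1 log above.
sample = (
    "Received response for a request that has timed out, sent [87457ms] ago, "
    "timed out [57457ms] ago, action [internal:discovery/zen/fd/master_ping]"
)
info = parse_late_response(sample)
print(info)
```

For the es-1 line this gives a round trip of 87457 ms against a 30000 ms timeout, which matches the "maximum [30s] timeout" in the master_left message. So the master did respond, just far too late.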