Hi,
Elasticsearch version: 5.1.1
My cluster is always yellow, with a lot of unassigned shards, and the logs show many "master left" messages.
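For reference, I list the unassigned shards and the reason they are unassigned with something like this (run against one of the nodes; localhost here is just whatever node I am on):

# show only the unassigned shards and why they are unassigned
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

Here is a log excerpt from node-15 around the time of the failures: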
[2018-02-06T05:16:04,873][WARN ][o.e.i.c.IndicesClusterStateService] [node-15] [[xxx_log_201802][150]] marking and sending shard failed due to [shard failure, reason [primary shard [[xxx_log_201802][150], node[llX2oMHOT6aMFTpfjvXilg], [P], s[STARTED], a[id=GnaqByImR5y06I6bn5G0MQ]] was demoted while failing replica shard]]
org.elasticsearch.cluster.action.shard.ShardStateAction$NoLongerPrimaryShardException: primary term [10] did not match current primary term [11]
at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardFailedClusterStateTaskExecutor.execute(ShardStateAction.java:280) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.cluster.service.ClusterService.runTasksForExecutor(ClusterService.java:581) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:920) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.1.1.jar:5.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
[2018-02-06T05:16:05,556][INFO ][o.e.d.z.ZenDiscovery ] [node-15] master_left [{node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-02-06T05:16:05,557][WARN ][o.e.d.z.ZenDiscovery ] [node-15] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node-9}{UsskhSP7StKKpuD_GcAcGw}{MYFs7ktOQ5KZsuaFpjW_0g}{10.4.71.67}{10.4.71.67:9300}
{node-10}{MaoCFIexSneolK-a9DTtgQ}{DdsgyhqbQ7ipE4rlV2LHyQ}{10.4.71.68}{10.4.71.68:9300}
{node-4}{NPO_-oC7RGikhqcmwc6JxA}{wGKtE55VT8uhp1mI5QaOlA}{10.4.71.33}{10.4.71.33:9300}
{node-5}{qUDCyodwSq63x3Z3s5LlUw}{StzL-uBsTIug6gWmhzB7qA}{10.4.71.34}{10.4.71.34:9300}
{node-8}{lTyb_Y1CQRapvztkK-Uz1g}{wLHuYpgDQMaHisDWWslO_Q}{10.4.71.66}{10.4.71.66:9300}
{node-3}{KrYOYlfmRC6yS0uvNIBSHA}{X-glH3VmT-2c9_XMQsTtpw}{10.4.71.32}{10.4.71.32:9300}
{node-7}{ABmukxbzSQGgLJ29DBUnjA}{ppo9n1vgSw-LYq-0BO_Dmw}{10.4.71.36}{10.4.71.36:9300}
{node-11}{oSYHx9hgTZqNoI0gIqO6Rw}{BJa9EDLpSje4865hn0bXlw}{10.4.71.69}{10.4.71.69:9300}
{node-6}{T7J4E5H4RqmM2DpJl4J3bA}{22uVp89gTvewPzhR6buhAA}{10.4.71.35}{10.4.71.35:9300}
{node-14}{dPEClHXZR2iFpaiLw-I-nQ}{jkBVFbbiRs2d48jjoDAdAQ}{10.4.71.72}{10.4.71.72:9300}
{node-13}{w82I1FcSQm6bqZAypk1P-g}{JZEc57UTQ6q90SsgV9IJRg}{10.4.71.71}{10.4.71.71:9300}
{node-15}{llX2oMHOT6aMFTpfjvXilg}{VXe3JphuT6aCbbAVRoB9CQ}{10.4.71.73}{10.4.71.73:9300}, local
{node-12}{VnLoFQVhTragjMlTo_3TzA}{aAhZplHLSISNczETVnimPw}{10.4.71.70}{10.4.71.70:9300}
{node-2}{qdhLZ9OPREiXE7L-G1GWlg}{QtCiGLrHS5eVuSYaIyhiGw}{10.4.71.31}{10.4.71.31:9300}
[2018-02-06T05:16:05,557][INFO ][o.e.c.s.ClusterService ] [node-15] removed {{node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300},}, reason: master_failed ({node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300})
[2018-02-06T05:16:08,830][INFO ][o.e.c.s.ClusterService ] [node-15] detected_master {node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300}, added {{node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300},}, reason: zen-disco-receive(from master [master {node-1}{zzJVtCQDQFaNm_jNx2YrjA}{3vrBZme0R_aCZwqZHuoJDQ}{10.4.71.30}{10.4.71.30:9300} committed version [77351]])
[2018-02-06T05:16:58,138][INFO ][o.e.m.j.JvmGcMonitorService] [node-15] [gc][1861024] overhead, spent [433ms] collecting in the last [1s]
My cluster health:
{
  "cluster_name": "xxxxxxx",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 15,
  "number_of_data_nodes": 15,
  "active_primary_shards": 1465,
  "active_shards": 2524,
  "relocating_shards": 0,
  "initializing_shards": 5,
  "unassigned_shards": 164,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 93.72447085035277
}
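If it helps, I can also post the output of the cluster allocation explain API for one of the unassigned shards. I am not 100% sure of the empty-body behaviour on 5.1 (it should pick the first unassigned shard it finds), but this is how I would query it:

# explain why an unassigned shard is not being allocated
curl -s -XGET 'localhost:9200/_cluster/allocation/explain?pretty'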
My config (elasticsearch.yml on node-15):
cluster.name: xxxxxx
node.name: node-15
path.data: /data1/elasticsearch/data
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["10.4.71.30", "10.4.71.31", "10.4.71.32", "10.4.71.33", "10.4.71.34", "10.4.71.35", "10.4.71.36", "10.4.71.66", "10.4.71.67", "10.4.71.68", "10.4.71.69", "10.4.71.70","10.4.71.71","10.4.71.72","10.4.71.73"]
reindex.remote.whitelist: ["10.5.24.139:9200"]
node.master: true
node.data: true
bootstrap.memory_lock: true
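The ping failure in the log ("tried [3] times, each with maximum [30s] timeout") matches the default zen fault-detection settings, which I have not overridden in this file; as far as I know the defaults are:

# zen fault-detection defaults (not set in my elasticsearch.yml, listed only for reference)
discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3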