Elasticsearch version: 7.0.1
Master nodes: 3
Data nodes: 31
When one of my indices fails to initialize its primary shard (in my case the .monitoring-es-7-2019.08.12 index), the elected (leader) master node fails within a few minutes.
The primary shard most likely fails to initialize because of a failed hard disk on one of the data nodes, but that should not be able to bring down the master node. The master also tries to remove the problematic data node, but the node is added back about once a minute. This loop continues until the master fails.
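Rather than guessing at the disk, one way to see why the primary of .monitoring-es-7-2019.08.12 stays unassigned is to ask the cluster allocation explain API (_cluster/allocation/explain) about that specific shard. Below is a minimal standard-library Python sketch. It assumes a node reachable at http://localhost:9200 without TLS or authentication (a hypothetical address). The names explain_primary and summarize are illustrative helpers, not part of any Elasticsearch client. The response fields it reads (current_state, unassigned_info.reason, can_allocate, allocate_explanation) are what I would expect from a 7.x response, so check them against the real 7.0.1 output.

```python
import json
import urllib.request

# Hypothetical address of any node in the cluster (no TLS/auth assumed).
ES_URL = "http://localhost:9200"


def explain_primary(index: str, shard: int = 0, base_url: str = ES_URL) -> dict:
    """Ask the allocation explain API about the primary copy of index[shard]."""
    body = json.dumps({"index": index, "shard": shard, "primary": True}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(explanation: dict) -> str:
    """Condense an allocation explain response into a single line."""
    parts = [
        f"{explanation.get('index')}[{explanation.get('shard')}]",
        f"state={explanation.get('current_state')}",
    ]
    # unassigned_info is only present when the shard is not assigned.
    reason = explanation.get("unassigned_info", {}).get("reason")
    if reason:
        parts.append(f"reason={reason}")
    if "can_allocate" in explanation:
        parts.append(f"can_allocate={explanation['can_allocate']}")
    if "allocate_explanation" in explanation:
        parts.append(explanation["allocate_explanation"])
    return " | ".join(parts)
```

For example, print(summarize(explain_primary(".monitoring-es-7-2019.08.12"))) should print one line with the shard's state and, if it is unassigned, the reason Elasticsearch recorded for it.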
I understand that the master node pings the data nodes and removes one if it fails. But not vice versa, right?
Sample log:
{"log":"[2019-08-12T04:14:11,672][INFO ][o.e.c.s.ClusterApplierService] [dc17-esmaster-04] removed {{dc17-esdata-02}{gFQlBesvQxaIZy4Sfx6Xtg}{1Zue7DvTSTWoBSdoPH3c9Q}{dc17-esdata-02}{10.36.60.55:9302}{ml.machine_memory=405543784448, rack=1, ml.max_open_jobs=20, xpack.installed=true},}, term: 18685, version: 98825, reason: ApplyCommitRequest{term=18685, version=98825, sourceNode={dc17-esmaster-02}{nE5cqi4OQKui5VSWz1hW7g}{1DV1Umc6RPOYvU_lBozVuA}{dc17-esmaster-02}{10.36.60.56:9300}{ml.machine_memory=405543784448, rack=2, ml.max_open_jobs=20, xpack.installed=true}}\n","stream":"stdout","time":"2019-08-12T04:14:11.672496264Z"}
{"log":"[2019-08-12T04:14:59,996][WARN ][o.e.x.m.e.l.LocalExporter] [dc17-esmaster-04] unexpected error while indexing monitoring document\n","stream":"stdout","time":"2019-08-12T04:15:00.000309393Z"}
{"log":"org.elasticsearch.xpack.monitoring.exporter.ExportException: UnavailableShardsException[[.monitoring-es-7-2019.08.12][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-7-2019.08.12][0]] containing [index {[.monitoring-es-7-2019.08.12][_doc][faAIhGwBLZ0qFg9pa-o2], source[{\"cluster_uuid\":\"qPAXvLX3T3Gdj2hnq1xHhg\",\"timestamp\":\"2019-08-12T04:13:59.984Z\",\"interval_ms\":10000,\"type\":\"node_stats\",\"source_node\":{\"uuid\":\"LSS3ms9gSlmisxOgLmQZXw\",\"host\":\"dc17-esmaster-04\",\"transport_address\":\"10.36.60.58:9300\",\"ip\":\"10.36.60.58\",\"name\":\"dc17-esmaster-04\",\"timestamp\":\"2019-08-12T04:13:59.984Z\"},\"node_stats\":{\"node_id\":\"LSS3ms9gSlmisxOgLmQZXw\",\"node_master\":false,\"mlockall\":true,\"indices\":{\"docs\":{\"count\":0},\"store\":{\"size_in_bytes\":0},\"indexing\":{\"index_total\":0,\"index_time_in_millis\":0,\"throttle_time_in_millis\":0},\"search\":{\"query_total\":0,\"query_time_in_millis\":0},\"query_cache\":{\"memory_size_in_bytes\":0,\"hit_count\":0,\"miss_count\":0,\"evictions\":0},\"fielddata\":{\"memory_size_in_bytes\":0,\"evictions\":0},\"segments\":{\"count\":0,\"memory_in_bytes\":0,\"terms_memory_in_bytes\":0,\"stored_fields_memory_in_bytes\":0,\"term_vectors_memory_in_bytes\":0,\"norms_memory_in_bytes\":0,\"points_memory_in_bytes\":0,\"doc_values_memory_in_bytes\":0,\"index_writer_memory_in_bytes\":0,\"version_map_memory_in_bytes\":0,\"fixed_bit_set_memory_in_bytes\":0},\"request_cache\":{\"memory_size_in_bytes\":0,\"evictions\":0,\"hit_count\":0,\"miss_count\":0}},\"os\":{\"cpu\":{\"load_average\":{\"1m\":7.72,\"5m\":8.81,\"15m\":7.97}}},\"process\":{\"open_file_descriptors\":1502,\"max_file_descriptors\":65536,\"cpu\":{\"percent\":0}},\"jvm\":{\"mem\":{\"heap_used_in_bytes\":1823671544,\"heap_used_percent\":7,\"heap_max_in_bytes\":25525551104},\"gc\":{\"collectors\":{\"young\":{\"collection_count\":9,\"collection_time_in_millis\":753},\"old\":{\"collection_count\":1,\"collection_time_in_millis\":651}}}},\"thread_pool\":{\"generic\":{\"threads\":66,\"queue\":0,\"rejected\":0},\"get\":{\"threads\":0,\"queue\":0,\"rejected\":0},\"management\":{\"threads\":5,\"queue\":0,\"rejected\":0},\"search\":{\"threads\":0,\"queue\":0,\"rejected\":0},\"watcher\":{\"threads\":0,\"queue\":0,\"rejected\":0},\"write\":{\"threads\":0,\"queue\":0,\"rejected\":0}},\"fs\":{\"total\":{\"total_in_bytes\":53660876800,\"free_in_bytes\":24523497472,\"available_in_bytes\":24523497472},\"io_stats\":{\"total\":{\"operations\":37738,\"read_operations\":95,\"write_operations\":37643,\"read_kilobytes\":1208,\"write_kilobytes\":543691}}}}}]}]]]\n","stream":"stdout","time":"2019-08-12T04:15:00.000344459Z"}
{"log":"\u0009at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:125) ~[x-pack-monitoring-7.0.1.jar:7.0.1]\n","stream":"stdout","time":"2019-08-12T04:15:00.000387048Z"}
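These records are in Docker's json-file format (each line has log, stream and time keys). Exception messages and stack-trace frames arrive as separate records. To check the remove/re-add loop over a longer stretch of the master's log, a small Python sketch like the following could help. It unwraps the records, reattaches continuation lines to the event they belong to, and lists the node removals and additions reported by ClusterApplierService. It assumes the 7.x plain-text layout seen in the sample above. It also assumes that re-joins are logged by the same logger as "added {{node}...}", which I have not confirmed from these logs. All helper names are hypothetical.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Header of an Elasticsearch 7.x plain-text log line, as in the sample:
# [2019-08-12T04:14:59,996][WARN ][o.e.x.m.e.l.LocalExporter] [dc17-esmaster-04] message
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\] "
    r"\[(?P<node>[^\]]+)\] (?P<msg>.*)$",
    re.DOTALL,
)
# Start of a node-membership summary, e.g. "removed {{dc17-esdata-02}{...}..."
CHANGE = re.compile(r"^(?P<action>removed|added) \{\{(?P<name>[^}]+)\}")


@dataclass
class LogEvent:
    ts: str
    level: str
    logger: str
    node: str
    msg: str
    extra: List[str] = field(default_factory=list)  # exception / stack-trace lines


def parse_docker_lines(lines: Iterable[str]) -> List[LogEvent]:
    """Unwrap Docker json-file records and attach continuation lines to their event."""
    events: List[LogEvent] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        text = json.loads(raw)["log"].rstrip("\n")
        match = HEADER.match(text)
        if match:
            events.append(LogEvent(**match.groupdict()))
        elif events:
            # No header: an exception message or "\tat ..." frame of the previous event.
            events[-1].extra.append(text)
    return events


def membership_changes(events: Iterable[LogEvent]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, action, node name) for ClusterApplierService removals/additions.

    Only the first node named in each message is reported.
    """
    changes = []
    for event in events:
        if not event.logger.endswith("ClusterApplierService"):
            continue
        match = CHANGE.match(event.msg)
        if match:
            changes.append((event.ts, match.group("action"), match.group("name")))
    return changes
```

Running membership_changes(parse_docker_lines(f)) over the master's log file (f being an open file handle) should show whether dc17-esdata-02 alternates between removed and added about once a minute before the master fails.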