Elastic cluster issue!

We are seeing CPU spikes on our Elasticsearch cluster, and in the logs we see this error continuously. We also get a circuit breaker exception in Kibana when the CPU spikes. What could be wrong with our cluster?

[2022-09-22T13:40:32,300][WARN ][r.suppressed ] [] path: /_template/, params: {name=}
org.elasticsearch.transport.RemoteTransportException: [][****:9300][indices:admin/template/put]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index-template [****], cause [api]) within 30s
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:132) ~[elasticsearch-7.14.1.jar:7.14.1]
at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:131) ~[elasticsearch-7.14.1.jar:7.14.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) ~[elasticsearch-7.14.1.jar:7.14.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
at java.lang.Thread.run(Thread.java:831) [?:?]

What is the output from the _cluster/stats?pretty&human API?
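
For example, with curl against any node (host is a placeholder; add credentials if security is enabled):

```
curl -s 'http://localhost:9200/_cluster/stats?pretty&human'
```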

It seems it takes a long time to update and propagate changes to the cluster state. I have previously seen this happen due to the following (one or more may apply, but there could be other reasons as well; a couple of commands to check some of these are sketched after the list):

  • The cluster is overloaded.
  • The cluster state is very large, e.g. due to large mappings and/or very large number of indices and shards.
  • The storage used is very slow.
  • Master-eligible nodes are deployed on hardware with CPU credits that can run out and throttle processing.
  • Long or frequent GC that delays operations.
  • Frequent cluster state updates, e.g. due to constantly adding new fields with dynamic mappings.
  • Too many configured master eligible nodes.
  • Poor or unreliable network.
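
A couple of quick checks for some of the items above (assuming the cluster is reachable on localhost:9200; adjust the host and add credentials if security is enabled):

```
# Tasks queued on the elected master; a persistent backlog points at slow cluster state updates
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'

# Rough serialized size of the cluster state; large mappings and many templates inflate this
curl -s 'http://localhost:9200/_cluster/state' | wc -c
```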
> {
>   "_nodes" : {
>     "total" : 3,
>     "successful" : 3,
>     "failed" : 0
>   },
>   "cluster_name" : "prod",
>   "cluster_uuid" : "tBg5ZA3rTV66DcM_GzR4zA",
>   "timestamp" : 1664177511760,
>   "status" : "green",
>   "indices" : {
>     "count" : 711,
>     "shards" : {
>       "total" : 1422,
>       "primaries" : 711,
>       "replication" : 1.0,
>       "index" : {
>         "shards" : {
>           "min" : 2,
>           "max" : 2,
>           "avg" : 2.0
>         },
>         "primaries" : {
>           "min" : 1,
>           "max" : 1,
>           "avg" : 1.0
>         },
>         "replication" : {
>           "min" : 1.0,
>           "max" : 1.0,
>           "avg" : 1.0
>         }
>       }
>     },
>     "docs" : {
>       "count" : 1158691482,
>       "deleted" : 533493
>     },
>     "store" : {
>       "size" : "696.6gb",
>       "size_in_bytes" : 747972142321,
>       "total_data_set_size" : "696.6gb",
>       "total_data_set_size_in_bytes" : 747972142321,
>       "reserved" : "0b",
>       "reserved_in_bytes" : 0
>     },
>     "fielddata" : {
>       "memory_size" : "14.3mb",
>       "memory_size_in_bytes" : 15021736,
>       "evictions" : 0
>     },
>     "query_cache" : {
>       "memory_size" : "175.6mb",
>       "memory_size_in_bytes" : 184182259,
>       "total_count" : 82433087,
>       "hit_count" : 1116561,
>       "miss_count" : 81316526,
>       "cache_size" : 15016,
>       "cache_count" : 117462,
>       "evictions" : 102446
>     },
>     "completion" : {
>       "size" : "0b",
>       "size_in_bytes" : 0
>     },
>     "segments" : {
>       "count" : 17805,
>       "memory" : "204.4mb",
>       "memory_in_bytes" : 214386436,
>       "terms_memory" : "166mb",
>       "terms_memory_in_bytes" : 174139456,
>       "stored_fields_memory" : "9mb",
>       "stored_fields_memory_in_bytes" : 9535064,
>       "term_vectors_memory" : "0b",
>       "term_vectors_memory_in_bytes" : 0,
>       "norms_memory" : "2mb",
>       "norms_memory_in_bytes" : 2143552,
>       "points_memory" : "0b",
>       "points_memory_in_bytes" : 0,
>       "doc_values_memory" : "27.2mb",
>       "doc_values_memory_in_bytes" : 28568364,
>       "index_writer_memory" : "393.2mb",
>       "index_writer_memory_in_bytes" : 412364384,
>       "version_map_memory" : "326b",
>       "version_map_memory_in_bytes" : 326,
>       "fixed_bit_set" : "142.5mb",
>       "fixed_bit_set_memory_in_bytes" : 149481176,
>       "max_unsafe_auto_id_timestamp" : 1664174858091,
>       "file_sizes" : { }
>     },
>     "mappings" : {
>       "field_types" : [
>         {
>           "name" : "alias",
>           "count" : 20598,
>           "index_count" : 613,
>           "script_count" : 0
>         },
>         {
>           "name" : "binary",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "boolean",
>           "count" : 79832,
>           "index_count" : 619,
>           "script_count" : 0
>         },
>         {
>           "name" : "byte",
>           "count" : 613,
>           "index_count" : 613,
>           "script_count" : 0
>         },
>         {
>           "name" : "constant_keyword",
>           "count" : 1838,
>           "index_count" : 613,
>           "script_count" : 0
>         },
>         {
>           "name" : "date",
>           "count" : 100438,
>           "index_count" : 670,
>           "script_count" : 0
>         },
>         {
>           "name" : "date_nanos",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "date_range",
>           "count" : 2,
>           "index_count" : 2,
>           "script_count" : 0
>         },
>         {
>           "name" : "double",
>           "count" : 21129,
>           "index_count" : 573,
>           "script_count" : 0
>         },
>         {
>           "name" : "double_range",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "flattened",
>           "count" : 17130,
>           "index_count" : 571,
>           "script_count" : 0
>         },
>         {
>           "name" : "float",
>           "count" : 20684,
>           "index_count" : 617,
>           "script_count" : 0
>         },
>         {
>           "name" : "float_range",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "geo_point",
>           "count" : 5428,
>           "index_count" : 614,
>           "script_count" : 0
>         },
>         {
>           "name" : "geo_shape",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "half_float",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "histogram",
>           "count" : 41,
>           "index_count" : 41,
>           "script_count" : 0
>         },
>         {
>           "name" : "integer",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "integer_range",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "ip",
>           "count" : 76521,
>           "index_count" : 615,
>           "script_count" : 0
>         },
>         {
>           "name" : "ip_range",
>           "count" : 573,
>           "index_count" : 573,
>           "script_count" : 0
>         },
>         {
>           "name" : "keyword",
>           "count" : 2457129,
>           "index_count" : 670,
>           "script_count" : 0
>         },
>         {
>           "name" : "long",
>           "count" : 597097,
>           "index_count" : 668,
>           "script_count" : 0
>         },
>         {
>           "name" : "long_range",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "nested",
>           "count" : 1717,
>           "index_count" : 575,
>           "script_count" : 0
>         },
>         {
>           "name" : "object",
>           "count" : 499622,
>           "index_count" : 669,
>           "script_count" : 0
>         },
>         {
>           "name" : "scaled_float",
>           "count" : 971,
>           "index_count" : 612,
>           "script_count" : 0
>         },
>         {
>           "name" : "shape",
>           "count" : 1,
>           "index_count" : 1,
>           "script_count" : 0
>         },
>         {
>           "name" : "short",
>           "count" : 58814,
>           "index_count" : 572,
>           "script_count" : 0
>         },
>         {
>           "name" : "text",
>           "count" : 64528,
>           "index_count" : 669,
>           "script_count" : 0
>         },
>         {
>           "name" : "wildcard",
>           "count" : 571,
>           "index_count" : 571,
>           "script_count" : 0
>         }
>       ],
>       "runtime_field_types" : [ ]
>     },
>     "analysis" : {
>       "char_filter_types" : [ ],
>       "tokenizer_types" : [ ],
>       "filter_types" : [ ],
>       "analyzer_types" : [ ],
>       "built_in_char_filters" : [ ],
>       "built_in_tokenizers" : [ ],
>       "built_in_filters" : [ ],
>       "built_in_analyzers" : [ ]
>     },
>     "versions" : [
>       {
>         "version" : "7.14.1",
>         "index_count" : 711,
>         "primary_shard_count" : 711,
>         "total_primary_size" : "348.2gb",
>         "total_primary_bytes" : 373980043773
>       }
>     ]
>   },
>   "nodes" : {
>     "count" : {
>       "total" : 3,
>       "coordinating_only" : 0,
>       "data" : 3,
>       "data_cold" : 3,
>       "data_content" : 3,
>       "data_frozen" : 3,
>       "data_hot" : 3,
>       "data_warm" : 3,
>       "ingest" : 3,
>       "master" : 3,
>       "ml" : 3,
>       "remote_cluster_client" : 3,
>       "transform" : 3,
>       "voting_only" : 0
>     },
>     "versions" : [
>       "7.14.1"
>     ],
>     "os" : {
>       "available_processors" : 12,
>       "allocated_processors" : 12,
>       "names" : [
>         {
>           "name" : "Linux",
>           "count" : 3
>         }
>       ],
>       "pretty_names" : [
>         {
>           "pretty_name" : "CentOS Linux 7 (Core)",
>           "count" : 3
>         }
>       ],
>       "architectures" : [
>         {
>           "arch" : "amd64",
>           "count" : 3
>         }
>       ],
>       "mem" : {
>         "total" : "45.6gb",
>         "total_in_bytes" : 49038901248,
>         "free" : "1.7gb",
>         "free_in_bytes" : 1862979584,
>         "used" : "43.9gb",
>         "used_in_bytes" : 47175921664,
>         "free_percent" : 4,
>         "used_percent" : 96
>       }
>     },
>     "process" : {
>       "cpu" : {
>         "percent" : 141
>       },
>       "open_file_descriptors" : {
>         "min" : 6032,
>         "max" : 6478,
>         "avg" : 6262
>       }
>     },
>     "jvm" : {
>       "max_uptime" : "5.7d",
>       "max_uptime_in_millis" : 500197742,
>       "versions" : [
>         {
>           "version" : "16.0.2",
>           "vm_name" : "OpenJDK 64-Bit Server VM",
>           "vm_version" : "16.0.2+7",
>           "vm_vendor" : "Eclipse Foundation",
>           "bundled_jdk" : true,
>           "using_bundled_jdk" : true,
>           "count" : 3
>         }
>       ],
>       "mem" : {
>         "heap_used" : "17.8gb",
>         "heap_used_in_bytes" : 19214277096,
>         "heap_max" : "24gb",
>         "heap_max_in_bytes" : 25769803776
>       },
>       "threads" : 540
>     },
>     "fs" : {
>       "total" : "1.4tb",
>       "total_in_bytes" : 1610088448000,
>       "free" : "711.5gb",
>       "free_in_bytes" : 763994914816,
>       "available" : "711.5gb",
>       "available_in_bytes" : 763994914816
>     },
>     "plugins" : [ ],
>     "network_types" : {
>       "transport_types" : {
>         "security4" : 3
>       },
>       "http_types" : {
>         "security4" : 3
>       }
>     },
>     "discovery_types" : {
>       "zen" : 3
>     },
>     "packaging_types" : [
>       {
>         "flavor" : "default",
>         "type" : "rpm",
>         "count" : 3
>       }
>     ],
>     "ingest" : {
>       "number_of_pipelines" : 28,
>       "processor_stats" : {
>         "append" : {
>           "count" : 91786,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "197ms",
>           "time_in_millis" : 197
>         },
>         "conditional" : {
>           "count" : 528951659,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "2.1h",
>           "time_in_millis" : 7675194
>         },
>         "convert" : {
>           "count" : 0,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "0s",
>           "time_in_millis" : 0
>         },
>         "date" : {
>           "count" : 45893,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "4.3s",
>           "time_in_millis" : 4384
>         },
>         "geoip" : {
>           "count" : 429425983,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "1.4h",
>           "time_in_millis" : 5045315
>         },
>         "grok" : {
>           "count" : 45893,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "974ms",
>           "time_in_millis" : 974
>         },
>         "gsub" : {
>           "count" : 0,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "0s",
>           "time_in_millis" : 0
>         },
>         "pipeline" : {
>           "count" : 1717703932,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "6.6h",
>           "time_in_millis" : 23817209
>         },
>         "remove" : {
>           "count" : 45893,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "173ms",
>           "time_in_millis" : 173
>         },
>         "rename" : {
>           "count" : 45893,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "276ms",
>           "time_in_millis" : 276
>         },
>         "script" : {
>           "count" : 91786,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "1.7s",
>           "time_in_millis" : 1759
>         },
>         "set" : {
>           "count" : 91786,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "1.4s",
>           "time_in_millis" : 1478
>         },
>         "set_security_user" : {
>           "count" : 0,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "0s",
>           "time_in_millis" : 0
>         },
>         "user_agent" : {
>           "count" : 429425983,
>           "failed" : 0,
>           "current" : 0,
>           "time" : "2h",
>           "time_in_millis" : 7522535
>         }
>       }
>     }
>   }
> }

Please format your code/logs/config using the </> button, or markdown-style backticks. It makes things easier to read, which helps us help you 🙂

I have attached the cluster stats API output. We host the cluster on m5.xlarge instances (3 instances) and have made all 3 nodes master-eligible to achieve HA. We sometimes see CPU spikes; checking the logs for those intervals shows the nodes going into GC at that time, but we also see index templates getting created all the time.

Made the edit. Sorry for the inconvenience.


Are you using dynamic mappings? It looks like you have a lot of different fields given the number of indices you have. If mappings are growing over time, each change will require a change to the cluster state and this could be what is causing the problem. It could also be that this is made worse by the cluster being under load or having slow storage.

I would recommend that you have a look at your mappings and verify that you do not have field names that are generated dynamically, e.g. by including dynamic data like dates or IP addresses.
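
One rough way to gauge how many fields an index has ended up with (index name and host are placeholders; jq is optional):

```
# Count the distinct fields the index exposes via the field caps API
curl -s 'http://localhost:9200/my-index-000001/_field_caps?fields=*' | jq '.fields | length'

# Dump the full mapping and look for field names that embed generated data (dates, IPs, IDs)
curl -s 'http://localhost:9200/my-index-000001/_mapping?pretty' | less
```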

As far as I remember we don't use dynamic fields. We use APM, whose indices are huge, and we have ILM in place for those indices. Could that be a problem for the cluster performance? Storage-wise it won't be an issue, since we have 500GB for each node and we don't store more than 15 days of data.

Could you help me figure out how I can identify whether we are using dynamic fields or templates for an index?

If you are seeing frequent or long GC it is possible you need to increase the size of the cluster and the heap available to the nodes.
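
A quick way to eyeball per-node heap and CPU pressure is the cat nodes API (host is a placeholder):

```
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,master,heap.percent,heap.max,ram.percent,cpu'
```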

What do you mean by this? Can you share some logs?

Look at the mappings for the indices and see what fields you have in there. You could also take a copy of the mappings at some point, get another copy some time later, and see what (if anything) has changed.
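
A minimal way to do that comparison (index name, host, and file names are placeholders):

```
# Snapshot the mapping now...
curl -s 'http://localhost:9200/my-index-000001/_mapping?pretty' > mapping_before.json

# ...then again a few hours later, and compare the two copies
curl -s 'http://localhost:9200/my-index-000001/_mapping?pretty' > mapping_after.json
diff mapping_before.json mapping_after.json
```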

[2022-09-26T06:10:40,817][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxx] adding template [xxx-xx-ha1] for index patterns [xx-xx-ha1*]

[2022-09-26T06:10:42,187][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxx] adding template [xxx] for index patterns [xx*]

[2022-09-26T06:10:43,719][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxx] for index patterns [xxx]

[2022-09-26T06:10:45,141][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxx] adding template [xxxx] for index patterns [xxxx*]

[2022-09-26T06:10:46,753][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx] for index patterns [xxxx*]

[2022-09-26T06:10:47,975][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxx] for index patterns [xxxx]

What are these index templates that are being added? Each of these will also require the cluster state to be updated. How many index templates do you have when you run the index template API?
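
For example (the templates in the log lines above are legacy templates created via the /_template API; host is a placeholder):

```
# Legacy index templates and the index patterns they match
curl -s 'http://localhost:9200/_cat/templates?v&s=name'

# Composable index templates, if any are defined (7.8+)
curl -s 'http://localhost:9200/_index_template?pretty'
```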

We use a separate index template for each server, so it will be around 30-35 index templates. What would be your suggestion with regard to index templates?

Why?

How frequently are they updated?

Just to keep each server's things separate, we create a separate index template per server. They are not updated frequently.

[
  {
    "strings_as_keyword": {
      "mapping": {
        "ignore_above": 1024,
        "type": "keyword"
      },
      "match_mapping_type": "string"
    }
  }
]

We are using this dynamic template for all indices.

In one statement you say index templates are getting created all the time and in the other that it does not happen often. Which is it?

Please share more of the logs so we can better see what is going on. If you are seeing frequent or long GC that could also be a cause. Maybe you need to increase the size of your nodes to get more heap?
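
If it helps, something like this pulls the relevant lines (the log path is an assumption based on the default RPM layout and your cluster name "prod"; adjust to your setup):

```
grep -E 'JvmGcMonitorService|adding template' /var/log/elasticsearch/prod.log | tail -n 100
```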

We see this in the logs more frequently:

[2022-10-02T12:11:55,754][WARN ][o.e.m.j.JvmGcMonitorService] [xxxx] [gc][346971] overhead, spent [4.9s] collecting in the last [5.5s]
[2022-10-02T12:11:58,618][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx12] for index patterns [xxxx12*]
[2022-10-02T12:12:00,703][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx22] for index patterns [xxxx22*]
[2022-10-02T12:12:02,707][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:04,805][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx23] for index patterns [xxxx23*]
[2022-10-02T12:12:06,884][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx17] for index patterns [xxxx17*]
[2022-10-02T12:12:08,561][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx18] for index patterns [xxxx18*]
[2022-10-02T12:12:10,154][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx14] for index patterns [xxxx14*]
[2022-10-02T12:12:11,601][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx13] for index patterns [xxxx13*]
[2022-10-02T12:12:12,928][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx34] for index patterns [xxxx34*]
[2022-10-02T12:12:14,496][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx15] for index patterns [xxxx15*]
[2022-10-02T12:12:15,534][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:17,073][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:18,702][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx12] for index patterns [xxxx12*]
[2022-10-02T12:12:19,972][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx16] for index patterns [xxxx16*]
[2022-10-02T12:12:21,370][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:22,816][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx23] for index patterns [xxxx23*]
[2022-10-02T12:12:23,946][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx14] for index patterns [xxxx14*]
[2022-10-02T12:12:25,271][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx13] for index patterns [xxxx13*]
[2022-10-02T12:12:26,450][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx15] for index patterns [xxxx15*]
[2022-10-02T12:12:27,756][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:29,063][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:30,155][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx12] for index patterns [xxxx12*]
[2022-10-02T12:12:31,406][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx16] for index patterns [xxxx16*]
[2022-10-02T12:12:32,809][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:34,028][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx23] for index patterns [xxxx23*]
[2022-10-02T12:12:34,955][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx14] for index patterns [xxxx14*]
[2022-10-02T12:12:36,123][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx13] for index patterns [xxxx13*]
[2022-10-02T12:12:37,139][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx15] for index patterns [xxxx15*]
[2022-10-02T12:12:38,186][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:39,563][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx11] for index patterns [xxxx11*]
[2022-10-02T12:12:40,773][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx12] for index patterns [xxxx12*]
[2022-10-02T12:12:41,925][INFO ][o.e.c.m.MetadataIndexTemplateService] [xxxx] adding template [xxxx16] for index patterns [xxxx16*]

[2022-10-02T11:14:40,099][WARN ][r.suppressed             ] [xxxx] path: /_template/xxxx34, params: {name=xxxx34}
org.elasticsearch.transport.RemoteTransportException: [elk.app.engati.local][10.10.1.185:9300][indices:admin/template/put]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index-template [xxxx34], cause [api]) within 30s
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:132) ~[elasticsearch-7.14.1.jar:7.14.1]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:131) ~[elasticsearch-7.14.1.jar:7.14.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) ~[elasticsearch-7.14.1.jar:7.14.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:831) [?:?]

We have this config in our Filebeat; could this be the reason for these continuous log entries in the cluster?
setup.template.overwrite: true

That is possible. Try to disable it and see if that has any effect.
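
One way to see the effect (log path assumed from the default RPM layout and the cluster name "prod") is to count how often templates get (re)created before and after the Filebeat change:

```
grep -c 'adding template' /var/log/elasticsearch/prod.log
```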

The first log line indicates that you may need to increase the amount of heap and scale up the nodes.
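
To quantify that, the per-node JVM statistics show heap usage and how often and for how long garbage collections run (host is a placeholder; add credentials if security is enabled):

```
curl -s 'http://localhost:9200/_nodes/stats/jvm?human&pretty&filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors'
```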

We will try to change the Filebeat config and see if it has any effect. We have a 3-node cluster (16GB memory boxes) with 8GB heap assigned. Should we increase the heap to more than 50% of the total memory, and is there any impact when we increase the heap above half of the physical RAM?