We have been using Elasticsearch with Kibana and APM-Server for monitoring for two years. For the past three weeks, we have been experiencing massive problems. First, Kibana stopped working because of a corrupted translog (TranslogCorruptedException) in the index .kibana_task_manager_7.17.1_001. For this I used
and a subsequent _cluster/reroute. After that, Kibana complained about the missing shard, so I deleted and recreated the index, and Kibana worked again. However, the problem with the APM-Server remains: it cannot send data to Elasticsearch. The error message is not very helpful:
failed to publish events: temporary bulk send failure
Kibana and Elasticsearch are version 7.17, while the APM-Server is version 7.15. Since this combination worked without problems for a while, I don't think the version difference is the cause. One strange thing is this line in the Elasticsearch log file:
[o.e.c.m.MetadataIndexTemplateService] [node-1] adding template [apm-7.15.0] for index patterns [apm-7.15.0*]
This line is logged EVERY minute. Yet when I run
GET /_index_template/apm-7.15.0
an error is returned saying that no such template exists.
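For comparison: GET /_index_template only lists composable index templates, while templates created through the legacy PUT /_template API are only visible under GET /_template. Whether APM-Server 7.15 installs a legacy template is an assumption here, not something the log line proves. This minimal sketch (standard library only, assuming an unsecured cluster at http://localhost:9200; the function names are hypothetical) checks both endpoints for one template name:

```python
import json
import urllib.error
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: unsecured local cluster


def template_endpoints(name):
    """Return the composable and legacy template paths for one template name."""
    return {
        "composable": f"/_index_template/{name}",
        "legacy": f"/_template/{name}",
    }


def find_template(name, base_url=ES_URL):
    """Report which template API, if any, knows the given template name.

    Each value is the decoded GET response, or None if that API answered 404.
    Any other HTTP error is re-raised.
    """
    found = {}
    for kind, path in template_endpoints(name).items():
        try:
            with urllib.request.urlopen(base_url + path) as resp:
                found[kind] = json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            found[kind] = None
    return found
```

For example, if find_template("apm-7.15.0") returns None for "composable" but a response for "legacy", the template exists only as a legacy template.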
The cluster status is currently red, which may be due to several unassigned primary shards (also caused by corrupted translogs):
"failed shard on node [_HRONvKmRNmrRUGdZUG1uQ]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: TranslogCorruptedException[translog from source [.../elasticsearch/nodes/0/indices/iX6Wk--TTji3HYCe0qBfrw/0/translog/translog-50.tlog] is corrupted, translog truncated]; nested: EOFException[read past EOF. pos  length:  end: ]; ",
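To see why each primary stays unassigned, the cluster allocation explain API can be queried for every unassigned primary listed by _cat/shards. A minimal sketch, again assuming an unsecured cluster at http://localhost:9200; the helper names are hypothetical:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: unsecured local cluster


def unassigned_primaries(cat_shards_rows):
    """Filter _cat/shards?format=json rows down to unassigned primary shards."""
    return [
        (row["index"], int(row["shard"]))
        for row in cat_shards_rows
        if row["prirep"] == "p" and row["state"] == "UNASSIGNED"
    ]


def explain_body(index, shard):
    """Request body asking allocation explain about one primary shard."""
    return {"index": index, "shard": shard, "primary": True}


def request_json(path, body=None, base_url=ES_URL):
    """GET the path, or POST a JSON body to it, and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if body is None else "POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def explain_unassigned_primaries(base_url=ES_URL):
    """Map each unassigned primary (index, shard) to its allocation explanation."""
    rows = request_json(
        "/_cat/shards?format=json&h=index,shard,prirep,state", base_url=base_url
    )
    return {
        (index, shard): request_json(
            "/_cluster/allocation/explain", explain_body(index, shard), base_url
        )
        for index, shard in unassigned_primaries(rows)
    }
```

The explanation for a shard like the one above should repeat the stored failure reason, which helps tell translog corruption apart from other allocation problems.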
Hope someone can help. Thanks!