Observations
- OS: Windows Server 2019
- Observations are based on:
  Metricbeat Version 7.15.0 (amd64), libbeat 7.15.0 [9023152025ec6251bc6b6c38009b309157f10f17 built 2021-09-16 03:28:25 +0000 UTC]
- When the Metricbeat agent encounters an error on the Elasticsearch node, it reports the error to the monitoring cluster.
- The message is stored in the index “metricbeat-7.15.0”. (Note: based on its mapping, this index does not use the default Metricbeat template.)
- When the agent creates a new index, it encounters the following errors (as captured in the log file):
  ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(elasticsearch(http://xx.xx.xx.xx:9200)): Connection marked as failed because the onConnect callback failed: resource 'metricbeat-7.15.0' exists, but it is not an alias
  INFO [publisher_pipeline_output] pipeline/output.go:145 Attempting to reconnect to backoff(elasticsearch(http://xx.xx.xx.xx:9200)) with X reconnect attempt(s)
- The monitoring cluster stops receiving metrics from the agent.
  Verification: metrics are normally stored in the daily indices “.monitoring-es-7-mb-%{+yyyy.MM.dd}”; after the error, no new indices are created.
- Delete the index “metricbeat-7.15.0” (and restart the Metricbeat agent), and the monitoring cluster resumes collecting metrics.
  Verification: new “.monitoring-es-7-mb-%{+yyyy.MM.dd}” indices are created.
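The check-and-delete workaround above can be scripted. The following is a minimal Python sketch using only the standard library. It assumes the cluster is reachable over plain HTTP without authentication, and that it runs a version with the resolve index API (`GET /_resolve/index/<name>`, added in Elasticsearch 7.9). It uses that API to tell whether `metricbeat-7.15.0` is a concrete index or an alias before issuing `DELETE /<index>`. The helper names (`classify`, `resolve`, `delete_request`, `check_and_delete`) are hypothetical and not part of Metricbeat.

```python
import json
import urllib.error
import urllib.request


def classify(resolve_response: dict, name: str) -> str:
    """Classify `name` using a parsed GET /_resolve/index/<name> response.

    Returns "index", "alias", "data_stream" or "missing".
    """
    sections = (("index", "indices"), ("alias", "aliases"), ("data_stream", "data_streams"))
    for kind, key in sections:
        if any(entry.get("name") == name for entry in resolve_response.get(key, [])):
            return kind
    return "missing"


def resolve(base_url: str, name: str) -> dict:
    """Call the resolve index API; assumes no authentication or TLS is needed."""
    try:
        with urllib.request.urlopen(f"{base_url}/_resolve/index/{name}") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # nothing by that name exists
            return {}
        raise


def delete_request(base_url: str, name: str) -> urllib.request.Request:
    """Build (without sending) the DELETE /<index> request for a concrete index."""
    return urllib.request.Request(f"{base_url}/{name}", method="DELETE")


def check_and_delete(base_url: str, name: str, dry_run: bool = True) -> str:
    """Delete `name` only if it is a concrete index, i.e. the state that makes
    Metricbeat report "exists, but it is not an alias". Returns the classification."""
    kind = classify(resolve(base_url, name), name)
    if kind == "index" and not dry_run:
        with urllib.request.urlopen(delete_request(base_url, name)) as resp:
            resp.read()
    return kind
```

For example, `check_and_delete("http://xx.xx.xx.xx:9200", "metricbeat-7.15.0")` only reports what it finds. Pass `dry_run=False` to actually delete the index, which discards the documents stored in it. Metricbeat still has to be restarted afterwards, presumably so that its setup can recreate the write alias.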
Hypothesis
- When Metricbeat sets up the default index template, the default index lifecycle (ILM) write alias name is “metricbeat-%{[agent.version]}”, which resolves to “metricbeat-7.15.0”.
- This alias clashes with the index created when Metricbeat encounters an error, which stops the agent from writing to the monitoring cluster.
- It is likely that when the agent writes the error, the index name is not specified properly (i.e. the date math portion may be missing): it should be metricbeat-%{[agent.version]}-%{+yyyy.MM.dd} instead of metricbeat-%{[agent.version]}.
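The following Python sketch makes the suspected name collision concrete. It expands the two kinds of placeholders that appear in the names above: event-field references such as `%{[agent.version]}` and date patterns such as `%{+yyyy.MM.dd}`. It is a simplified stand-in written for illustration, assuming only these two placeholder forms and only the `yyyy`, `MM` and `dd` date tokens. It is not the Beats format-string implementation, `expand` is a hypothetical helper, and the date 2021-09-30 is arbitrary.

```python
import re
from datetime import date

# Simplified: supports %{[field.path]} and %{+pattern} with yyyy, MM and dd only.
_PLACEHOLDER = re.compile(r"%\{(\+[^}]+|\[[^\]]+\])\}")


def _format_date(pattern: str, day: date) -> str:
    values = {"yyyy": f"{day.year:04d}", "MM": f"{day.month:02d}", "dd": f"{day.day:02d}"}
    return re.sub(r"yyyy|MM|dd", lambda m: values[m.group(0)], pattern)


def expand(fmt: str, event: dict, day: date) -> str:
    """Expand field references from `event` and date patterns from `day`."""
    def repl(match: re.Match) -> str:
        body = match.group(1)
        if body.startswith("+"):
            return _format_date(body[1:], day)
        value = event
        for key in body[1:-1].split("."):  # "[agent.version]" -> agent -> version
            value = value[key]
        return str(value)

    return _PLACEHOLDER.sub(repl, fmt)


event = {"agent": {"version": "7.15.0"}}
day = date(2021, 9, 30)  # arbitrary example date
alias = expand("metricbeat-%{[agent.version]}", event, day)
stray_index = "metricbeat-7.15.0"  # the concrete index observed in the monitoring cluster
dated = expand("metricbeat-%{[agent.version]}-%{+yyyy.MM.dd}", event, day)
monitoring = expand(".monitoring-es-7-mb-%{+yyyy.MM.dd}", event, day)

print(alias, alias == stray_index)  # metricbeat-7.15.0 True
print(dated, dated == alias)        # metricbeat-7.15.0-2021.09.30 False
print(monitoring)                   # .monitoring-es-7-mb-2021.09.30
```

The write alias and the stray index expand to the same name, which matches the "exists, but it is not an alias" error. A date-suffixed name would not collide with the alias.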
Questions
- Is the hypothesis correct in stating that a default configuration somewhere needs to be updated (a minor bug)?
- As a workaround, is there a setting where I can specify the index name the agent uses when writing error messages?
- Finally, I may be completely wrong: what other causes or explanations might there be?
Thanks for any help or feedback.