Questions about retrying and storing data in a file

Hi all,

In our system, Metricbeat is configured to output to Elasticsearch. However, DNS errors and other network errors cause data to be dropped.
We want to protect the collected data.

Can we write the data to a file or to a local Redis server when the maximum number of retries has been exceeded and the data would otherwise be dropped? If so, how?

Another question: what is the retry interval? Is the next retry attempted immediately after publishing the current event fails? If not, can I change the interval in the YAML configuration?

Thank you

We plan to add a feature in 6.x that allows events to be spooled to disk to cover this case.

You could output the events to a queueing system like Redis or Kafka, but you would need Logstash to read them back out and index them into Elasticsearch. Now that Logstash has its own persistent queue, you could send the events directly to Logstash, which would give you a buffer between Beats and Elasticsearch.

I believe the retry interval uses a capped exponential backoff. @steffens probably knows the precise answer and the relevant config options.

@andrewkroh Thanks for your answer.

We want to set "output.elasticsearch.max_retries: -1" to solve this problem. But are there any risks that come with this setting?

And maybe @steffens can help clarify the retry interval?

Thank you

This should make Metricbeat behave like Filebeat or Winlogbeat, which do not drop data. The downside is that Metricbeat will stop collecting new data once the internal publishing queue fills up.
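To illustrate what a negative max_retries changes, here is a minimal Python sketch. It is a hypothetical model, not Beats code: the names publish and flaky are made up for illustration. It assumes the documented semantics that max_retries counts retries after the first failed attempt and that a value below 0 means "retry until the events are published".

```python
from typing import Callable


def publish(send: Callable[[], bool], max_retries: int) -> bool:
    """Try to send one batch; return True if delivered, False if dropped.

    Hypothetical model: one initial attempt plus up to `max_retries`
    retries; a negative `max_retries` means retry forever.
    """
    failures = 0
    while True:
        if send():
            return True
        failures += 1
        if max_retries >= 0 and failures > max_retries:
            return False  # retries exhausted: the batch is dropped
        # A real output would wait for a backoff delay here before retrying.


def flaky(failures: int) -> Callable[[], bool]:
    """Hypothetical sender that fails `failures` times, then succeeds."""
    results = iter([False] * failures + [True])
    return lambda: next(results)
```

With a sender that fails five times, publish(flaky(5), max_retries=3) gives up and drops the batch, while publish(flaky(5), max_retries=-1) keeps retrying until it succeeds. The catch, as discussed in this thread, is that while the output keeps retrying, nothing else drains the queue.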


Hi, @andrewkroh

How large is the "internal publishing queue"? Is its size set by "bulk_max_size"?
Where can I find documentation about it, other than reading the source code? 🙂

Thank you

See https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-general.html#_queue_size

Thank you very much. ❤️

Setting max_retries: -1 enables infinite retries. The problem with infinite retries is that the queues in Metricbeat can fill up. Once the queue is full, Metricbeat modules/metricsets cannot publish any new events and therefore will not be executed. In other words, you can only choose whether you lose data from the beginning or from the end of your network outage. Spooling to disk in Metricbeat will help in the future, but even then, queues will be limited in size.
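The "beginning or end of the outage" trade-off can be seen in a toy Python simulation. This is an assumption-laden model, not Metricbeat's actual pipeline: one event is collected per tick, nothing can be sent during the outage, and the queue holds at most `capacity` events. Finite retries are approximated as "the oldest queued event is dropped to make room for the newest one".

```python
from collections import deque


def simulate_outage(outage_ticks: int, capacity: int, infinite_retry: bool) -> list[int]:
    """Return the IDs of events still queued (and later delivered) after an outage.

    Toy model (assumption): one event per tick, no sends succeed during the
    outage, and the queue holds at most `capacity` events.
    """
    if infinite_retry:
        # max_retries: -1: queued events are never dropped, so once the queue
        # is full the collector blocks and later events are never collected.
        queue = deque()
        for event_id in range(outage_ticks):
            if len(queue) < capacity:
                queue.append(event_id)
    else:
        # Finite retries, approximated as dropping the oldest event;
        # deque(maxlen=...) discards from the left when it overflows.
        queue = deque(maxlen=capacity)
        for event_id in range(outage_ticks):
            queue.append(event_id)
    # When the network comes back, whatever is still queued gets delivered.
    return list(queue)
```

For a 10-tick outage with room for 3 events, simulate_outage(10, 3, infinite_retry=True) keeps events 0 to 2 (the start of the outage), while simulate_outage(10, 3, infinite_retry=False) keeps events 7 to 9 (the end). Either way, the queue size bounds how much of the outage survives.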

In the 6.0 RC releases, you can configure the backoff limits with output.logstash.backoff.init (default 1s) and output.logstash.backoff.max (default 60s).
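To make the effect of the two backoff settings concrete, the following Python sketch computes a capped exponential backoff schedule starting at the default init of 1s and capped at the default max of 60s. It assumes the delay doubles after each consecutive failure and ignores any jitter or reset-on-success behavior the real implementation may have; the function name backoff_delays is hypothetical.

```python
def backoff_delays(init: float = 1.0, max_delay: float = 60.0, attempts: int = 10) -> list[float]:
    """Return the wait (in seconds) before each of `attempts` consecutive retries.

    Assumption: the delay starts at `init`, doubles after every failure,
    and never exceeds `max_delay`. Jitter is not modeled.
    """
    delays = []
    delay = init
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_delay)  # exponential growth, capped
    return delays
```

Under these assumptions, the default settings give waits of 1, 2, 4, 8, 16 and 32 seconds, after which every further retry waits the 60-second maximum. So retries are not immediate, and raising backoff.max mostly lengthens the gaps during a long outage.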
