Td-agent logs stopped working

Hi All,

I need some guidance on the issue described below.

In our environment we use Elasticsearch to store web logs, with td-agent shipping the data to Elasticsearch. Elasticsearch is version 7.2.0, Kibana is 7.2.0, and td-agent is 3.4.0.

We are seeing the following errors in the td-agent logs:

  1. Error like:

2020-02-17 22:42:45 -0500 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-02-18 07:35:31 -0500 chunk="59ec941582a3197510c332bcbe2a101f" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"

2020-02-17 22:42:45 -0500 [warn]: #0 suppressed same stacktrace

  2. Error like:

2020-02-05 00:39:50 -0500 [warn]: #0 failed to flush the buffer. retry_time=4 next_retry_seconds=2020-02-05 00:39:58 -0500 chunk="59dcd91acf1e2675fd7b28dd88617b7b" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>" node1 ", :port=>9200, :scheme=>"http"}): getaddrinfo: Temporary failure in name resolution (SocketError)"

2020-02-05 00:39:50 -0500 [warn]: #0 suppressed same stacktrace

2020-02-05 00:39:04 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

  3. Error like:

2020-02-04 02:16:22 -0500 [error]: #0 [input_forward] unexpected error on reading data host="192.168.1.53" port=51023 error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data"

2020-02-04 02:16:22 -0500 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/buffer.rb:298:in `write'" tag="ApplicationLog.Information"

2020-02-04 02:26:52 -0500 [warn]: #0 failed to write data into buffer by buffer overflow action=:throw_exception

  4. Error like:

2020-02-18 02:20:42 -0500 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2020-02-18 02:20:43 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:220:in `rescue in send_bulk'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:211:in `send_bulk'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:206:in `block in write'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:205:in `each'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:205:in `write'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-02-18 02:20:43 -0500 [warn]: #0 failed to flush the buffer. retry_time=1 next_retry_seconds=2020-02-18 02:20:43 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:43 -0500 [warn]: #0 suppressed same stacktrace
2020-02-18 02:20:43 -0500 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2020-02-18 02:20:46 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:43 -0500 [warn]: #0 suppressed same stacktrace

In our case, when we restart the td-agent service everything works properly again, but until then no logs are generated.

Is there anything we need to configure on the Elasticsearch side to avoid these errors? What could the root cause be?

Can anybody please help? I have already made the following changes on the td-agent side, but they have not helped:

reload_connections false
reconnect_on_error true
reload_on_failure true

flush_interval 10s
flush_thread_count 4

Please also advise whether these settings are actually required on td-agent and how I can optimize them.
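
For context, this is roughly how those settings sit in our elasticsearch output block (just a sketch; the match pattern, buffer path, and size limits below are illustrative, not copied from our actual file):

<match ApplicationLog.**>
  @type elasticsearch_dynamic
  host node1
  port 9200
  scheme http
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  <buffer>
    @type file
    path /var/log/td-agent/buffer/es   # illustrative path
    flush_interval 10s
    flush_thread_count 4
    chunk_limit_size 8MB               # illustrative size limits
    total_limit_size 512MB
    overflow_action block              # default is throw_exception, which is what error 4 above shows
  </buffer>
</match>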

Thanks.

It seems to me that the agent software (which I do not know) does not properly recover if the Elasticsearch node is temporarily unreachable. The ECONNREFUSED error indicates that either Elasticsearch is down or a firewall is rejecting the connection. If that event causes the agent to never try again, that would be the wrong approach (I am just guessing here).
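
A quick check from the td-agent host could be something like the following (using the host and port from your snippet), to see whether Elasticsearch is reachable at the moment the flush fails:

curl -s 'http://node1:9200/_cluster/health?pretty'

If that is refused as well, the problem is more likely on the network/firewall/Elasticsearch side than in the agent configuration.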

Maybe ask in the td-agent forums/Google Groups/mailing list for more information if no one answers here.

Hi Spinscale,

Thanks for the reply. I checked at the same time: the Elasticsearch health status was green and it was working properly. Is it possible that the chunk size sent by td-agent is larger than Elasticsearch supports in a single request, so Elasticsearch refuses the connection? What is the largest request Elasticsearch supports at a time? My Elasticsearch server has 16 GB of memory and the JVM is configured with -Xms8g -Xmx8g.
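
For example, would capping the chunk size on the td-agent side help? As far as I understand, the plugin sends roughly one bulk request per buffer chunk, so something like the below in the buffer section should limit the request size (the value is just my guess, not something I have tested):

<buffer>
  chunk_limit_size 10MB
</buffer>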

Thanks.
