Hi all,
I need some guidance on the issue described below.
In our environment we use Elasticsearch to store web logs, with td-agent shipping the data to Elasticsearch. Versions: Elasticsearch 7.2.0, Kibana 7.2.0, td-agent 3.4.0.
We are seeing the following errors in the td-agent error logs:
- Connection refused:
2020-02-17 22:42:45 -0500 [warn]: #0 failed to flush the buffer. retry_time=16 next_retry_seconds=2020-02-18 07:35:31 -0500 chunk="59ec941582a3197510c332bcbe2a101f" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-17 22:42:45 -0500 [warn]: #0 suppressed same stacktrace
- DNS resolution failure:
2020-02-05 00:39:50 -0500 [warn]: #0 failed to flush the buffer. retry_time=4 next_retry_seconds=2020-02-05 00:39:58 -0500 chunk="59dcd91acf1e2675fd7b28dd88617b7b" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>" node1 ", :port=>9200, :scheme=>"http"}): getaddrinfo: Temporary failure in name resolution (SocketError)"
2020-02-05 00:39:50 -0500 [warn]: #0 suppressed same stacktrace
2020-02-05 00:39:04 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
- Buffer overflow:
2020-02-04 02:16:22 -0500 [error]: #0 [input_forward] unexpected error on reading data host="192.168.1.53" port=51023 error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data"
2020-02-04 02:16:22 -0500 [warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/buffer.rb:298:in `write'" tag="ApplicationLog.Information"
2020-02-04 02:26:52 -0500 [warn]: #0 failed to write data into buffer by buffer overflow action=:throw_exception
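(For reference, the BufferOverflowError above is governed by the output plugin's buffer settings. A minimal sketch of the relevant parameters, with illustrative values rather than our actual configuration, would look like this inside the elasticsearch `<match>` block:)

```
<buffer>
  @type file
  path /var/log/td-agent/buffer/es   # example path, not our actual one
  total_limit_size 4GB               # raising this delays BufferOverflowError
  chunk_limit_size 8MB
  overflow_action block              # back-pressure inputs instead of throw_exception
</buffer>
```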
- Connection refused (with full stack trace):
2020-02-18 02:20:42 -0500 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2020-02-18 02:20:43 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:220:in `rescue in send_bulk'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:211:in `send_bulk'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:206:in `block in write'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:205:in `each'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-elasticsearch-3.5.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:205:in `write'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2020-02-18 02:20:42 -0500 [warn]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-02-18 02:20:43 -0500 [warn]: #0 failed to flush the buffer. retry_time=1 next_retry_seconds=2020-02-18 02:20:43 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:43 -0500 [warn]: #0 suppressed same stacktrace
2020-02-18 02:20:43 -0500 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2020-02-18 02:20:46 -0500 chunk="59ed481cac18e78fa18d726eee8e95e6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>"node1", :port=>9200, :scheme=>"http"}): Connection refused - connect(2) for 192.168.1.54:80 (Errno::ECONNREFUSED)"
2020-02-18 02:20:43 -0500 [warn]: #0 suppressed same stacktrace
In our case, restarting the td-agent service makes everything work again, but until the restart no logs are ingested.
Is there anything we need to configure on the Elasticsearch side to avoid these errors, and what could be the root cause?
Can anybody please help? I have made the following changes on the td-agent side, but the problem persists:
reload_connections false
reconnect_on_error true
reload_on_failure true
flush_interval 10s
flush_thread_count 4
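(For context, here is a sketch of how these settings might be laid out in the elasticsearch `<match>` block. The tag pattern and host are taken from our logs and are placeholders, not a verified working config; note that in Fluentd 1.x the flush parameters belong inside a `<buffer>` section:)

```
<match ApplicationLog.**>
  @type elasticsearch_dynamic
  host node1
  port 9200
  scheme http
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  <buffer>
    flush_interval 10s
    flush_thread_count 4
    retry_max_interval 30s   # cap back-off so retries resume promptly
  </buffer>
</match>
```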
Please also advise whether these settings are required on the td-agent side and how I can optimize them.
Thanks.