Logstash can't talk to Elasticsearch after running for several hours

I've found that after running for several hours, my Logstash can no longer connect to Elasticsearch. The Logstash log says:

{:timestamp=>"2016-10-17T04:39:40.312000+0200", :message=>"Starting pipeline", :id=>"main", :pipeline_workers=>8, :batch_size=>1000, :batch_delay=>5, :max_inflight=>8000, :level=>:info}
{:timestamp=>"2016-10-17T04:39:40.325000+0200", :message=>"Pipeline main started"}
{:timestamp=>"2016-10-17T06:18:55.520000+0200", :message=>"Attempted to send a bulk request to Elasticsearch configured at '["http://146.89.179.204:9200"]', but Elasticsearch appears to be unreachable or down!", :error_message=>"Connection timed out", :class=>"Manticore::SocketException", :level=>:error}

However, I checked and Elasticsearch itself was still running.
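To rule out the process being up but unreachable from this machine, I probe the node from the Logstash host itself. This is just a rough sketch (not part of my pipeline) with explicit timeouts, so it fails fast instead of hanging the way the stuck bulk requests do; the host/port are the ones from the log above:

```ruby
require 'net/http'

# Rough reachability probe, run from the Logstash host.
# Explicit open/read timeouts make it fail fast rather than block,
# unlike the bulk requests that hang with no socket timeout.
def es_reachable?(host, port, timeout = 5)
  Net::HTTP.start(host, port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.get('/_cluster/health').is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end

# es_reachable?('146.89.179.204', 9200)  # true only while the node answers from here
```

If this returns false from VM A while a local check on the Elasticsearch host succeeds, the problem is on the network path rather than in Elasticsearch.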

What I'm using:

  • Filebeat: 1.2.3
  • Logstash: 2.3.4
  • Elasticsearch: 2.3.5

Any help is greatly appreciated. Thanks a lot!

Has anyone run into this issue before, or does anyone know what might cause it?

I found some more logs on the Logstash side. They show 3 running workers (out of 8 in total), and each of the running workers was stuck in the last reported plugin, the date filter.

{:timestamp=>"2016-10-18T09:42:15.320000+0200", :message=>#<LogStash::PipelineReporter::Snapshot:0x7241d839 @data={:events_filtered=>58098, :events_consumed=>58098, :worker_count=>8, :inflight_count=>179, :worker_states=>[{:status=>"dead", :alive=>false, :index=>0, :inflight_count=>0}, {:status=>"run", :alive=>true, :index=>1, :inflight_count=>36}, {:status=>"dead", :alive=>false, :index=>2, :inflight_count=>0}, {:status=>"dead", :alive=>false, :index=>3, :inflight_count=>0}, {:status=>"dead", :alive=>false, :index=>4, :inflight_count=>0}, {:status=>"run", :alive=>true, :index=>5, :inflight_count=>88}, {:status=>"dead", :alive=>false, :index=>6, :inflight_count=>0}, {:status=>"run", :alive=>true, :index=>7, :inflight_count=>55}], :output_info=>[{:type=>"elasticsearch", :config=>{"hosts"=>"146.89.179.204", "index"=>"logstash-site-%{+YYYY.MM.dd}", "workers"=>8, "flush_size"=>1000, "ALLOW_ENV"=>false}, :is_multi_worker=>true, :events_received=>58098, :workers=><Java::JavaUtilConcurrent::CopyOnWriteArrayList:-1980582399 [<LogStash::Outputs::ElasticSearch hosts=>["146.89.179.204"], index=>"logstash-site-%{+YYYY.MM.dd}", workers=>8, flush_size=>1000, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, manage_template=>true, template_name=>"logstash", template_overwrite=>false, idle_flush_time=>1, doc_as_upsert=>false, max_retries=>3, script_type=>"inline", script_var_name=>"event", scripted_upsert=>false, retry_max_interval=>2, retry_max_items=>500, retry_on_conflict=>1, action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5>, <LogStash::Outputs::ElasticSearch hosts=>["146.89.179.204"], index=>"logstash-site-%{+YYYY.MM.dd}", workers=>8, flush_size=>1000, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, manage_template=>true, template_name=>"logstash", template_overwrite=>false, idle_flush_time=>1, doc_as_upsert=>false, max_retries=>3, script_type=>"inline", script_var_name=>"event", scripted_upsert=>false, retry_max_interval=>2, 
retry_max_items=>500, retry_on_conflict=>1, action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5>, <LogStash::Outputs::ElasticSearch hosts=>["146.89.179.204"], index=>"logstash-site-%{+YYYY.MM.dd}", workers=>8, flush_size=>1000, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, manage_template=>true, template_name=>"logstash", template_overwrite=>false, idle_flush_time=>1, doc_as_upsert=>false, max_retries=>3, script_type=>"inline", script_var_name=>"event", scripted_upsert=>false, retry_max_interval=>2, retry_max_items=>500, retry_on_conflict=>1, action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5>, <LogStash::Outputs::ElasticSearch hosts=>["146.89.179.204"], index=>"logstash-site-%{+YYYY.MM.dd}", workers=>8, flush_size=>1000, codec=><LogStash::Codecs::Plain charset=>"UTF-8">, manage_template=>true, template_name=>"logstash", template_overwrite=>false, idle_flush_time=>1, doc_as_upsert=>false, max_retries=>3, script_type=>"inline", script_var_name=>"event", scripted_upsert=>false, retry_max_interval=>2, retry_max_items=>500, retry_on_conflict=>1, action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5>,

This log line is too long; the rest is in the follow-up reply below.

Continued log:

:busy_workers=>3}], :thread_info=>[{"thread_id"=>28, "name"=>"[main]>worker1", "plugin"=>["LogStash::Filters::Date", {"match"=>["Logtime", "M/d/yy HH:mm:ss:SSS z"], "timezone"=>"UTC"}], "backtrace"=>["[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:50:in `call'", "[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:256:in `call_once'", "[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:153:in `code'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/manticore.rb:84:in `perform_request'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:257:in `call'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:257:in `perform_request'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/manticore.rb:67:in `perform_request'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'", "[...]/vendor/bundle/jruby/1.9/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/bulk.rb:90:in `bulk'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/http_client.rb:53:in `non_threadsafe_bulk'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/http_client.rb:38:in `bulk'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/http_client.rb:38:in `synchronize'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/http_client.rb:38:in `bulk'",
"[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:172:in `safe_bulk'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:101:in `submit'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:86:in `retrying_submit'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:29:in `multi_receive'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:28:in `each_slice'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.7.1-java/lib/logstash/outputs/elasticsearch/common.rb:28:in `multi_receive'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/output_delegator.rb:130:in `worker_multi_receive'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/output_delegator.rb:129:in `worker_multi_receive'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/output_delegator.rb:114:in `multi_receive'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:301:in `output_batch'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:301:in `each'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:301:in `output_batch'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:232:in `worker_loop'", "[...]/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:201:in `start_workers'"], "blocked_on"=>nil, "status"=>"run", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:50:in `call'"}], :stalling_threads_info=>[{"thread_id"=>28, "name"=>"[main]>worker1",
"plugin"=>["LogStash::Filters::Date", {"match"=>["Logtime", "M/d/yy HH:mm:ss:SSS z"], "timezone"=>"UTC"}], "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:50:in `call'"}, {"thread_id"=>32, "name"=>"[main]>worker5", "plugin"=>["LogStash::Filters::Date", {"match"=>["Logtime", "M/d/yy HH:mm:ss:SSS z"], "timezone"=>"UTC"}], "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:50:in `call'"}, {"thread_id"=>34, "name"=>"[main]>worker7", "plugin"=>["LogStash::Filters::Date", {"match"=>["Logtime", "M/d/yy HH:mm:ss:SSS z"], "timezone"=>"UTC"}], "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/manticore-0.6.0-java/lib/manticore/response.rb:50:in `call'"}]}>, :level=>:warn}

Is there anything in the Elasticsearch logs around the time the problems occurred?

No, I didn't see anything in the Elasticsearch logs. Every time I run into this issue, I can only resolve it by killing Logstash (kill -9) and starting it again; a graceful shutdown doesn't work.

I have two VMs, A and B. A runs only Logstash, while B runs both Logstash and Elasticsearch. The connection problem only occurs on A.

I'm not sure whether this information helps to identify the error. Lately the issue has been occurring very often, which has greatly impacted our daily use.

Any help is greatly appreciated. Thanks so much!

Hi.
Our Logstash agents also had this issue. After a while they hit the socket
exception and could never recover; restarting them was the only way.

The previous setup:

  • Logstash: 2.1 and 2.3
  • Elasticsearch: 1.7

I ended up editing common.rb in the elasticsearch output to log how many
events each bulk request tries to send and how big it is. Often the bulk
request was small: 1 event and around 500 bytes, so probably not an
oversized bulk request then :slight_smile:

I ended up upgrading all agents to 2.4, and the error disappeared except
on one node that sends around 200k events/hour. The good news is that it
can now recover by itself.

Hope it helps

//Rickard

Hi Rickard,

Thanks so much for the information. I'll try the upgrade later to see whether it resolves my issue, and I'll post any progress here. Thanks again.