When using multiple pipelines, problem in startup of one pipeline can stall startup of other unconnected pipelines

Dheeraj_Gupta · September 10, 2018, 6:36am

Hi,

We are running v6.4.0 of logstash. Our setup has 2 unconnected pipelines. One pipeline 0_main handles logs from other hosts (shipped via filebeat) and writes to elasticsearch. The other pipeline 2_monitor queries logstash API and writes health data to a graphite server. Since v6.3.0 a third pipeline, the xpack monitoring pipeline, is automatically a part of our setup.

Yesterday we had a power issue which led to restart of our entire infrastructure (logstash and graphite). The graphite process failed to start due to an unrelated error. Logstash hosts started up all right but because the graphite host could not be reached, the 2_monitor pipeline went into an infinite loop at startup (looking for the connection to graphite output). My understanding of separate pipelines is that an issue in one pipeline should not affect the other pipeline. However, this morning we noticed that logstash had not fully started.

Here is a snapshot of the log for entire restart sequence

[2018-09-09T01:29:59,280][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.0"}
[2018-09-09T01:30:01,925][WARN ][org.logstash.ackedqueue.Queue] recovered head data page 5081 is different than checkpoint, using recovered page information
[2018-09-09T01:30:02,492][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>".monitoring-logstash", "pipeline.workers"=>1, "pipeline.batch.size"=>2, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:02,837][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"2_monitor", "pipeline.workers"=>1, "pipeline.batch.size"=>1, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:03,084][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-09T01:30:03,129][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-09T01:30:03,208][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-09T01:30:03,328][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-09T01:30:03,366][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-09-09T01:30:03,366][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-09-09T01:30:03,368][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//esdatanodes"]}
[2018-09-09T01:30:03,396][INFO ][logstash.licensechecker.licensereader] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-09T01:30:03,396][INFO ][logstash.licensechecker.licensereader] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-09T01:30:03,403][WARN ][logstash.licensechecker.licensereader] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-09T01:30:03,410][INFO ][logstash.licensechecker.licensereader] ES Output version determined {:es_version=>6}
[2018-09-09T01:30:03,473][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>".monitoring-logstash", :thread=>"#<Thread:0x6c848a91@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-09T01:30:05,086][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-09T01:30:05,898][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"0_main", "pipeline.workers"=>7, "pipeline.batch.size"=>1400, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:05,901][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
.......Repeated for 24 hours......
[2018-09-10T10:11:07,212][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-10T10:11:07,212][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-10T10:11:09,224][INFO ][logstash.inputs.http_poller] Registering http_poller Input {:type=>"metric", :schedule=>{"every"=>"30s"}, :timeout=>nil}
[2018-09-10T10:11:09,229][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-10T10:11:09,231][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-10T10:11:09,232][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"2_monitor", :thread=>"#<Thread:0x1da04251@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-10T10:11:09,239][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-10T10:11:09,247][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-09-10T10:11:09,248][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-09-10T10:11:09,249][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//esdatanodes"]}
[2018-09-10T10:11:19,472][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"0_main", :thread=>"#<Thread:0x445681d1@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-10T10:11:19,485][INFO ][logstash.agent           ] Pipelines running {:count=>3, :running_pipelines=>[:"0_main", :"2_monitor", :".monitoring-logstash"], :non_running_pipelines=>[]}
[2018-09-10T10:11:19,486][INFO ][logstash.inputs.metrics  ] Monitoring License OK

Logstash attempted to start the .monitoring-logstash and 2_monitor pipeline. The .monitoring-logstash pipeline started up all right and so it tried to start the 0_main pipeline. But 2_monitor pipeline got stuck as graphite output could not be instantiated. From here on, no attempt was made to continue the startup of 0_main pipeline and logstash was in limbo for 30 hours. This morning, we started the graphite process and once the output was instantiated, the rest of logstash startup continued.

Any thoughts about why this happened?

system · October 8, 2018, 6:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash monitoring not showing one pipeline Logstash	1	268	December 14, 2020
Pipeline blocking from output plugins Logstash	1	847	July 6, 2017
Multi Pipelines don't all start Logstash	5	1073	June 5, 2020
Multiple pipelines on Logstash 6.6 Logstash	1	445	April 12, 2019
Multiple pipelines - logstash cannot start Logstash	5	489	March 31, 2020

When using multiple pipelines, problem in startup of one pipeline can stall startup of other unconnected pipelines

Related topics