When using multiple pipelines, problem in startup of one pipeline can stall startup of other unconnected pipelines

Hi,

We are running v6.4.0 of logstash. Our setup has 2 unconnected pipelines. One pipeline 0_main handles logs from other hosts (shipped via filebeat) and writes to elasticsearch. The other pipeline 2_monitor queries logstash API and writes health data to a graphite server. Since v6.3.0 a third pipeline, the xpack monitoring pipeline, is automatically a part of our setup.

Yesterday we had a power issue which led to restart of our entire infrastructure (logstash and graphite). The graphite process failed to start due to an unrelated error. Logstash hosts started up all right but because the graphite host could not be reached, the 2_monitor pipeline went into an infinite loop at startup (looking for the connection to graphite output). My understanding of separate pipelines is that an issue in one pipeline should not affect the other pipeline. However, this morning we noticed that logstash had not fully started.

Here is a snapshot of the log for entire restart sequence

[2018-09-09T01:29:59,280][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.0"}
[2018-09-09T01:30:01,925][WARN ][org.logstash.ackedqueue.Queue] recovered head data page 5081 is different than checkpoint, using recovered page information
[2018-09-09T01:30:02,492][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>".monitoring-logstash", "pipeline.workers"=>1, "pipeline.batch.size"=>2, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:02,837][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"2_monitor", "pipeline.workers"=>1, "pipeline.batch.size"=>1, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:03,084][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-09T01:30:03,129][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-09T01:30:03,208][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-09T01:30:03,328][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-09T01:30:03,366][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-09-09T01:30:03,366][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-09-09T01:30:03,368][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//esdatanodes"]}
[2018-09-09T01:30:03,396][INFO ][logstash.licensechecker.licensereader] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-09T01:30:03,396][INFO ][logstash.licensechecker.licensereader] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-09T01:30:03,403][WARN ][logstash.licensechecker.licensereader] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-09T01:30:03,410][INFO ][logstash.licensechecker.licensereader] ES Output version determined {:es_version=>6}
[2018-09-09T01:30:03,473][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>".monitoring-logstash", :thread=>"#<Thread:0x6c848a91@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-09T01:30:05,086][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-09T01:30:05,898][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"0_main", "pipeline.workers"=>7, "pipeline.batch.size"=>1400, "pipeline.batch.delay"=>50}
[2018-09-09T01:30:05,901][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
.......Repeated for 24 hours......
[2018-09-10T10:11:07,212][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-10T10:11:07,212][WARN ][logstash.outputs.graphite] Connection refused to graphite server, sleeping... {:host=>"statsmaster", :port=>2003}
[2018-09-10T10:11:09,224][INFO ][logstash.inputs.http_poller] Registering http_poller Input {:type=>"metric", :schedule=>{"every"=>"30s"}, :timeout=>nil}
[2018-09-10T10:11:09,229][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://esdatanodes:9200/]}}
[2018-09-10T10:11:09,231][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://esdatanodes:9200/, :path=>"/"}
[2018-09-10T10:11:09,232][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"2_monitor", :thread=>"#<Thread:0x1da04251@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-10T10:11:09,239][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://esdatanodes:9200/"}
[2018-09-10T10:11:09,247][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-09-10T10:11:09,248][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-09-10T10:11:09,249][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//esdatanodes"]}
[2018-09-10T10:11:19,472][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"0_main", :thread=>"#<Thread:0x445681d1@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:156 sleep>"}
[2018-09-10T10:11:19,485][INFO ][logstash.agent           ] Pipelines running {:count=>3, :running_pipelines=>[:"0_main", :"2_monitor", :".monitoring-logstash"], :non_running_pipelines=>[]}
[2018-09-10T10:11:19,486][INFO ][logstash.inputs.metrics  ] Monitoring License OK

Logstash attempted to start the .monitoring-logstash and 2_monitor pipeline. The .monitoring-logstash pipeline started up all right and so it tried to start the 0_main pipeline. But 2_monitor pipeline got stuck as graphite output could not be instantiated. From here on, no attempt was made to continue the startup of 0_main pipeline and logstash was in limbo for 30 hours. This morning, we started the graphite process and once the output was instantiated, the rest of logstash startup continued.

Any thoughts about why this happened?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.