All output pipelines stop when one errors (using pipeline-to-pipeline)

I'm running Logstash 7.12.1 in a container based on the standard Docker image, with the New Relic output plugin installed on top. Even though I am using pipeline-to-pipeline communication, I am hitting a situation where, if one output (Elasticsearch) breaks, the other (New Relic) stops working as well. I was under the impression that this was not supposed to happen with a pipeline-to-pipeline configuration.

pipelines.yml

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
- pipeline.id: es-output
  path.config: "/usr/share/logstash/pipeline/es-output.conf"
- pipeline.id: newrelic-output
  path.config: "/usr/share/logstash/pipeline/newrelic-output.conf"

connection-analytics.conf

input {
  tcp {
    codec          => json_lines
    port           => 19995
    tcp_keep_alive => true
  }
}

output {
  pipeline {
    send_to => [newrelic, analytics]
  }
}

es-output.conf

input {
  pipeline {
    address => analytics
  }
}

output {
  elasticsearch {
    hosts    => ["http://${ES_HOST}:${ES_PORT}"]
    user     => "${ES_USERNAME}"
    password => "${ES_PASSWORD}"
    index    => "analytics-%{service}-%{+yyyy.MM.dd}"
  }
}

newrelic-output.conf

input {
  pipeline {
    address => newrelic
  }
}

output {
  newrelic {
    license_key => "${NEW_RELIC_LICENSE_KEY}"
    base_uri    => "${NEW_RELIC_ENDPOINT}"
  }
}

Under normal conditions this works fine, but if I kill the Elasticsearch endpoint, data export to New Relic also stops. Is this a config error or a bug, or are my expectations of the behaviour incorrect?

Logs:

[2021-07-20T12:30:12,348][INFO ][logstash.javapipeline    ][es-output] Pipeline Java execution initialization time {"seconds"=>1.7}
[2021-07-20T12:30:12,350][INFO ][logstash.javapipeline    ][newrelic-output] Pipeline Java execution initialization time {"seconds"=>2.59}
[2021-07-20T12:30:12,441][INFO ][logstash.javapipeline    ][newrelic-output] Pipeline started {"pipeline.id"=>"newrelic-output"}
[2021-07-20T12:30:12,445][INFO ][logstash.javapipeline    ][es-output] Pipeline started {"pipeline.id"=>"es-output"}
[2021-07-20T12:30:12,658][INFO ][logstash.javapipeline    ][analytics] Pipeline Java execution initialization time {"seconds"=>2.1}
[2021-07-20T12:30:13,044][INFO ][logstash.javapipeline    ][analytics] Pipeline started {"pipeline.id"=>"analytics"}
[2021-07-20T12:30:13,076][INFO ][logstash.inputs.tcp      ][analytics][0bf8873b7bea49a21b7d3372cb182db4c835758a982763b9ba8b47580946394d] Starting tcp input}
[2021-07-20T12:30:13,251][INFO ][logstash.agent           ] Pipelines running {:count=>3, :running_pipelines=>[:"newrelic-output", :"es-output", }

Elasticsearch killed at this point:

[2021-07-20T12:32:20,640][WARN ][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Mar}
[2021-07-20T12:32:20,649][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:20,776][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}
[2021-07-20T12:32:22,677][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:25,783][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}
[2021-07-20T12:32:26,683][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:30,801][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}

No errors are reported by the newrelic-output pipeline.

Logstash still has an at-least-once delivery model. Using the output isolator pattern you can give each output its own queue, but it is still true that if those queues fill up, back-pressure will stall the pipeline and its inputs.
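
For what it's worth, the output isolator pattern in this setup would just mean giving each downstream output pipeline its own persistent queue in pipelines.yml, roughly along these lines (the queue sizes are only illustrative):

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
- pipeline.id: es-output
  path.config: "/usr/share/logstash/pipeline/es-output.conf"
  queue.type: persisted        # buffer events on disk while Elasticsearch is unreachable
  queue.max_bytes: 1gb         # illustrative size; back-pressure only returns once this fills
- pipeline.id: newrelic-output
  path.config: "/usr/share/logstash/pipeline/newrelic-output.conf"
  queue.type: persisted
  queue.max_bytes: 1gb

With that in place, New Relic should keep receiving events while the es-output queue absorbs the Elasticsearch outage; once that queue reaches queue.max_bytes, back-pressure reaches the analytics pipeline and its tcp input again.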

Thanks for the response. Is there a way to confirm that this is what is happening? The New Relic output stops as soon as the Elasticsearch error is reported, and I'm only pushing through a small test load, so I'm surprised back-pressure would become an issue so quickly (i.e. almost instantaneously). Also, shouldn't there be an indication in the logs if there's an issue with the input pipeline's ability to deliver messages?

I am speculating, but, as I understand it, the in-memory queue sits between the inputs and the pipeline, and a persistent queue, if used, sits between the pipeline and the outputs. That may mean the capacity of the pipeline itself is a single batch, so once an output stops accepting events, back-pressure is transmitted almost immediately all the way back to the in-memory queue, which will continue to fill but will stop sending events into the pipeline.
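
If that is right, then with the default memory queue the only slack between the tcp input and the outputs is on the order of one batch per worker, which would explain why the stall looks almost instantaneous even under a small test load. Purely as an illustration (the values here are illustrative, not a recommendation), the knobs that control that slack are per-pipeline settings in pipelines.yml:

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
  pipeline.workers: 2          # defaults to the number of CPU cores
  pipeline.batch.size: 125     # default; with queue.type: memory (the default), roughly
                               # one batch per worker is all the buffering available
                               # before back-pressure reaches the tcp input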
