All output pipelines stop when one errors (using pipeline-to-pipeline)

I'm running Logstash 7.12.1 in a container based on the standard Docker image, with the New Relic output plugin installed on top. Even though I am using pipeline-to-pipeline communication, I am hitting a situation where, if one output (Elasticsearch) breaks, the other (New Relic) stops working as well. I was under the impression that this was not supposed to happen with a pipeline-to-pipeline configuration.

pipelines.yml

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
- pipeline.id: es-output
  path.config: "/usr/share/logstash/pipeline/es-output.conf"
- pipeline.id: newrelic-output
  path.config: "/usr/share/logstash/pipeline/newrelic-output.conf"

connection-analytics.conf

input {
  tcp {
    codec          => json_lines
    port           => 19995
    tcp_keep_alive => true
  }
}

output {
  pipeline {
    send_to => [newrelic, analytics]
  }
}

es-output.conf

input {
  pipeline {
    address => analytics
  }
}

output {
  elasticsearch {
    hosts    => ["http://${ES_HOST}:${ES_PORT}"]
    user     => "${ES_USERNAME}"
    password => "${ES_PASSWORD}"
    index    => "analytics-%{service}-%{+yyyy.MM.dd}"
  }
}

newrelic-output.conf

input {
  pipeline {
    address => newrelic
  }
}

output {
  newrelic {
    license_key => "${NEW_RELIC_LICENSE_KEY}"
    base_uri    => "${NEW_RELIC_ENDPOINT}"
  }
}

Under normal conditions this works fine, but if I kill the Elasticsearch endpoint, data export to New Relic also stops. Is this a config error or a bug, or are my expectations of the behaviour incorrect?

Logs:

[2021-07-20T12:30:12,348][INFO ][logstash.javapipeline    ][es-output] Pipeline Java execution initialization time {"seconds"=>1.7}
[2021-07-20T12:30:12,350][INFO ][logstash.javapipeline    ][newrelic-output] Pipeline Java execution initialization time {"seconds"=>2.59}
[2021-07-20T12:30:12,441][INFO ][logstash.javapipeline    ][newrelic-output] Pipeline started {"pipeline.id"=>"newrelic-output"}
[2021-07-20T12:30:12,445][INFO ][logstash.javapipeline    ][es-output] Pipeline started {"pipeline.id"=>"es-output"}
[2021-07-20T12:30:12,658][INFO ][logstash.javapipeline    ][analytics] Pipeline Java execution initialization time {"seconds"=>2.1}
[2021-07-20T12:30:13,044][INFO ][logstash.javapipeline    ][analytics] Pipeline started {"pipeline.id"=>"analytics"}
[2021-07-20T12:30:13,076][INFO ][logstash.inputs.tcp      ][analytics][0bf8873b7bea49a21b7d3372cb182db4c835758a982763b9ba8b47580946394d] Starting tcp input}
[2021-07-20T12:30:13,251][INFO ][logstash.agent           ] Pipelines running {:count=>3, :running_pipelines=>[:"newrelic-output", :"es-output", }

Elasticsearch killed at this point:

[2021-07-20T12:32:20,640][WARN ][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Mar}
[2021-07-20T12:32:20,649][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:20,776][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}
[2021-07-20T12:32:22,677][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:25,783][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}
[2021-07-20T12:32:26,683][ERROR][logstash.outputs.elasticsearch][es-output][bd90eb9e58bc5bd1e842b2eef6acb2bebbdb9b37cf2d76e8af2ee769a3d4da91] Att}
[2021-07-20T12:32:30,801][WARN ][logstash.outputs.elasticsearch][es-output] Attempted to resurrect connection to dead ES instance, but got an err}

No errors are reported by the newrelic-output pipeline.

Logstash still has an at-least-once delivery model. Using the output isolator pattern you can give each output its own queue, but it is still true that if those queues fill up, back-pressure will stall the pipeline and its inputs.
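
For what it's worth, the output isolator pattern in this setup would just mean giving each downstream output pipeline its own persistent queue in pipelines.yml, roughly along these lines (the queue sizes are only illustrative):

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
- pipeline.id: es-output
  path.config: "/usr/share/logstash/pipeline/es-output.conf"
  queue.type: persisted        # buffer events on disk while Elasticsearch is unreachable
  queue.max_bytes: 1gb         # illustrative size; back-pressure only returns once this fills
- pipeline.id: newrelic-output
  path.config: "/usr/share/logstash/pipeline/newrelic-output.conf"
  queue.type: persisted
  queue.max_bytes: 1gb

With that in place, New Relic should keep receiving events while the es-output queue absorbs the Elasticsearch outage; once that queue reaches queue.max_bytes, back-pressure reaches the analytics pipeline and its tcp input again.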

Thanks for the response. Is there a way to confirm that this is what is happening? The New Relic output stops as soon as the Elasticsearch error is reported, and I'm only pushing through a small test load, so I'm surprised back-pressure would become an issue so quickly (i.e. almost instantaneously). Also, shouldn't there be an indication in the logs if there's an issue with the input pipeline's ability to deliver messages?

I am speculating, but, as I understand it, the in-memory queue sits between the inputs and the pipeline, and a persistent queue, if used, sits between the pipeline and the outputs. That may mean the capacity of the pipeline itself is a single batch, so once an output stops accepting events, back-pressure is transmitted almost immediately all the way back to the in-memory queue, which will continue to fill but will stop sending events into the pipeline.
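
If that is right, then with the default memory queue the only slack between the tcp input and the outputs is on the order of one batch per worker, which would explain why the stall looks almost instantaneous even under a small test load. Purely as an illustration (the values here are illustrative, not a recommendation), the knobs that control that slack are per-pipeline settings in pipelines.yml:

- pipeline.id: analytics
  path.config: "/usr/share/logstash/pipeline/connection-analytics.conf"
  pipeline.workers: 2          # defaults to the number of CPU cores
  pipeline.batch.size: 125     # default; with queue.type: memory (the default), roughly
                               # one batch per worker is all the buffering available
                               # before back-pressure reaches the tcp input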
