Pipeline-to-pipeline distributor: how not to lose events

Hello

TL;DR - Which pipelines must have persistent queues to avoid losing data in a pipeline-to-pipeline distributor pattern?

We are using pipeline-to-pipeline communication in Logstash with the distributor pattern: we have a single Logstash input that receives all the events from several Filebeats, and then, depending on an event field, we want to grok/dissect each event differently and send it to a different Elasticsearch index, or even to Elasticsearch AND to a file.

So, we have just one pipeline with a beats input and an output with several "if" conditionals, where each "if" sends to a different pipeline output.
Then we have another 10 pipelines whose inputs are the outputs of the first pipeline.
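For anyone else reading, this is roughly what such a layout looks like; the pipeline ids, port, field name and index below are made up for illustration, and the `pipeline` input/output plugins are the standard pipeline-to-pipeline plugins:

```
# pipelines.yml -- ids and paths are illustrative
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor.conf"
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"

# distributor.conf -- single beats input, fan-out on an event field
input {
  beats { port => 5044 }
}
output {
  if [log_type] == "apache" {
    pipeline { send_to => ["apache"] }
  }
  # ...one conditional per downstream pipeline
}

# apache.conf -- one of the ~10 downstream pipelines
input {
  pipeline { address => "apache" }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "apache-%{+YYYY.MM.dd}"
  }
}
```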

Now, we cannot lose data, so we wonder how ACKs happen in pipeline-to-pipeline communication. We are going to use persistent queue(s), but which pipelines should have a persistent queue? All of them? Just the first one? All of them except the first one?

This is how communication works:
Filebeat --> logstash pipeline 1 --> logstash pipeline 2 --> Elasticsearch.

Is this how ACKs happen?
Filebeat --> logstash pipeline 1 input --> logstash pipeline 1 queue --> logstash pipeline 1 sends ACK to Filebeat --> logstash pipeline 1 filter --> logstash pipeline 1 output
--> logstash pipeline 2 input --> logstash pipeline 2 queue --> logstash pipeline 2 filter --> logstash pipeline 2 output
--> Elasticsearch --> Elasticsearch sends ACK to logstash pipeline 2 --> logstash pipeline 2 sends ACK to logstash pipeline 1
If it works like that, then a persistent queue in just the first pipeline would be enough.

Or do ACKs work like this?
Filebeat --> logstash pipeline 1 input --> logstash pipeline 1 queue --> logstash pipeline 1 sends ACK to Filebeat --> logstash pipeline 1 filter --> logstash pipeline 1 output
--> logstash pipeline 2 input --> logstash pipeline 2 queue --> logstash pipeline 2 sends ACK to logstash pipeline 1 --> logstash pipeline 2 filter --> logstash pipeline 2 output
--> Elasticsearch --> Elasticsearch sends ACK to logstash pipeline 2
If it works this way, we would need persistent queues in every pipeline.

If I have to use persistent queues in all the pipelines, is there a way to configure the size of each queue individually? Some pipelines will carry A LOT of data, so a big queue is needed, while others will carry just a few bytes, so a small queue is enough there.

Thanks a lot in advance!!


Given that a blocked downstream pipeline will block an upstream pipeline, I think it has to be the case that they need separate persistent queues. That would imply the ACK occurs as soon as the data is handed off to the downstream pipeline's queue.
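Concretely, that would mean setting `queue.type: persisted` on every pipeline in the chain in pipelines.yml (reusing the hypothetical ids from the sketch above):

```
# pipelines.yml -- a persistent queue on every pipeline in the chain
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor.conf"
  queue.type: persisted
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  queue.type: persisted
# ...and likewise for the other downstream pipelines
```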

Hi Badger, thanks for the answer.
Given that I need several Logstash queues, some of my pipelines need a BIG queue, but some only need a small one.
Is there any way to configure Logstash queues so that each queue has its own size, different from the other queues? I can't find anything.

Thanks

Queue settings in logstash.yml are global defaults, but you can override them per pipeline in pipelines.yml, including queue.max_bytes, so each queue can have its own size.
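For example (a sketch; the ids and sizes are made up):

```
# pipelines.yml -- per-pipeline queue sizing (sizes are illustrative)
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  queue.type: persisted
  queue.max_bytes: 8gb     # high-volume pipeline, big queue
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog.conf"
  queue.type: persisted
  queue.max_bytes: 256mb   # low-volume pipeline, small queue
```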
