Pipeline-to-pipeline distributor: how not to lose events

Hello

TL;DR - Which pipelines must have persistent queues to avoid losing data in a pipeline-to-pipeline distributor pattern?

We are using pipeline-to-pipeline communication in Logstash with the distributor pattern: we have a single Logstash input that receives all the events from several Filebeats, and then, depending on an event field, we want to grok/dissect each event differently and send it to a different Elasticsearch index, or even to Elasticsearch AND to a file.

So, we have just one pipeline with a beats input and an output with several "if" conditionals, where each "if" sends to a different pipeline output.
Then we have another 10 pipelines whose inputs are the outputs of the first pipeline.
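For anyone else reading, this is roughly what such a layout looks like; the pipeline ids, port, field name and index below are made up for illustration, and the `pipeline` input/output plugins are the standard pipeline-to-pipeline plugins:

```
# pipelines.yml -- ids and paths are illustrative
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor.conf"
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"

# distributor.conf -- single beats input, fan-out on an event field
input {
  beats { port => 5044 }
}
output {
  if [log_type] == "apache" {
    pipeline { send_to => ["apache"] }
  }
  # ...one conditional per downstream pipeline
}

# apache.conf -- one of the ~10 downstream pipelines
input {
  pipeline { address => "apache" }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "apache-%{+YYYY.MM.dd}"
  }
}
```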

Now, we cannot lose data, so we wonder how ACKs happen in pipeline-to-pipeline communication. We are going to use persistent queue(s), but which pipelines should have a persistent queue? All of them? Just the first one? All of them except the first one?

This is how communication works:
Filebeat --> logstash pipeline 1 --> logstash pipeline 2 --> Elasticsearch.

Is this how ACKs happen?
Filebeat --> logstash pipeline 1 input --> logstash pipeline 1 queue --> logstash pipeline 1 sends ACK to Filebeat --> logstash pipeline 1 filter --> logstash pipeline 1 output
--> logstash pipeline 2 input --> logstash pipeline 2 queue --> logstash pipeline 2 filter --> logstash pipeline 2 output
--> Elasticsearch --> Elasticsearch sends ACK to logstash pipeline 2 --> logstash pipeline 2 sends ACK to logstash pipeline 1
If it works like that, then a persistent queue in just the first pipeline would be enough.

Or do ACKs work like this?
Filebeat --> logstash pipeline 1 input --> logstash pipeline 1 queue --> logstash pipeline 1 sends ACK to Filebeat --> logstash pipeline 1 filter --> logstash pipeline 1 output
--> logstash pipeline 2 input --> logstash pipeline 2 queue --> logstash pipeline 2 sends ACK to logstash pipeline 1 --> logstash pipeline 2 filter --> logstash pipeline 2 output
--> Elasticsearch --> Elasticsearch sends ACK to logstash pipeline 2
If it works this way, we would need persistent queues in every pipeline.

If I have to use persistent queues in all the pipelines, is there a way to configure the size of each queue individually? Some pipelines will carry A LOT of data, so a big queue is needed, while others will carry just a few bytes, so a small queue is enough there.

Thanks a lot in advance!!


Given that a blocked downstream pipeline will block an upstream pipeline, I think it has to be the case that they need separate persistent queues. That would imply the ACK occurs as soon as the data is handed off to the downstream pipeline's queue.
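Concretely, that would mean setting `queue.type: persisted` on every pipeline in the chain in pipelines.yml (reusing the hypothetical ids from the sketch above):

```
# pipelines.yml -- a persistent queue on every pipeline in the chain
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor.conf"
  queue.type: persisted
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  queue.type: persisted
# ...and likewise for the other downstream pipelines
```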

Hi Badger, thanks for the answer.
Given that I need several Logstash queues, some of my pipelines need a BIG queue, but some only need a small one.
Is there any way to configure Logstash queues so that each queue has its own size, different from the other queues? I can't find anything.

Thanks

Queue settings in logstash.yml are global defaults, but you can override them per pipeline in pipelines.yml, including queue.max_bytes, so each queue can have its own size.
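For example (a sketch; the ids and sizes are made up):

```
# pipelines.yml -- per-pipeline queue sizing (sizes are illustrative)
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  queue.type: persisted
  queue.max_bytes: 8gb     # high-volume pipeline, big queue
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog.conf"
  queue.type: persisted
  queue.max_bytes: 256mb   # low-volume pipeline, small queue
```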
