High availability Ingest architecture

Context:

My team manages an on-prem Elastic deployment. We have an ECE license.
We had an outage a few weeks ago when our Elasticsearch nodes ran out of disk space because of a problem moving data to the frozen tier.

Since then we have recovered the services, but we lost the data that couldn’t be ingested during the outage.

We are exploring enhancements to the current architecture, and the first thing we came up with is deploying dedicated Logstash nodes with persistent queues (PQ) enabled.

Is that enough of an enhancement to ensure that, during an outage that takes hours to fix, we don’t lose any data that the different data sources are sending to our Elastic deployment (given that the Logstash nodes keep working during the outage)? Or do we need to introduce other elements such as Kafka or Redis for this requirement?

Replaying events that were not indexed by Elasticsearch during the outage is a must.

Without going into lots of detail: the tool of choice for the use case you're describing is often Kafka.
We see many large-scale durable architectures that combine Logstash with Kafka.

PQs could be a solution; they're a smaller, perhaps less scalable, and more tightly coupled one.

You'll have to weigh your own trade-offs, like managing another technology versus gaining decoupling and probably somewhat better durability.
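For reference, enabling PQs is just a few settings in `logstash.yml`; the size cap and path below are illustrative values you'd adapt to your retention needs, not recommendations from this thread:

```
# logstash.yml -- persistent queue settings (illustrative values)
queue.type: persisted                  # default is "memory"
queue.max_bytes: 50gb                  # cap on disk used by each pipeline's queue
path.queue: /var/lib/logstash/queue    # put this on fast, dedicated disks
```

Size the cap against how long an outage you need to survive at your peak ingest rate; once the queue fills, Logstash applies back-pressure to its inputs.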

Those are my thoughts. I'm sure others will have theirs.

I use Logstash + Kafka and recommend this as a good approach; I would not recommend PQs.

The problem with PQs is that they require your Logstash instances to have large, fast disks, which makes them more expensive. Every event needs to be written to disk and then read back before being processed; this can also increase the CPU requirements of the Logstash instances, and it is work that cannot be done in parallel, which may impact your ingestion rate.

Before 9.2, PQs also couldn't compress the data, which used a lot more disk, but it seems you can now compress the data in the PQ.

Kafka is widely used as a message buffer for ingestion flows.
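As a sketch of that decoupled flow, your shippers write to a Kafka topic and Logstash consumes from it; the broker addresses, topic name, and consumer group below are assumptions for illustration, not values from this thread:

```
# Logstash pipeline consuming from Kafka and indexing into Elasticsearch (illustrative)
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"   # assumed broker list
    topics            => ["logs-ingest"]             # assumed topic name
    group_id          => "logstash-indexers"         # assumed consumer group
    auto_offset_reset => "earliest"                  # start from the oldest data if no committed offset
  }
}
output {
  elasticsearch {
    hosts => ["https://es1:9200"]                    # assumed Elasticsearch endpoint
  }
}
```

The nice property for your outage scenario: if Elasticsearch is down, events just accumulate in Kafka (up to the topic's retention), and replaying is a matter of rewinding the consumer group's offsets, e.g. with Kafka's `kafka-consumer-groups.sh --reset-offsets` tool.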

How are you indexing your data? Describing your ingestion flows would make it easier to provide more feedback.
