ELK Active-Active Setup with Failure Recovery

Hello All

Our use case is an active-active architecture in ELK using two clusters in two different regions. Each event will be indexed to both the local Elasticsearch cluster and the remote Elasticsearch cluster.
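
A rough sketch of such a dual-write layout, written as a `pipelines.yml` entry. The pipeline ID, the Beats input, the port, and both hostnames are hypothetical placeholders, not taken from a real deployment:

```yaml
# pipelines.yml (sketch only; every name below is an assumption)
- pipeline.id: dual-write
  config.string: |
    # Assumed input; substitute whatever actually feeds the pipeline.
    input { beats { port => 5044 } }
    output {
      # Local-region cluster (hypothetical host)
      elasticsearch { hosts => ["https://es-local.example:9200"] }
      # Remote-region cluster (hypothetical host)
      elasticsearch { hosts => ["https://es-remote.example:9200"] }
    }
```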

For failure recovery, I wanted to use the DLQ (dead letter queue) feature: if Elasticsearch is down, the events would be stored somewhere else in the local cluster until Elasticsearch is back up, and then reprocessed and re-indexed.

After reading through the docs, it seems this is not possible, because an event is only sent to the DLQ when Elasticsearch responds with a 400 or 404. Is there any other way to achieve this kind of setup with failure recovery?

One idea is to replace my output stages with custom logic in the filter stage that indexes the event and, if Elasticsearch is down, indexes it somewhere else. I don't know what problems that might cause, or what I would need to consider to get indexing performance comparable to the output stage.

Any ideas are appreciated.

Thanks

Elastic does this in a single cluster using shard allocation awareness; see "Cluster-level shard allocation and routing settings" in the Elasticsearch Guide [8.6].

Use the region name for the "rack" value. Put a master node in each region, and I'd put a voting-only master in a third region. Put ingest nodes and Kibana in both regions. Allocate 1 replica for each index; a replica won't be housed in the same "rack" as its primary.
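
As a minimal sketch of that layout, each node's `elasticsearch.yml` could look roughly like the following. The attribute name `rack_id`, the region names, and the exact role lists are assumptions chosen for illustration; each YAML document below stands for a separate node's file:

```yaml
# Node in the first region (region name is a placeholder)
node.roles: [ master, data, ingest ]
node.attr.rack_id: region-a
cluster.routing.allocation.awareness.attributes: rack_id
---
# Node in the second region
node.roles: [ master, data, ingest ]
node.attr.rack_id: region-b
cluster.routing.allocation.awareness.attributes: rack_id
---
# Tiebreaker in a third region: votes in master elections, holds no data
node.roles: [ master, voting_only ]
cluster.routing.allocation.awareness.attributes: rack_id
```

The replica count is an index-level setting (`index.number_of_replicas: 1`) applied through the index settings API or an index template, not in `elasticsearch.yml`.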

I did that in a self-hosted deployment where we had campuses in various locations statewide, and it has worked for several years.

Thanks @Len_Rugen. I am not sure if this will help with our setup, but I will check it.