ELK Active Active Setup with Failure Recovery

mostafaelsayed · January 20, 2023, 6:29pm

Hello All

The use case is that we want to set up an active-active architecture in ELK using two clusters in two different regions. We will index each event to both the local Elasticsearch cluster and the remote Elasticsearch cluster.

For failure recovery, I wanted to use the DLQ (Dead Letter Queue) feature: if Elasticsearch is down, I could store the events somewhere else in the local cluster until Elasticsearch is back up, then reprocess and re-index them.

After reading through the docs, it seems this is not possible, because Elasticsearch has to respond with either 400 or 404 for the event to be sent to the DLQ. Is there any other option to achieve this kind of setup with failure recovery?
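To make the limitation concrete, here is a minimal Python sketch of the rule as described in the docs. It is an illustration only, not Logstash code: the function name and the convention of passing `None` for "no response" are assumptions made for this example.

```python
from typing import Optional

# Response codes that make an event eligible for the dead letter queue,
# per the documented behavior described above.
DLQ_STATUS_CODES = frozenset({400, 404})


def is_dlq_eligible(status: Optional[int]) -> bool:
    """Return True if a failed event would be written to the DLQ.

    `status` is the HTTP status Elasticsearch returned for the event, or
    None when the cluster could not be reached at all (a convention
    assumed for this sketch).
    """
    if status is None:
        # Elasticsearch is down or unreachable: there is no 400/404
        # response, so nothing triggers the DLQ.
        return False
    return status in DLQ_STATUS_CODES
```

An outage produces no response at all, so it never satisfies the 400/404 condition, which is why the DLQ alone does not cover the "Elasticsearch is down" case.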

What I thought of is replacing my output stages with custom logic in the filter stage that indexes the event itself and, if Elasticsearch is down, indexes the event somewhere else instead. However, I don't know whether this would cause problems, or what I would need to consider to get indexing performance comparable to the output stage.
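The fallback idea above can be sketched roughly as follows. This is a language-agnostic illustration written in Python, not a Logstash filter plugin: `send` is a hypothetical callable that indexes one event and raises `ConnectionError` when Elasticsearch is unreachable, and the local JSON-lines spool file stands in for "somewhere else".

```python
import json
from pathlib import Path
from typing import Callable, Dict

Event = Dict[str, object]


def index_or_spool(event: Event, send: Callable[[Event], None], spool: Path) -> bool:
    """Try to index one event; on a connection failure, append it to the spool.

    Returns True if the event was indexed, False if it was spooled.
    """
    try:
        send(event)
        return True
    except ConnectionError:
        with spool.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return False


def replay_spool(send: Callable[[Event], None], spool: Path) -> int:
    """Re-send spooled events in order; return how many were replayed."""
    if not spool.exists():
        return 0
    lines = [ln for ln in spool.read_text(encoding="utf-8").splitlines() if ln]
    replayed = 0
    remaining = []
    for i, line in enumerate(lines):
        try:
            send(json.loads(line))
            replayed += 1
        except ConnectionError:
            # The cluster went down again: keep this event and all later ones.
            remaining = lines[i:]
            break
    if remaining:
        spool.write_text("".join(ln + "\n" for ln in remaining), encoding="utf-8")
    else:
        spool.unlink()
    return replayed
```

One consideration this sketch makes visible: it sends events one at a time, so it gives up the batching that a bulk-oriented output stage provides. That is likely where most of the performance difference would come from.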

Any ideas are appreciated.

Thanks

rugenl · January 21, 2023, 12:07am

Elastic does this in a single cluster using shard allocation awareness. See "Cluster-level shard allocation and routing settings" in the Elasticsearch Guide [8.6].

Use the region name for the "rack" value. Put a master in each region, and I'd put a voting-only master in a third region. Put ingest and Kibana in both regions. Allocate 1 replica for each index; a replica won't be housed in the same "rack" as its primary.

I did that self-hosted, where we had campuses in various locations statewide, and it has worked for several years.
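As a rough sketch of the cluster-side part of this layout, the Python below builds a request body for `PUT _cluster/settings` that turns on allocation awareness for a node attribute. It assumes each data node already declares its region in `elasticsearch.yml` (for example `node.attr.rack: region-a`) and that the third-region node uses `node.roles: [ master, voting_only ]`. The attribute name `rack`, the region names, and the helper name are hypothetical, and forced awareness is an optional extra not mentioned in the reply above.

```python
import json
from typing import Dict, List, Optional


def awareness_settings(attribute: str,
                       force_values: Optional[List[str]] = None) -> Dict[str, Dict[str, str]]:
    """Build a `PUT _cluster/settings` body enabling shard allocation awareness.

    `attribute` is the custom node attribute name (set per node as
    `node.attr.<attribute>`). If `force_values` is given, forced awareness
    is also configured for those attribute values.
    """
    prefix = "cluster.routing.allocation.awareness"
    persistent = {f"{prefix}.attributes": attribute}
    if force_values:
        # Forced awareness: do not pile all copies into the surviving
        # region when the other one is unavailable.
        persistent[f"{prefix}.force.{attribute}.values"] = ",".join(force_values)
    return {"persistent": persistent}


if __name__ == "__main__":
    # Hypothetical region names, used as the "rack" values.
    body = awareness_settings("rack", ["region-a", "region-b"])
    print(json.dumps(body, indent=2))
```

With one replica per index and awareness enabled, Elasticsearch tries to keep a shard's primary and replica on nodes with different `rack` values, which is what places a copy of every shard in each region.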

mostafaelsayed · January 21, 2023, 6:34pm

Thanks @Len_Rugen. I am not sure if this will help with our setup, but I will check it.

