Cross-cluster replication

Exploring CCR.

Flow: fluentd ---> load balancer ---> Elasticsearch (CCR) <--- Kibana.

Cluster A and Cluster B are set up with CCR. As per the documentation, an index has many shards, and the follower replicas are in a different cluster than the leader.

My question is: if Cluster A holds the primary index and Cluster A is down, can fluentd still write to a new shard for the same index on Cluster B?

As each index can have 5 primary shards, I am assuming that Elasticsearch can automatically create a new primary shard so that fluentd can continue to write to Elasticsearch?

Or is it that the primary index only exists in Cluster A? So if Cluster A is down, fluentd cannot write to that index, as you cannot write to a follower shard? However, a user using Kibana can still search that index?

If we cannot write, then how can this be considered HA?

The leader index in Cluster A has both primaries and replicas and can be written to. If you use CCR to replicate this to Cluster B, the follower index in Cluster B will also have primary and replica shards, but it will be read-only.
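For reference, the follower index is created on the second cluster with the CCR follow API. A rough sketch (the remote cluster alias cluster_a and the index names here are placeholders):

```console
# On Cluster B: follow "my-index" from the remote cluster registered
# as "cluster_a". The resulting follower index is read-only.
PUT /my-index-follower/_ccr/follow
{
  "remote_cluster": "cluster_a",
  "leader_index": "my-index"
}
```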

No.

No, that is not the case.

For immutable data, CCR is often set up as bi-directional using two separate indices, where you write to a cluster-specific index. If you want to write to a single index name, I wonder if you might be able to use an ingest pipeline per index that changes the index name to the appropriate cluster-specific name.
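As a sketch of that ingest-pipeline idea (untested, and the pipeline and index names are made up), a set processor can rewrite the _index metadata field so documents sent to the shared name land in the cluster-specific index:

```console
# Hypothetical pipeline on Cluster A: redirect incoming documents
# to the cluster-specific index by rewriting the _index metadata field.
PUT /_ingest/pipeline/redirect-to-local
{
  "processors": [
    { "set": { "field": "_index", "value": "logdata_A" } }
  ]
}
```

On Cluster B the same pipeline would set the value to the Cluster B-specific index name instead.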

Thanks for your reply.

Regarding your response: "For immutable data CCR is often set up using two separate indices and you write to a cluster-specific index"

I think that could be a possible solution...

From the start, fluentd/Elasticsearch can be configured to write to either cluster-specific index, assuming round robin.
For example, Cluster A's index is Log03072022.1 and Cluster B's is Log03072022.2. Both are part of an index pattern log*. If Cluster A is down, fluentd continues to write to a load balancer that now only points to Cluster B. Hence HA is achieved??

I was thinking the following. I will use sample names, so please change these for something better if you try it out. Set up fluentd to index into the logdata alias/index. This can round robin across both clusters, or have one as the preference and only fail over if required.

In Cluster A, set up a logdata_A data stream that you replicate to Cluster B using CCR. In Cluster B, set up a logdata_B data stream that you replicate to Cluster A using CCR.
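One way to set up that replication (an untested sketch, assuming the remote cluster is registered under the alias cluster_a) is an auto-follow pattern on Cluster B that follows the logdata_A data stream's backing indices:

```console
# On Cluster B: automatically follow the logdata_A data stream
# (including new backing indices as it rolls over) from Cluster A.
PUT /_ccr/auto_follow/logdata_a
{
  "remote_cluster": "cluster_a",
  "leader_index_patterns": ["logdata_A*"],
  "follow_index_pattern": "{{leader_index}}"
}
```

The mirror-image pattern on Cluster A would follow logdata_B* from Cluster B.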
As you want to write to the same index name on both clusters, you will need a way to redirect writes. I have not tried the following suggestions, so I am not sure whether they will work.

  • Create a local alias logdata in each cluster and point this to the local data stream.
  • Create a local dummy index named logdata in each cluster and associate it with a default ingest pipeline that changes the index name of all documents written to it to the local data stream.

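For the first option, the alias on Cluster A could look something like this (untested; repeat with logdata_B on Cluster B). The second option would similarly pair an index's default_pipeline setting with a set processor that rewrites _index.

```console
# On Cluster A: a local write alias "logdata" pointing at the local
# data stream, so fluentd can use the same name on both clusters.
POST /_aliases
{
  "actions": [
    { "add": { "index": "logdata_A", "alias": "logdata", "is_write_index": true } }
  ]
}
```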
You can then create an index pattern logdata_* to access all these indices in either cluster.


Thanks again for your reply. Yes, I think the alias is the better option; I will now have to test the theory :) Merci!

@Nicole-Ann_Menezes Here are some pictures of what @Christian_Dahlqvist was describing. In short, in each cluster you write to the leader and read from both the leader and the follower. These are just high level, and obviously there are many details, but I hope this helps a bit.

Here are a couple of architecture diagrams to think about. The first is more like what you two are describing...

And a Failed Cluster Scenario

And a different, more local architecture as well...

Then from there you can work on your failure scenarios, like the one you mentioned...


@stephenb thanks, this really will help. I have a meeting with an Elasticsearch rep on the 5th of July to discuss licensing options, as we want to run these on K8s. Hoping it all lines up. 🙂

