GlusterFS - Index Replication Across Sites

To get a fully active-active cross-DC setup, what would be the issues with storing the Elasticsearch index on a GlusterFS replicated volume?
Each DC would read from and write to its own local replica, and GlusterFS would handle the replication.
Does it sound feasible or insane?

I don't know the particulars of GlusterFS, but if a write/read needs to traverse a WAN link, ES performance will suffer.

Mark, you're right.

But in this case, the reads would be local. Only writes would traverse the WAN link.
In my case, writes happen sporadically (once or twice a day). It is basically a CMS that only changes its index when content is changed and published.

Are the writes synchronous?

They would have to be!

So you have one distributed, synchronous-write system sitting on top of another.
Your writes are going to be very slow and may even cause timeouts.
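To see why stacking one synchronous system on another hurts, here is a rough latency model. It is a back-of-envelope sketch that assumes (hypothetically) every synchronous filesystem operation Elasticsearch issues during a write must wait for one WAN round trip to the remote GlusterFS replica. The function names and all the numbers are illustrative placeholders, not measurements of either product.

```python
def estimated_write_latency_ms(local_ms: float, wan_rtt_ms: float, sync_ops: int) -> float:
    """Estimate one write's latency when each synchronous filesystem
    operation waits for a WAN round trip to the remote replica.

    All three inputs are hypothetical; measure them on a real deployment.
    """
    if sync_ops < 0:
        raise ValueError("sync_ops must be non-negative")
    # Local Elasticsearch work plus one WAN round trip per synchronous operation.
    return local_ms + wan_rtt_ms * sync_ops


def exceeds_timeout(latency_ms: float, timeout_ms: float) -> bool:
    """True if the estimated latency would exceed the client's request timeout."""
    return latency_ms > timeout_ms


if __name__ == "__main__":
    # Illustrative numbers only: 5 ms local work, 40 ms WAN RTT, 50 synchronous ops.
    latency = estimated_write_latency_ms(5.0, 40.0, 50)
    print(latency)  # 2005.0
    print(exceeds_timeout(latency, 1000.0))  # True
```

The point is the multiplier: a few milliseconds of local work turns into seconds once each synchronous step pays the WAN round trip, so the real operation count and RTT have to come from testing.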

You'd need to test.

What about the cluster state information that can be found in the data directories? I would expect at least that to cause conflicts if accessed by multiple clusters. If you are indexing/updating rarely, why not just have 2 separate clusters and replicate indices using snapshot and restore?
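The snapshot-and-restore route can be sketched as a fixed sequence of Elasticsearch REST calls. This minimal sketch only builds the requests (it does not send them) and assumes a shared filesystem repository reachable by both clusters at the same path, listed under `path.repo` on every node. The repository name, snapshot name, index name, hosts and path in the example are hypothetical. On the very first run, when the index does not yet exist on the target cluster, the close step would return 404 and can be skipped.

```python
import json
import urllib.request
from typing import List, Optional


def es_request(base_url: str, method: str, path: str,
               body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def snapshot_and_restore_plan(source_url: str, target_url: str, repo: str,
                              location: str, snapshot: str,
                              index: str) -> List[urllib.request.Request]:
    """Return the ordered requests that copy `index` from source to target."""
    return [
        # Source cluster: register the shared filesystem repository (read-write).
        es_request(source_url, "PUT", f"/_snapshot/{repo}",
                   {"type": "fs", "settings": {"location": location}}),
        # Target cluster: register the same repository read-only,
        # so only one cluster ever writes to it.
        es_request(target_url, "PUT", f"/_snapshot/{repo}",
                   {"type": "fs", "settings": {"location": location, "readonly": True}}),
        # Source cluster: snapshot only the CMS index and wait until it completes.
        es_request(source_url, "PUT",
                   f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
                   {"indices": index}),
        # Target cluster: an open index cannot be restored over, so close it first.
        es_request(target_url, "POST", f"/{index}/_close"),
        # Target cluster: restore the index from the snapshot.
        es_request(target_url, "POST", f"/_snapshot/{repo}/{snapshot}/_restore",
                   {"indices": index}),
    ]


if __name__ == "__main__":
    plan = snapshot_and_restore_plan("http://dc1:9200", "http://dc2:9200",
                                     "cms_backup", "/mnt/es-snapshots",
                                     "snap-1", "cms-content")
    for req in plan:
        # urllib.request.urlopen(req) would actually send each request.
        print(req.get_method(), req.full_url)
```

With writes happening only once or twice a day, running this plan after each publish keeps the second site at most one snapshot behind, and each cluster keeps its own cluster state.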


It must be an active-active scenario, meaning that when a change happens, it must be applied on both sites at the same time.

Given the nature of distributed systems, that's technically not possible.
So how close to the same time do the two sites need to be? Can you tolerate a difference of N seconds?

Why can you not have two separate clusters, with your update process writing to both at the same time?
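Writing to both clusters from the update process could look roughly like the sketch below. Each "writer" is a hypothetical stand-in for whatever client call indexes a document into one cluster; the sketch sends the same document to every cluster concurrently and reports which ones failed so the caller can retry them instead of letting the sites silently diverge.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical writer signature: (doc_id, document) -> None, raising on failure.
# In practice it would wrap an index call against one cluster.
Writer = Callable[[str, dict], None]


def write_to_all(writers: Dict[str, Writer], doc_id: str, doc: dict) -> List[str]:
    """Send the same document to every cluster concurrently.

    Returns the names of clusters whose write raised, in the order the
    writers were given, so the caller can retry them later.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(writers))) as pool:
        futures = {name: pool.submit(write, doc_id, doc) for name, write in writers.items()}
        for name, future in futures.items():
            try:
                future.result()
            except Exception:
                failed.append(name)
    return failed
```

The weak spot is the failure list: if a site is down, someone has to hold on to the missed update and replay it later, which is the gap the message-queue suggestion addresses.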

In that case the recommended architecture is to distribute the updates separately to the two clusters, e.g. by using a message queue.
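The queue-based fan-out can be illustrated with an in-memory stand-in for the broker. This is a minimal sketch, not a Kafka client: the class and method names are hypothetical, nothing is persisted, and the per-site backlog is only loosely analogous to each data center reading a topic with its own consumer group. The CMS publishes each update once; each site applies its backlog in order and keeps a failed update at the head so it is retried next time.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Update = Tuple[str, dict]  # (doc_id, document)


class FanOutLog:
    """In-memory stand-in for a broker topic read by one consumer per site.

    Each site keeps its own backlog, so one site being down neither blocks
    nor loses updates for the other. Not durable: a real broker provides
    persistence and needs its own failover.
    """

    def __init__(self, sites: List[str]) -> None:
        self.pending: Dict[str, Deque[Update]] = {site: deque() for site in sites}

    def publish(self, doc_id: str, doc: dict) -> None:
        # The CMS publishes once; the log fans the update out to every site.
        for backlog in self.pending.values():
            backlog.append((doc_id, doc))

    def drain(self, site: str, apply: Callable[[str, dict], None]) -> int:
        """Apply this site's pending updates in publish order.

        Stops at the first failure and leaves that update at the head of the
        backlog for the next attempt. Returns how many updates were applied.
        """
        backlog = self.pending[site]
        applied = 0
        while backlog:
            doc_id, doc = backlog[0]
            try:
                apply(doc_id, doc)
            except Exception:
                break
            backlog.popleft()
            applied += 1
        return applied
```

Because each site drains independently, the two indices can differ for as long as the slower consumer takes to catch up, which is the "N seconds difference" trade-off raised above.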

So something like the Kafka-based setup described here: Clustering Across Multiple Data Centers | Elastic Blog

Sounds promising; I would just have to consider failover for the message broker as well.