Hi,
I've got a multi-DC setup (East Asia, EU, US).
EU is a central location where there is a cluster of 3 master nodes. All writes are performed to the EU location.
There is just one ES data node in each remote DC (US, East Asia).
To keep these remote nodes in the cluster when there are network problems, I've got this in the config:
discovery.zen.fd.ping_interval: 15s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
... but this is not a solution. If there is a network interruption between the master DC (EU) and a remote DC, the cluster just stops and it's not possible to index a document. Index latency skyrockets and the client application hangs.
I believe this is because writes are synchronous. When one of the cluster nodes is unresponsive, the cluster waits for that node to respond until it gets "kicked out". But the unresponsive node won't get kicked out any time soon because of the zen.fd settings, which require 5 ping retries, each taking up to 60s, for a total of about 5 minutes. That's way too long.
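For reference, this is how I arrive at the 5 minutes. It's a rough back-of-the-envelope sketch in Python, not a statement of how zen fault detection works internally: the function name is made up, and it assumes every retry waits out the full timeout while ignoring ping_interval.

```python
def worst_case_detection_seconds(ping_timeout_s: int, ping_retries: int) -> int:
    """Rough upper bound on how long a dead node stays in the cluster.

    Assumption: every retry waits for the full timeout; ping_interval is ignored.
    """
    return ping_timeout_s * ping_retries


# Values taken from my config above (ping_timeout: 60s, ping_retries: 5).
total = worst_case_detection_seconds(ping_timeout_s=60, ping_retries=5)
print(f"{total}s = {total / 60:.0f} min")  # prints "300s = 5 min"
```

Lowering either value shortens the stall, but then the remote nodes drop out of the cluster on shorter network glitches, which is what I was trying to avoid in the first place.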
Is it possible to set up ES natively so that it replicates data asynchronously to a remote node / cluster?
The write-availability in EU is really important here.
The read-availability in the remote DCs is important as well, but if the data is a few minutes old, no one would care anyway.
If it's not possible to achieve this with a native ES setup, what "out of the box" options would you recommend?
I've seen some Kafka solutions, but I would actually prefer RabbitMQ, as I don't want to add more complexity to my stack.
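To make the behaviour I'm after concrete, here is a minimal Python sketch. It is only an illustration under assumptions: `AsyncReplicator` and `send_remote` are hypothetical names, a plain list stands in for the EU cluster, and an in-process `queue.Queue` stands in for a real broker such as RabbitMQ. The point is that the local write returns immediately and the remote copy catches up whenever the link is back.

```python
import queue
import threading
import time


class AsyncReplicator:
    """In-process stand-in for a broker sitting between the EU and a remote DC."""

    def __init__(self, send_remote, retry_delay_s=0.01):
        # send_remote is a hypothetical callable that indexes one document
        # into the remote DC and raises ConnectionError when the link is down.
        self._send_remote = send_remote
        self._retry_delay_s = retry_delay_s
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, local_store, doc):
        local_store.append(doc)  # the local (EU) write completes right away
        self._pending.put(doc)   # the remote copy is deferred

    def wait_until_replicated(self):
        self._pending.join()     # block until every queued doc was delivered

    def _drain(self):
        while True:
            doc = self._pending.get()
            try:
                self._send_remote(doc)
            except ConnectionError:
                # Remote DC unreachable: requeue the doc and back off briefly.
                # Note that requeueing can reorder documents.
                self._pending.put(doc)
                time.sleep(self._retry_delay_s)
            finally:
                self._pending.task_done()


if __name__ == "__main__":
    remote_docs = []
    outage = {"failures_left": 2}  # simulate a short network interruption

    def flaky_send(doc):
        if outage["failures_left"] > 0:
            outage["failures_left"] -= 1
            raise ConnectionError("link to remote DC is down")
        remote_docs.append(doc)

    local_docs = []
    replicator = AsyncReplicator(flaky_send)
    replicator.write(local_docs, {"id": 1})  # returns despite the outage
    replicator.wait_until_replicated()
    print(local_docs == remote_docs)
```

In a real setup the queue would have to be durable and live outside the application process, which is where RabbitMQ (or Kafka) would come in.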
Thanks!