Is there any ES configuration that would amount to having separate
clusters that are exact replicas of each other?
The goal is that if we have identical clusters in the US, Australia and
Europe, indexing and searches in Europe never have to go to other
continents, yet all clusters contain exactly the same data.
Perhaps ES can effectively achieve that, operationally, with
just one cluster, using rack ids, but I am not sure.
Cross-cluster replication is not yet built-in. We are planning
this feature but have no timeline.
You could technically make this work with location-aware indices
and tagged nodes using the allocation API, but you would have a
similar issue with keeping the indices in sync.
It can also be problematic to introduce a lot of latency between
clustered nodes. If availability and stability are important, I
would want to test this configuration for quite a while to understand
its behavior and failure modes. If you're generating ephemeral
data and outages can be tolerated, it may be sufficient.
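For illustration, here is a rough sketch of what the tagged-node approach could look like, written as Python that only builds the request bodies. The attribute name "region" and the tags "us", "eu" and "au" are assumptions made up for this example, and the setting names are the allocation awareness settings as I know them; check them against the documentation for your version before relying on this.

```python
import json

# Hypothetical attribute name and region tags. Each node would also need a
# matching custom attribute in its own configuration; the exact node setting
# name varies by Elasticsearch version, so it is not shown here.
ATTRIBUTE = "region"
REGIONS = ["us", "eu", "au"]


def awareness_settings(attribute, values):
    """Build a cluster settings body that enables forced allocation awareness."""
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            f"{prefix}.attributes": attribute,
            f"{prefix}.force.{attribute}.values": ",".join(values),
        }
    }


def index_settings(region_count):
    """One shard copy per region: one primary plus (region_count - 1) replicas."""
    return {"index": {"number_of_replicas": region_count - 1}}


if __name__ == "__main__":
    # Print the bodies that would be sent to the cluster and index settings APIs.
    print(json.dumps(awareness_settings(ATTRIBUTE, REGIONS), indent=2))
    print(json.dumps(index_settings(len(REGIONS)), indent=2))
```

This only influences where shard copies are placed. A write is still applied to the primary and then to every replica, so indexing traffic would still cross regions.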
I thought that if the primary shards and replicas are allocated so that
every geographical location holds the entire index, in either primary or
replica shards, the cluster would effectively act almost like independent
clusters, since indexing and searches would prefer the local
shards and use remote ones only when the local ones are unavailable. So the
vast majority of operations would stay at the local level,
achieving the main goal of having completely independent clusters.
Am I right?
To achieve this kind of search locality you'd have to partition
your indices appropriately. In this case, it's better to maintain
isolated clusters that can each function with high availability
and index into them as you need.
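As a minimal sketch of indexing into isolated clusters from the application side, assuming each cluster is wrapped in a simple "send" callable: the function name, the Sender type and the cluster names used here are hypothetical, and a real sender would call whichever Elasticsearch client you use.

```python
from typing import Callable, Dict, List

# A sender takes (doc_id, doc) and raises an exception on failure.
# Hypothetical type alias for this sketch only.
Sender = Callable[[str, dict], None]


def index_everywhere(doc_id: str, doc: dict, clusters: Dict[str, Sender]) -> List[str]:
    """Send one document to every cluster; return the names of clusters that failed."""
    failed = []
    for name, send in clusters.items():
        try:
            send(doc_id, doc)
        except Exception:
            # Record the failure so the caller can retry or queue the write later.
            failed.append(name)
    return failed
```

The caller is responsible for retrying the clusters returned in the list; without that, the clusters drift apart, which is the same sync issue mentioned earlier in the thread.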