We're planning on setting up a WAN cluster with 4 nodes, each in a
different geographical location, and a single index with 5 shards (to allow
for scaling) and 4 replicas (so that each location gets a full copy of the
data). The initial data and most of the data updates (at least initially)
will come from a SQL database in the same location as one of the nodes. The
initial data and updates are being submitted via the bulk API using the
Nest client.
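
Would either of the following be likely to improve indexing performance, or is
it sufficient for the client application submitting data to connect to the
node that is nearest to it?

1. Ensuring that the node closest to the source data is the master node.
2. Ensuring that all primary shards are on the node closest to the
   source data, with all the other nodes getting various replica shards.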
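
On the shard counts in the plan (4 nodes, 5 shards, 4 replicas), a quick piece of arithmetic may be worth doing before building anything. The sketch below is not taken from the thread; it assumes that Elasticsearch will not place two copies of the same shard on one node, and the function name `shard_layout` is made up for illustration. Under that assumption, 4 replicas on 4 nodes would leave one copy of each shard unassigned, while 3 replicas would already give every node a full copy.

```python
# Back-of-the-envelope shard accounting for the planned layout.
# Assumption: at most one copy (primary or replica) of a given shard per node.

def shard_layout(nodes: int, primaries: int, replicas: int) -> dict:
    copies_per_shard = 1 + replicas              # the primary plus its replicas
    total_copies = primaries * copies_per_shard
    # Each shard can have at most one copy per node.
    assignable_per_shard = min(copies_per_shard, nodes)
    assigned = primaries * assignable_per_shard
    return {
        "copies_per_shard": copies_per_shard,
        "total_copies": total_copies,
        "assigned": assigned,
        "unassigned": total_copies - assigned,
        "full_copy_on_every_node": assignable_per_shard == nodes,
    }

planned = shard_layout(nodes=4, primaries=5, replicas=4)      # as described in the post
alternative = shard_layout(nodes=4, primaries=5, replicas=3)  # one copy per node

print(planned)
print(alternative)
```

If that assumption holds for the version in use, the replica count is something to double-check in the index settings (`number_of_replicas`), separately from the question of where the primaries live.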

We're enabling compression between nodes (transport.tcp.compress: true), and
as the WAN links are not that fast, this is expected to improve
performance. How does this setting tie in with action.bulk.compress
(https://github.com/elasticsearch/elasticsearch/issues/1850), which defaults
to true?
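
To get a rough feel for what compression could save on slow links, here is a minimal sketch that builds a bulk request body in the newline-delimited format the bulk API expects and compresses it with Python's `zlib`. Several things here are assumptions and not details from the post: the index name `customers`, the type `customer`, the document fields, and the use of zlib as a stand-in (Elasticsearch's transport compression may use a different codec, so the ratio is only indicative). It also says nothing about latency, which compression does not address.

```python
import json
import zlib


def build_bulk_body(index: str, doc_type: str, docs: list) -> bytes:
    """Build a bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical rows, standing in for records read from the SQL database.
rows = [
    {"id": i, "name": f"customer {i}", "country": "GB", "active": True}
    for i in range(1000)
]

body = build_bulk_body("customers", "customer", rows)
compressed = zlib.compress(body)
ratio = len(compressed) / len(body)

print(f"raw={len(body)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.2f}")
```

Repetitive JSON like this tends to compress well, so measuring a sample of the real data this way gives a first estimate of the bandwidth saved per bulk request.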

If you're running a cluster across those links, then you need to expect
problems, as ES is latency sensitive.
You may be better off looking into tribe nodes.