We're planning on setting up a WAN cluster with 4 nodes, each in a
different geographical location, and a single index with 5 shards (to allow
for scaling) and 4 replicas (so that each location gets a full copy of the
data). The initial data and most of the data updates (at least initially)
will come from a SQL database in the same location as one of the nodes. The
initial data and updates are being submitted via the bulk API using the
With reference to this:
Would either of the following be likely to improve indexing performance, or
is it sufficient for the client application submitting data to connect to
the node that is nearest to it?
- Ensuring that the node closest to the source data is the master node.
- Ensuring that all primary shards are on the node closest to the
source data; all the other nodes would get various replica shards.
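A quick sketch of the shard arithmetic behind this plan may help when weighing the second option. It assumes Elasticsearch's allocation rule that two copies of the same shard (primary or replica) are never placed on the same node; the settings dict mirrors the values in this post, and `shard_copies` is a hypothetical helper written only for illustration.

```python
# Hedged sketch: shard-copy arithmetic for the planned index.
# Assumes Elasticsearch never places two copies of the same shard
# (primary or replica) on the same node.

# Create-index settings body with the values from the post.
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 4}}


def shard_copies(primaries: int, replicas: int, nodes: int) -> dict:
    """Hypothetical helper: count requested, placeable and unassigned copies."""
    copies_per_shard = 1 + replicas  # one primary plus its replicas
    total = primaries * copies_per_shard
    # At most one copy of each shard fits on each node.
    assignable = primaries * min(copies_per_shard, nodes)
    return {
        "copies_per_shard": copies_per_shard,
        "total": total,
        "assignable": assignable,
        "unassigned": total - assignable,
        # Once every copy that fits is placed, each node holds one copy of
        # every shard exactly when there are at least as many copies as nodes.
        "full_copy_on_every_node": copies_per_shard >= nodes,
    }


primaries = index_settings["settings"]["number_of_shards"]
for replicas in (4, 3):
    print(replicas, shard_copies(primaries, replicas, nodes=4))
```

If that allocation rule holds, 4 replicas request 25 shard copies but only 20 fit on 4 nodes, so 5 replicas would stay unassigned (reported as yellow cluster health), while 3 replicas already put a copy of every shard on every node.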
We're enabling compression between nodes (transport.tcp.compress: true) and,
as the WAN links are not that fast, we expect this to improve
performance. How does this setting tie in with action.bulk.compress
(https://github.com/elasticsearch/elasticsearch/issues/1850 defaults to
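To get a rough feel for why compression should pay off on slow WAN links, the sketch below builds a bulk request body in the newline-delimited format (one action line plus one source line per document, ending with a newline) and measures how much it shrinks. It is only an estimate under stated assumptions: the rows are synthetic stand-ins for the SQL data, the index and type names are hypothetical, the `_type` field follows the older type-based bulk format, and `zlib` is a generic stand-in compressor, not necessarily the algorithm Elasticsearch uses on the transport layer.

```python
import json
import zlib


def bulk_body(index: str, doc_type: str, rows: list) -> bytes:
    """Build a newline-delimited bulk body: an action line and a source
    line per document, terminated by a trailing newline."""
    lines = []
    for row in rows:
        # Action metadata line (older, type-based format; names are hypothetical).
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}))
        # Document source line.
        lines.append(json.dumps(row))
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical rows standing in for records read from the SQL database.
rows = [{"id": i, "name": f"customer {i}", "country": "GB", "active": True} for i in range(1000)]

body = bulk_body("myindex", "customer", rows)
compressed = zlib.compress(body)  # generic stand-in for transport compression
ratio = len(compressed) / len(body)
print(f"raw={len(body)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.2f}")
```

Real ratios will depend on the actual documents and on the compressor Elasticsearch applies, so this only indicates that repetitive JSON payloads like bulk bodies tend to compress well.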
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3859d115-5b92-49ff-89df-dba766ced4cb%40googlegroups.com.