(WAN) Distributed Cluster: Confine Primary Shards and/or Master to Node(s) Nearest Source Data?

Brendon · September 8, 2014, 5:49am

Hi

We're planning on setting up a WAN cluster with 4 nodes, each in a
different geographical location, and a single index with 5 shards (to allow
for scaling) and 4 replicas (so that each location gets a full copy of the
data). The initial data and most of the data updates (at least initially)
will come from a SQL database in the same location as one of the nodes. The
initial data and updates are being submitted via the bulk API using the
Nest client.
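
As a rough sanity check of the shard arithmetic above, here is a small Python sketch. It relies on one fact, that Elasticsearch does not place two copies of the same shard on one node, and it assumes all 4 nodes hold data and that no other allocation rules apply. The helper names are hypothetical, not part of any client library.

```python
import json


def index_settings(primaries, replicas):
    """Build the settings body used when creating an index."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def allocation_summary(nodes, primaries, replicas):
    """Count shard copies, assuming at most one copy of a shard per node."""
    copies_per_shard = 1 + replicas              # the primary plus its replicas
    assignable = min(copies_per_shard, nodes)    # copies that can find a node
    return {
        "total_copies": primaries * copies_per_shard,
        "assigned_copies": primaries * assignable,
        "unassigned_copies": primaries * (copies_per_shard - assignable),
        "full_copy_on_every_node": assignable == nodes,
    }


if __name__ == "__main__":
    print(json.dumps(index_settings(5, 4), indent=2))
    print(allocation_summary(nodes=4, primaries=5, replicas=4))
    print(allocation_summary(nodes=4, primaries=5, replicas=3))
```

Under these assumptions, 4 replicas on 4 nodes would leave one replica of each shard unassigned, while 3 replicas would already give every node a full copy.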

With reference to this:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html,
would either of the following likely improve indexing performance or is it
sufficient for the client application submitting data to connect to the
node that is nearest to it?

1. Ensuring that the node closest to the source data is the master node.
2. Ensuring that all primary shards are on the node closest to the
   source data - all the other nodes would get various replica shards.
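
To reason about the second option, here is a toy latency model of the write path as the linked guide page describes it: the node that receives the request forwards it to the primary shard, the primary applies the write and sends it to the replicas in parallel, and the response comes back once the replicas have answered. The site names and round-trip times are made-up placeholders, and the model ignores indexing time, bandwidth and bulk batching, so treat it as a sketch only.

```python
LOCAL_RTT_MS = 1.0   # hypothetical round trip within one site
WAN_RTT_MS = 80.0    # hypothetical round trip between two sites


def rtt(site_a, site_b):
    """Round-trip time between two sites in this toy model."""
    return LOCAL_RTT_MS if site_a == site_b else WAN_RTT_MS


def estimated_write_latency(coordinating_site, primary_site, replica_sites):
    """Estimate the network latency of one write to one shard.

    The coordinating node forwards to the primary (one round trip), then the
    primary waits for the slowest replica (replicas are written in parallel).
    """
    forward = rtt(coordinating_site, primary_site)
    replicate = max((rtt(primary_site, r) for r in replica_sites), default=0.0)
    return forward + replicate


if __name__ == "__main__":
    # The client talks to the node at site "A", next to the SQL database.
    print(estimated_write_latency("A", "A", ["B", "C", "D"]))  # primary is local
    print(estimated_write_latency("A", "B", ["A", "C", "D"]))  # primary is remote
```

In this model a remote primary costs roughly one extra WAN round trip per shard request. The master node does not appear at all, because document writes are routed through the primary shard rather than through the master.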

We're enabling compression between nodes (transport.tcp.compress: true) and,
as the WAN links are not that fast, this is expected to improve
performance. How does this setting tie in with action.bulk.compress, which
defaults to true
(https://github.com/elasticsearch/elasticsearch/issues/1850)?
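
To get a feel for how much compression might save on a slow link, here is a small Python sketch that builds a bulk request body (one action line plus one source line per document, newline-terminated) and compresses it. zlib is only a stand-in here; it is not claimed to be the algorithm Elasticsearch uses on the transport layer. The index name, type name and documents are made up.

```python
import json
import zlib


def build_bulk_body(docs, index="products", doc_type="product"):
    """Build a bulk API body: an action line followed by a source line per doc."""
    lines = []
    for doc_id, doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical, fairly repetitive documents, as rows from a SQL table often are.
docs = [
    (i, {"name": "item %d" % i, "category": "widgets", "in_stock": True})
    for i in range(1000)
]

raw = build_bulk_body(docs)
compressed = zlib.compress(raw)

if __name__ == "__main__":
    print("raw bytes:       ", len(raw))
    print("compressed bytes:", len(compressed))
    print("ratio: %.2f" % (len(compressed) / len(raw)))
```

The actual saving depends on the data and on the algorithm in use, so it is worth measuring with a real batch.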

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3859d115-5b92-49ff-89df-dba766ced4cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · September 8, 2014, 6:02am

If you're running a cluster across those links then you need to expect
problems as ES is latency sensitive.
You may be better off looking into tribe nodes.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

