Distribution of New Indexes Being Created

Ben_Coe · August 5, 2011, 7:35am

I have a question about ElasticSearch's behaviour.

Here's the scenario:

I have a cluster of ElasticSearch servers set up to use discovery:

discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    bind_addr: 0.0.0.0
    tcpping:
      initial_hosts: 192.0.0.1[9700],192.0.0.2[9700], 192.0.0.2[9700]

I create a new index by connecting to one of the three boxes in the
cluster:

curl -XPUT 'http://192.0.0.2:9700/foobar/' -d '
index :
    number_of_shards : 1
    number_of_replicas : 2
'

The question:

Will any effort be made during creation to load-balance the indexes
across the three servers specified in the discovery stanza? Or is the
index created on whichever server the PUT is sent to?

Why I ask:

I'm using ElasticSearch for a domain that requires many indexes, as
opposed to a single monolithic index. I therefore need sharding less
than I need an even distribution of indexes across a set of boxes,
along with replication.

What would be the best course of action to achieve the following goals?

- An even distribution of indexes across a cluster of servers, with
  minimal pain during the creation process.
- Sane replication.
- High availability. Ideally, I can connect to any node in the cluster
  and retrieve a document, regardless of which server the index
  resides on.
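The last goal depends on how a request reaches the data. In Elasticsearch, the node that receives a request can coordinate it: it works out which shard holds the document and forwards the request to a node holding a copy of that shard. The toy model below only illustrates that idea under stated assumptions: `shard_for` and `node_serving_get` are hypothetical names, CRC32 is a stand-in for Elasticsearch's own routing hash, and the "answer locally, else forward to the first holder" policy is a simplification of how a copy is actually chosen. With `number_of_shards : 1`, every document id maps to shard 0.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    # Stand-in hash (CRC32); Elasticsearch uses its own routing hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def node_serving_get(doc_id, number_of_shards, shard_holders, receiving_node):
    """Toy coordinator (hypothetical): map the id to a shard, then pick a node with a copy."""
    holders = shard_holders[shard_for(doc_id, number_of_shards)]
    # Toy policy: answer locally when possible, otherwise forward to the first holder.
    return receiving_node if receiving_node in holders else holders[0]


# One shard (number_of_shards : 1) with copies on node1 and node3 only.
holders = {0: ["node1", "node3"]}
print(node_serving_get("doc-42", 1, holders, "node2"))  # node2 forwards to node1
print(node_serving_get("doc-42", 1, holders, "node3"))  # node3 answers itself
```

In this model, a node that holds no copy of the index can still accept the request and hand it on, which is the behaviour the goal above asks for.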

kimchy · August 5, 2011, 7:47pm

The distribution behavior of elasticsearch is to reach a state where the
shards are spread evenly across the nodes (roughly the same number of
shards on each node). So you should be good.

Btw, which version of elasticsearch are you using? jgroups discovery has
not been there since 0.7 or so...
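The "even spread" behaviour can be illustrated with a toy allocator. This is a simplified greedy sketch, not Elasticsearch's actual allocation code: `allocate` and the node names are hypothetical, and the model applies only two rules, namely that a node never holds two copies of the same shard, and that each new copy goes to the least-loaded eligible node.

```python
from collections import defaultdict


def allocate(indexes, nodes):
    """Toy shard allocator (hypothetical, for illustration only).

    indexes: list of (index_name, number_of_shards, number_of_replicas)
    nodes:   list of node names
    Returns (copies_per_node, placement, unassigned_copies).
    """
    load = {node: 0 for node in nodes}      # shard copies held by each node
    placement = defaultdict(list)           # (index, shard) -> nodes holding a copy
    unassigned = 0                          # copies that found no eligible node
    for name, shards, replicas in indexes:
        for shard in range(shards):
            holders = placement[(name, shard)]
            for _ in range(replicas + 1):   # one primary plus its replicas
                # A node may hold at most one copy of any given shard.
                eligible = [n for n in nodes if n not in holders]
                if not eligible:
                    unassigned += 1         # more copies than nodes
                    continue
                # Least-loaded eligible node wins; ties break on node name.
                target = min(eligible, key=lambda n: (load[n], n))
                holders.append(target)
                load[target] += 1
    return load, dict(placement), unassigned


nodes = ["node1", "node2", "node3"]

# The index from the question: 1 shard, 2 replicas -> one copy on every node.
load, placement, unassigned = allocate([("foobar", 1, 2)], nodes)
print(load)  # {'node1': 1, 'node2': 1, 'node3': 1}

# Six small indexes with 1 replica each (12 copies) still spread evenly.
load, _, _ = allocate([(f"idx{i}", 1, 1) for i in range(6)], nodes)
print(load)  # {'node1': 4, 'node2': 4, 'node3': 4}
```

Under this model, many single-shard indexes end up balanced without any manual placement, and asking for more copies than there are nodes leaves the extra replicas unassigned rather than doubling them up on one node.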

Ben_Coe · August 6, 2011, 1:52am

I'm on 0.16.2. I guess I found that jgroups example in some old
documentation.

What would you recommend as the best way, out of the gate, to announce
the presence of nodes throughout a cluster?

Could you point me at some documentation, or provide an example?

kimchy · August 6, 2011, 5:02pm

By default, multicast is used to perform discovery. If you want to use
unicast discovery, you can find a sample of how to configure it in the
provided configuration file:
https://github.com/elasticsearch/elasticsearch/blob/master/config/elasticsearch.yml#L28
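As a rough illustration of the unicast setup, the hypothetical helper below renders the two lines one would typically add to `elasticsearch.yml`. The setting names `discovery.zen.ping.multicast.enabled` and `discovery.zen.ping.unicast.hosts` are the zen discovery settings from 0.x-era sample configs; check them against the sample file for your version. Port 9300 is assumed to be the default transport port (unicast pings go to the transport port, not the HTTP port). Duplicates are dropped, so the host list from the question, which names 192.0.0.2 twice, yields only two distinct hosts.

```python
def unicast_discovery_settings(hosts, transport_port=9300):
    """Render zen unicast discovery lines for elasticsearch.yml (hypothetical helper)."""
    entries = []
    for host in hosts:
        # Append the (assumed default) transport port unless the host already has one.
        entry = host if ":" in host else f"{host}:{transport_port}"
        if entry not in entries:  # drop accidental duplicates
            entries.append(entry)
    host_list = ", ".join(f'"{e}"' for e in entries)
    return (
        "discovery.zen.ping.multicast.enabled: false\n"
        f"discovery.zen.ping.unicast.hosts: [{host_list}]\n"
    )


print(unicast_discovery_settings(["192.0.0.1", "192.0.0.2", "192.0.0.2"]), end="")
# discovery.zen.ping.multicast.enabled: false
# discovery.zen.ping.unicast.hosts: ["192.0.0.1:9300", "192.0.0.2:9300"]
```

Listing every node (or at least a few stable ones) in the unicast hosts is enough for new nodes to find the cluster; the same lines go into the config file of each node.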

