Distribution of New Indexes Being Created


(Ben Coe) #1

I have a question about ElasticSearch's behaviour.

Here's the scenario:

  • I have a cluster of ElasticSearch servers set up to use discovery:

discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        bind_addr: 0.0.0.0
        tcpping:
            initial_hosts: 192.0.0.1[9700],192.0.0.2[9700], 192.0.0.2[9700]

  • I create a new index connecting to one of the three boxes in the
    cluster:

curl -XPUT 'http://192.0.0.2:9700/foobar/' -d '
index :
    number_of_shards : 1
    number_of_replicas : 2
'

The question:

Will any effort be made during creation to load-balance the indexes
across the three servers specified in the discovery stanza? Or, is the
index created on whatever server the PUT is performed on?

Why I ask:

I'm using ElasticSearch for a domain that requires many indexes, as
opposed to a single monolithic index. I therefore do not require
sharding as much as I require an even distribution of indexes across a
set of boxes, along with replication.

What would be my best course of action to achieve these goals, i.e.:

  • An even distribution of indexes across a cluster of servers (with
    minimal pain during the creation process).
  • Sane replication.
  • High availability. Ideally, I can connect to any node in the cluster
    and retrieve a document, regardless of the server that the index
    resides on.

(Shay Banon) #2

The distribution behavior of elasticsearch is to work toward a state where
each node holds an even number of shards. So you should be good.

Btw, which version of elasticsearch are you using? jgroups has not been
there since 0.7 or so...
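
To make the "even number of shards across the nodes" idea concrete, here is a toy sketch in Python. It is not elasticsearch's actual allocator; it only assumes a simple rule (put each shard copy on the least-loaded node that does not already hold a copy of that shard). The node names and the `allocate` helper are hypothetical.

```python
from collections import defaultdict


def allocate(nodes, indexes, shards=1, replicas=2):
    """Toy allocator (an assumption, not elasticsearch's real algorithm):
    place every shard copy on the least-loaded node that does not
    already hold a copy of the same shard."""
    load = {node: 0 for node in nodes}
    placement = defaultdict(list)  # (index, shard) -> nodes holding a copy
    for index in indexes:
        for shard in range(shards):
            for _ in range(1 + replicas):  # one primary plus its replicas
                held = placement[(index, shard)]
                candidates = [n for n in nodes if n not in held]
                if not candidates:
                    break  # more copies than nodes: this copy stays unassigned
                target = min(candidates, key=lambda n: load[n])
                held.append(target)
                load[target] += 1
    return dict(placement), load


# Hypothetical three-node cluster and two single-shard indexes.
nodes = ["node1", "node2", "node3"]
placement, load = allocate(nodes, ["foobar", "baz"])
print(load)
```

With one shard and two replicas per index on three nodes, every node ends up holding one copy of every index in this sketch, whichever node the index was created through.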

(Ben Coe) #3

I'm on 0.16.2. I guess I found that jgroups example in some old
documentation.

What would you recommend as the best way, out of the gate, to announce
the presence of nodes throughout a cluster?

Could you point me at some documentation, or provide an example?

(Shay Banon) #4

By default, multicast is used to perform discovery. If you want to use
unicast discovery, you can find a sample of how to configure it in the
provided configuration file:
https://github.com/elasticsearch/elasticsearch/blob/master/config/elasticsearch.yml#L28
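
As a rough illustration of what the unicast setup looks like, the Python sketch below renders the two settings for a list of hosts. The setting names `discovery.zen.ping.multicast.enabled` and `discovery.zen.ping.unicast.hosts` are given as I recall them from the sample configuration of that era, so check them against the linked file for your version. The helper name and the host list are hypothetical.

```python
def unicast_discovery_settings(hosts):
    """Render elasticsearch.yml lines that disable multicast and list
    the unicast hosts (setting names assumed; verify for your version)."""
    quoted = ", ".join('"%s"' % host for host in hosts)
    return "\n".join([
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.ping.unicast.hosts: [%s]" % quoted,
    ])


# Hypothetical hosts; 9300 is the default transport port.
print(unicast_discovery_settings(["192.0.0.1:9300", "192.0.0.2:9300"]))
```

The printed lines would go into config/elasticsearch.yml on every node, with each node listing enough of the others to find the cluster.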
