How are indices distributed across an ElasticSearch cluster?

My ES cluster has 3 nodes.
The index has 5 shards.
Two of the nodes hold 2 shards each, and the third node holds the
remaining shard.
I use TransportClient.addTransportAddress(...) to add all 3 nodes.
Then I create a BulkRequestBuilder from the TransportClient
and add index requests to it.
Finally, I call bulkRequest.execute().actionGet() to send the bulk request
to the ES cluster.

What I would like to know:

  1. If I use just one of the nodes to communicate with the ES cluster, will
    the indices be distributed across the whole cluster? What's the Java class
    that does this?

  2. How are indices distributed across the ES cluster? Is it based on the
    nodes or on the shards?
    For example, if there are 3000 documents to be indexed, do 1000
    documents go to each node, or 600 documents to each shard?
    What's the name of the Java class that implements this policy?

  3. Is there a good tool for managing an ES cluster, e.g. deploying,
    monitoring, upgrading, restarting, or installing new plugins?
    The website uses Chef as an example for managing an ES cluster, but
    Chef is not a standard SCM tool in my company.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello,

On Thu, Jan 31, 2013 at 8:20 AM, Dean Zhang from China <
elasticbetter@gmail.com> wrote:

  1. If I use just one of the nodes to communicate with the ES cluster, will
    the indices be distributed across the whole cluster?

It doesn't matter which node you talk to. By default, ES automatically
distributes your shards so that each node gets an even number of shards,
regardless of which index a shard belongs to, how many docs each shard
holds, or how powerful each node is. In your case, with 5 shards and
3 nodes, one node has to end up with just one shard.
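
To illustrate the "even number of shards per node" idea, here is a minimal
sketch. It is not the actual ES allocator: the class and method names are
hypothetical, and it assumes a plain round-robin placement with no replicas
and no allocation filters.

```java
import java.util.Arrays;

// Hypothetical helper, NOT an Elasticsearch class: hands out shards to nodes
// one at a time so that per-node shard counts differ by at most one.
final class EvenShardSpread {

    // Returns an array where element i is the number of shards placed on node i.
    static int[] shardsPerNode(int shardCount, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        int[] counts = new int[nodeCount];
        for (int shard = 0; shard < shardCount; shard++) {
            counts[shard % nodeCount]++; // round-robin placement (assumption)
        }
        return counts;
    }

    public static void main(String[] args) {
        // 5 shards on 3 nodes: prints [2, 2, 1]
        System.out.println(Arrays.toString(shardsPerNode(5, 3)));
    }
}
```

With 5 shards and 3 nodes this gives two nodes with 2 shards and one node
with 1 shard, which matches the layout described in the question.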

But you can change the way your shards are allocated:
http://www.elasticsearch.org/guide/reference/index-modules/allocation.html

And you can also move shards around manually:
http://www.elasticsearch.org/guide/reference/api/admin-cluster-reroute.html

In the future, other algorithms for distributing shards will be
available. This one looks really nice:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java

What's the Java class used to do this?

I believe it's this one:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/EvenShardsCountAllocator.java

  2. How are indices distributed across the ES cluster? Is it based on the
    nodes or on the shards?
    For example, if there are 3000 documents to be indexed, do 1000
    documents go to each node, or 600 documents to each shard?

By default, documents are distributed pretty evenly across shards, so ~600
docs/shard in your example with 5 shards; which node a document ends up on
simply follows from where its shard lives. But you can add an explicit
routing value to control that. For more details and some other links, take
a look here:
http://www.elasticsearch.org/guide/appendix/glossary.html#routing
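
As a rough illustration of per-shard distribution, here is a minimal sketch
that assumes the shard is picked as hash(routing value) modulo the number of
shards, with the document ID serving as the routing value when none is given.
The class and method names are hypothetical, and the DJB2-style hash is only
a stand-in; it is not claimed to be the exact function ES uses.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT Elasticsearch code: hash-modulo shard selection.
final class RoutingSketch {

    // DJB2-style string hash, chosen here purely for illustration.
    static int hash(String value) {
        int h = 5381;
        for (int i = 0; i < value.length(); i++) {
            h = ((h << 5) + h) + value.charAt(i); // h * 33 + c, may overflow
        }
        return h;
    }

    // Maps a routing value (by default the document ID) to a shard number.
    static int shardFor(String routing, int shardCount) {
        return Math.floorMod(hash(routing), shardCount); // always in [0, shardCount)
    }

    // Counts how many of the documents with IDs "0".."docCount-1" land on each shard.
    static int[] docsPerShard(int docCount, int shardCount) {
        int[] counts = new int[shardCount];
        for (int id = 0; id < docCount; id++) {
            counts[shardFor(Integer.toString(id), shardCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3000 documents over 5 shards, as in the question.
        System.out.println(Arrays.toString(docsPerShard(3000, 5)));
        // Documents sharing an explicit routing value always share a shard.
        System.out.println(shardFor("user-42", 5) == shardFor("user-42", 5));
    }
}
```

The number of nodes never appears in the calculation: documents are spread
over shards, and an explicit routing value only changes which string gets
hashed.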

What's the name of the Java class that implements this policy?

I think it's this one:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/operation/hash/simple/SimpleHashFunction.java

  3. Is there a good tool for managing an ES cluster, e.g. deploying,
    monitoring, upgrading, restarting, or installing new plugins?
    The website uses Chef as an example for managing an ES cluster, but
    Chef is not a standard SCM tool in my company.

For monitoring, I'd recommend our own SPM:

For managing, what is a standard scm in your company?

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
