My ES cluster has 3 nodes.
The number of shards is 5.
Two of the nodes hold 2 shards each, and the last node holds the
remaining shard.
I use TransportClient.addTransportAddress(...) to add all these 3 nodes.
Then a BulkRequestBuilder is created using the TransportClient.
Then index requests are added to the BulkRequestBuilder.
Finally, bulkRequest.execute().actionGet() is used to send the request to
the ES cluster.
The things I want to know are:
If I use just one of the nodes to communicate with the ES cluster, will
the indices be distributed across the whole ES cluster? What's the Java
class used to do this?
How are indices distributed across the ES cluster? Is it based on the
nodes or on the shards?
For example, if there are 3000 documents to be indexed, will each node get
1000 documents, or will each shard get 600 documents?
What's the name of the Java class that implements this policy?
Is there a good tool to manage an ES cluster, for tasks like deploying,
monitoring, upgrading, restarting, or installing new plugins?
On the website, Chef is used as an example for managing an ES cluster, but
Chef is not a standard SCM tool in my company.
If I use just one of the nodes to communicate with the ES cluster, will
the indices be distributed across the whole ES cluster?
No, by default Elasticsearch automatically distributes your shards so that
you get an even number of shards per node. That's regardless of indexes, of
how many docs there are per shard, and of the performance of your nodes. In
your case, with 5 shards and 3 nodes, one node has to end up with just
one shard.
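As a rough illustration of that even spread, here is a minimal Java sketch. It is a simplification, not Elasticsearch's actual allocator: the class and method names are made up for this example, and it simply assigns shards to nodes round-robin.

```java
import java.util.Arrays;

// Hypothetical sketch: spreads shards over nodes round-robin so that the
// per-node shard counts differ by at most one. This is NOT the real
// Elasticsearch allocator, only an illustration of the resulting balance.
class EvenShardSpreadSketch {

    // Returns how many shards each node ends up holding.
    static int[] shardsPerNode(int shardCount, int nodeCount) {
        int[] counts = new int[nodeCount];
        for (int shard = 0; shard < shardCount; shard++) {
            counts[shard % nodeCount]++; // next shard goes to the next node
        }
        return counts;
    }

    public static void main(String[] args) {
        // 5 shards on 3 nodes, as in the question
        System.out.println(Arrays.toString(shardsPerNode(5, 3)));
    }
}
```

With 5 shards and 3 nodes this gives two nodes with 2 shards and one node with 1 shard, matching the layout you described.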
But you can change the way your shards are allocated:
How are indices distributed across the ES cluster? Is it based on the
nodes or on the shards?
For example, if there are 3000 documents to be indexed, will each node get
1000 documents, or will each shard get 600 documents?
By default, documents are distributed pretty evenly per shard - so ~600
docs/shard in your example with 5 shards. But you can add an explicit
routing value to control that. For more details and some other links, take
a look here:
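To make the per-shard distribution concrete, here is a small Java sketch of the general idea: a document's shard is picked by hashing its routing value (the document ID unless you set an explicit routing value) and taking the result modulo the number of primary shards. The class and method names are hypothetical, and String.hashCode() is only a stand-in for the hash function Elasticsearch uses internally, so the exact counts will differ from a real cluster.

```java
import java.util.Arrays;

// Hypothetical sketch of hash-based document routing. String.hashCode()
// is an assumption standing in for Elasticsearch's internal hash function.
class ShardRoutingSketch {

    // Picks a shard for a routing value (by default, the document ID).
    static int shardFor(String routing, int numberOfShards) {
        // floorMod keeps the result in [0, numberOfShards) for negative hashes
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    // Counts how many of docCount documents (IDs "doc-0", "doc-1", ...)
    // land on each shard.
    static int[] countPerShard(int docCount, int numberOfShards) {
        int[] counts = new int[numberOfShards];
        for (int i = 0; i < docCount; i++) {
            counts[shardFor("doc-" + i, numberOfShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3000 documents over 5 shards, as in the question
        System.out.println(Arrays.toString(countPerShard(3000, 5)));
    }
}
```

Documents that share a routing value always map to the same shard in this scheme, which is why an explicit routing value lets you control where documents go.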
Is there a good tool to manage an ES cluster, for tasks like deploying,
monitoring, upgrading, restarting, or installing new plugins?
On the website, Chef is used as an example for managing an ES cluster, but
Chef is not a standard SCM tool in my company.
For monitoring, I'd recommend our own SPM:
For managing, what is a standard scm in your company?