I have a multi-node Elasticsearch setup with one master node and two data nodes. I created an index with the number of shards set to 2 and the number of replicas set to 1, and indexed data into it.
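For reference, the index was created along these lines (a minimal sketch reconstructed from the settings above; the actual request may also have included mappings):

# create the index with 2 primary shards and 1 replica per primary
PUT /enodeb
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}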
However, I see that both primary shards are on one node and both replicas on the other, as shown below:
GET /_cat/shards
enodeb 1 p STARTED 937826 125.4mb Y.YY.Y.72 data-1
enodeb 1 r STARTED 937826 124.1mb X.XX.X.73 data-2
enodeb 0 p STARTED 937263 124.7mb Y.YY.Y.72 data-1
enodeb 0 r STARTED 937263 126.2mb X.XX.X.73 data-2
I was under the assumption that the primary of shard 0 would be on one node and the primary of shard 1 on the other, with each replica on the opposite node. Am I missing some configuration here for distributing the primary shards across the two nodes?
I would appreciate it if anyone could assist me with this.
Primary and replica shards hold the same data and are, for most purposes, equal. The primary can move over time in response to cluster events, so what you are seeing is perfectly fine. You cannot control which node the primary resides on.
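If you want to see why a particular copy ended up on a given node, the cluster allocation explain API reports the allocation decision for one shard copy at a time. A minimal sketch, using the primary of shard 0 of the index above:

# explain where the primary of shard 0 is allocated and why
GET /_cluster/allocation/explain
{
  "index": "enodeb",
  "shard": 0,
  "primary": true
}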
But if all the primary shards of an index sit on one node and multiple users query that index, won't the entire load fall on that node? I thought Elasticsearch would distribute the shards across multiple nodes by default and thereby spread the load across the cluster.
In my case I have 6 indices, and all of their primary shards are on the same data node (data-1), with the replicas on the other node (data-2).
I am sorry, but I think I am missing something in my understanding here.
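One thing I can check is which shard copies a search request would actually be routed to; the search shards API lists the candidate copies per shard (a sketch, using the index above):

# list the shard copies a search against this index could be executed on
GET /enodeb/_search_shards

If searches can be served by either the primary or the replica, the response should list copies on both data-1 and data-2.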