I have a 3-node Elasticsearch cluster. All nodes are data-bearing nodes.
If active_primary_shards is set to 1, my data would not be distributed among the 3 nodes. Is that a correct understanding? My single shard would be sitting on a single node. Wouldn't setting the number of shards to 1 defeat the purpose of having multiple nodes?
No, you could (for instance) set number_of_shards: 1 and number_of_replicas: 2 to have a copy of the shard on every node, all of which will respond to searches.
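As a rough sketch of what that looks like, the snippet below builds the settings body you would send when creating the index and counts the resulting shard copies. The helper names (`index_settings`, `total_shard_copies`) are made up for illustration and are not part of any Elasticsearch client; only the `number_of_shards` and `number_of_replicas` setting names come from Elasticsearch itself.

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build a settings body for an index-creation request (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard has number_of_replicas extra copies."""
    return number_of_shards * (1 + number_of_replicas)


body = index_settings(number_of_shards=1, number_of_replicas=2)
copies = total_shard_copies(number_of_shards=1, number_of_replicas=2)

# The JSON that would go in the body of the create-index request.
print(json.dumps(body, indent=2))
# Number of shard copies the cluster has to place.
print(copies)
```

With 1 primary and 2 replicas there are 3 copies in total. Elasticsearch does not place two copies of the same shard on one node, so on a 3-node cluster each node ends up holding one copy.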
Only after I start to hit the data limits of a single shard would increasing "number of shards" make sense.
If I have 150GB of data and a 3 node cluster, where all 3 nodes are data bearing, then it would make sense to set "number of shards" to 3, would you agree?
It depends™. Specifically, the details of your data and how you are indexing and searching it will affect the answer here. But as a starting point, number_of_shards: 3 on a 150GB index sounds reasonable to me.
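For a feel of the numbers, here is a back-of-the-envelope sketch. It assumes documents spread roughly evenly across primary shards, and the function names are hypothetical, purely for the arithmetic.

```python
def primary_shard_size_gb(index_size_gb: float, number_of_shards: int) -> float:
    """Approximate size of each primary shard, assuming an even spread of data."""
    return index_size_gb / number_of_shards


def total_disk_gb(index_size_gb: float, number_of_replicas: int) -> float:
    """Approximate disk used across the cluster by primaries plus replicas."""
    return index_size_gb * (1 + number_of_replicas)


# 150GB of primary data split over 3 primary shards.
per_shard = primary_shard_size_gb(150, 3)

# The same index with one replica of each shard (an assumed replica count).
with_one_replica = total_disk_gb(150, 1)

print(per_shard)
print(with_one_replica)
```

So 3 primaries on 150GB works out to about 50GB per shard, one per node. Replicas multiply the disk footprint but do not change the per-shard size.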