Question 1
At first I was going to have 6 nodes (31 GB heap each) and 6 shards, which looked clean to me.
But now that I think about it, 1 TB / 6 ≈ 170 GB per shard... Isn't that too much for a single shard? Or maybe it is OK since all the nodes sit on the same physical server?
If I obey the "50 GB at most" recommendation, it would take at least 24 shards to split evenly across the 6 nodes. Isn't 24 shards too many?
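For reference, here is roughly how I would create such an index with the Python client (a sketch; the index name and host are placeholders, and exact parameters vary by elasticsearch-py version):

```python
from elasticsearch import Elasticsearch

# Placeholder host; all 6 nodes run on the same physical server.
es = Elasticsearch("http://localhost:9200")

# 24 primary shards keeps each shard around 1 TB / 24 ≈ 42 GB,
# comfortably under the ~50 GB-per-shard recommendation.
es.indices.create(
    index="my-big-index",  # hypothetical index name
    body={
        "settings": {
            "number_of_shards": 24,
            "number_of_replicas": 0,  # one physical box, so replicas add little safety
        }
    },
)
```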
Question 2
I know nodes can work as master-only, data-only, or master+data, and that it is desirable for master nodes to do no extra work besides being a master.
Since all my nodes will sit on the same physical server, which setup should I use?
Currently I just run 6 nodes in the default configuration (= master+data).
No, 24 shards is not a lot, although the ideal shard size and count vary by use case. Query performance depends on shard size, and multiple shards can be processed in parallel, so a higher shard count may very well give better performance.
As you have a single large server you are not going to have an HA cluster anyway, so you could probably let all your nodes keep the default configuration (master+data) and set minimum master nodes to 4, a majority of your 6 master-eligible nodes (6 / 2 + 1 = 4). This would allow you to do rolling upgrades.
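In pre-7.0 clusters this is the zen discovery setting, which (as a sketch, assuming all 6 nodes stay master-eligible) would go into each node's elasticsearch.yml:

```yaml
# elasticsearch.yml on each of the 6 nodes (pre-7.0 zen discovery)
# Majority of 6 master-eligible nodes: 6 / 2 + 1 = 4
discovery.zen.minimum_master_nodes: 4
```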
You could also set up small dedicated master nodes on the server, but whether this is beneficial or not depends on what load you will be putting on the cluster.
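If you do try dedicated masters, the role split would look something like the sketch below (pre-7.0 syntax; note that with 3 small dedicated masters the majority becomes 3 / 2 + 1 = 2):

```yaml
# elasticsearch.yml on the 3 small dedicated master nodes
node.master: true
node.data: false

# elasticsearch.yml on the data nodes
node.master: false
node.data: true
```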