I am planning the number of shards for two clusters. One cluster has 3 nodes and the other has 5 nodes. The projected index size for both clusters has been estimated.
With regard to search and indexing performance, is it better to have many small shards or fewer large shards?
I have read that a large number of shards adds system overhead, while on the other hand there is a recommended size limit of a few tens of GB per shard.
I am indexing unstructured text data such as research white papers and theses. For text-heavy data with a focus on search performance, are fewer shards a better fit?
I am guessing fewer shards would be better because the volume of documents to index is not that large.
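
To make the trade-off concrete, here is a rough sketch of the arithmetic I have in mind: pick a target shard size within the "few tens of GB" range and take the smallest primary shard count that keeps each shard at or under it. The 30 GB target, the projected sizes (120 GB and 400 GB) and the function name `estimate_primary_shards` are all placeholders I made up for illustration, not recommendations or real figures from my clusters.

```python
import math


def estimate_primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest primary shard count that keeps each shard at or under the target size.

    target_shard_gb defaults to 30 GB, an assumed value inside the
    "few tens of GB per shard" range mentioned above.
    """
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# Hypothetical projected index sizes, for illustration only.
clusters = [
    ("3-node cluster", 3, 120.0),
    ("5-node cluster", 5, 400.0),
]

for name, nodes, size_gb in clusters:
    shards = estimate_primary_shards(size_gb)
    avg_shard_gb = size_gb / shards
    print(f"{name}: {shards} primary shards, about {avg_shard_gb:.1f} GB each, {nodes} nodes")
```

This only counts primary shards and ignores replicas, growth over time and how shards spread across nodes, so I would treat the result as a starting point to benchmark against rather than an answer.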