Hello, I'm working with a cluster of 6 nodes, each with 64 GB of RAM and 1.7 TB of NVMe SSD storage.
All of our data is actively used, and we have one index per year (24 indexes in total). Four of them are the most actively used; the rest are only historical. The most recent indexes are about 150 GB (~267,356,272 docs) each, with 5 primary shards and 3 replicas.
What is an ideal number of shards per index, given the size and document count of these indexes?
I know that a single (Lucene) shard can hold at most 2,147,483,519 documents, but I also know that many people use roughly one shard per 32 GB of data. Which of these two limits, document count or size, should I focus on when deciding how many shards to assign to my indexes?
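To make the trade-off concrete, here is a rough sizing sketch. The 32 GB-per-shard target and the index figures are taken from the question above; the helper names are just illustrative:

```python
import math

# Figures from the question: one yearly index of ~150 GB / ~267M docs.
INDEX_SIZE_GB = 150
DOC_COUNT = 267_356_272
# Hard Lucene per-shard document limit.
LUCENE_DOC_LIMIT = 2_147_483_519

def shards_by_size(size_gb, target_gb=32):
    """Primary shards needed to keep each shard near target_gb."""
    return max(1, math.ceil(size_gb / target_gb))

def shards_by_docs(docs, limit=LUCENE_DOC_LIMIT):
    """Primary shards needed to stay under the Lucene doc limit."""
    return max(1, math.ceil(docs / limit))

# The binding constraint is whichever demands more shards.
primaries = max(shards_by_size(INDEX_SIZE_GB), shards_by_docs(DOC_COUNT))
print(primaries)  # size dominates here: 5 shards by size vs 1 by doc count
```

For this data set the size target is the binding constraint: ~267M docs is far below the ~2.1B per-shard limit, so the document-count rule only matters for much denser indexes.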
Take into account that my priority is read/query performance, not so much writes.
What would you recommend: keeping one yearly index (150 GB, ~267,356,272 docs) or splitting it into 12 monthly indexes, given that queries consult data from the entire year (via an alias)?
Between 1 yearly index and 12 monthly indexes, which is more efficient for search/fetch when most searches touch specific time ranges (2 or 3 months)?
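For reference, the monthly-indexes-behind-an-alias setup mentioned above can be expressed as a single `_aliases` request; the index and alias names here are hypothetical, just to show the shape:

```json
POST /_aliases
{
  "actions": [
    { "add": { "index": "events-2024-01", "alias": "events-2024" } },
    { "add": { "index": "events-2024-02", "alias": "events-2024" } },
    { "add": { "index": "events-2024-03", "alias": "events-2024" } }
  ]
}
```

Queries against `events-2024` then fan out to every monthly index behind the alias, and shards whose data falls entirely outside the query's time range can be skipped quickly.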