I am looking for feedback on my conclusions. I have read a number of articles
and discussions about choosing the number of shards per index.
*. By default, ES gives each index 5 shards and 1 replica. I
believe there is no particular formula for the number of shards per
index; it depends on the type of data you are going to index.
*. #of nodes >= #of shards * (#of replicas + 1) for each index.
*. For stats-type data, 1 shard and 1 replica is a good choice,
because such data tends to have many documents, each small in
size and most likely containing only numbers.
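The node rule in the second bullet can be checked with a small Python sketch. This is only an illustration of the rule as proposed in this post, not an official Elasticsearch formula. The function names are hypothetical; `number_of_shards` and `number_of_replicas` are the index settings that would carry the chosen values when the index is created.

```python
import json


def total_shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus all replica copies for one index."""
    return shards * (replicas + 1)


def satisfies_node_rule(nodes: int, shards: int, replicas: int) -> bool:
    """Proposed rule from the post: #nodes >= #shards * (#replicas + 1)."""
    return nodes >= total_shard_copies(shards, replicas)


def index_settings(shards: int, replicas: int) -> dict:
    """Settings body for creating an index with the chosen layout."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    # Defaults mentioned in the post: 5 shards, 1 replica.
    print(total_shard_copies(5, 1))
    # Stats-type layout suggested in the post: 1 shard, 1 replica.
    print(satisfies_node_rule(2, 1, 1))
    print(json.dumps(index_settings(1, 1), indent=2))
```

Under this rule the defaults of 5 shards and 1 replica would call for 10 nodes, while the 1 shard and 1 replica layout would call for 2.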
You do want to have more shards than nodes, so you can scale out easily. As
for stats-type data, the single-shard assumption does not really hold:
even small documents can amount to a lot of data.
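A rough Python sketch of the scale-out point: it counts how many nodes can hold at least one shard copy as the cluster grows. It assumes shard copies are spread evenly across nodes, which is a simplification of real allocation, and the function names are hypothetical.

```python
import math


def busy_nodes(nodes: int, shards: int, replicas: int) -> int:
    """Nodes that can hold at least one shard copy (even-spread assumption)."""
    return min(nodes, shards * (replicas + 1))


def max_copies_per_node(nodes: int, shards: int, replicas: int) -> int:
    """Largest number of shard copies on any one node (even-spread assumption)."""
    return math.ceil(shards * (replicas + 1) / nodes)


if __name__ == "__main__":
    for nodes in (2, 4, 10):
        print(
            nodes,
            busy_nodes(nodes, shards=1, replicas=1),
            busy_nodes(nodes, shards=5, replicas=1),
            max_copies_per_node(nodes, shards=5, replicas=1),
        )
```

With 1 shard and 1 replica, only 2 nodes ever hold data for the index, so adding nodes does not spread the load. With 5 shards and 1 replica, the same index can spread over up to 10 nodes.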
On Sun, Jun 24, 2012 at 9:33 PM, Rimpy tarun@izap.in wrote: