I have one index with around 10K documents, which may grow to at most 20K documents in a few years.
I have another index with more than 20 million documents, which is likely to double or triple within a year.
How should I pick the number of shards per index? This is for a 3-node cluster where every node is both master-eligible and a data node [node.master: true and node.data: true].
For the 10K-document index, should I have 3 shards and 1 replica, and for the 20M-document index, 5 shards and 2 replicas? If so, what is the effect when I scale horizontally and add more nodes? And what should the configuration be if I add, say, 3 or 4 more nodes?
You should use time-based indices for reviews, given that each one happens at a specific time. That means you can start with a daily, weekly, or monthly index with a small shard count, and scale easily as the data grows.
If I use time-based indices, I end up with 365, 52, or 12 indices per year. Is having 365 indices a good thing?
And if I have to search/aggregate over the last 30 or 90 days using the Java Transport Client, do I have to specify all the indices while building the query? I'm a bit of a newbie here. Also, how do I create time-based indices?
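On the last two questions, here is a minimal sketch of how time-based index names are usually derived from a date, and how the names covering the last 30 or 90 days can be generated rather than typed by hand. It is plain Python with no Elasticsearch client involved. The `reviews` prefix, the `prefix-YYYY.MM.DD` naming scheme, and both function names are assumptions made for illustration, not something Elasticsearch prescribes.

```python
from datetime import date, timedelta


def index_name(prefix: str, day: date, granularity: str = "daily") -> str:
    """Build the name of the index that a document dated `day` belongs to.

    The naming scheme is an assumption for this sketch; any scheme works
    as long as writers and readers agree on it.
    """
    if granularity == "daily":
        suffix = day.strftime("%Y.%m.%d")
    elif granularity == "weekly":
        # ISO week numbering, so a week never straddles two index names.
        iso_year, iso_week, _ = day.isocalendar()
        suffix = f"{iso_year}.w{iso_week:02d}"
    elif granularity == "monthly":
        suffix = day.strftime("%Y.%m")
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return f"{prefix}-{suffix}"


def indices_for_last_days(prefix: str, days: int, today: date,
                          granularity: str = "daily") -> list:
    """List the distinct index names covering the last `days` days, newest first."""
    names = []
    for offset in range(days):
        name = index_name(prefix, today - timedelta(days=offset), granularity)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # With monthly indices, a 90-day window touches only a handful of names.
    print(indices_for_last_days("reviews", 90, date.today(), "monthly"))
```

The resulting list can be handed to whichever client you use. As far as I know, search requests accept several index names at once and also wildcard patterns such as `reviews-2024.03.*`, so you rarely need to enumerate every daily index. With default settings an index is also created automatically the first time a document is written to a name that does not exist yet, so "creating" a time-based index mostly means writing to the right name.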
Shards shouldn't exceed 50 GB (a sign of undersharding) and shouldn't be smaller than 5 GB (a sign of oversharding). In your case (my average size for that amount of logs never exceeds 50 GB), I would go for 1 shard with 2 replicas, to take advantage of the other two nodes.
Remember that replicas are not backups, but they can provide you with high availability (HA).
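To make the 50 GB guideline above concrete, here is a rough back-of-the-envelope sketch in Python. The function names and the example sizes are hypothetical, and the index size on disk is something you would have to measure yourself. The code only does the arithmetic; it does not talk to a cluster.

```python
import math

# Upper bound quoted in this thread; treat it as a rule of thumb, not a hard limit.
MAX_SHARD_GB = 50.0


def primary_shards(index_size_gb: float, max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest number of primary shards that keeps each shard at or under max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies in the cluster: every primary plus its replicas."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # Hypothetical sizes: a tiny index and one that has grown to 120 GB.
    for size_gb in (0.2, 120.0):
        p = primary_shards(size_gb)
        print(f"{size_gb} GB -> {p} primary shard(s), "
              f"{shard_copies(p, 2)} copies with 2 replicas")
```

With 1 primary and 2 replicas there are 3 copies in total, one per node on a 3-node cluster, since a replica is never placed on the same node as its primary. If you later add nodes, that index would not spread any further unless you raise the replica count, which can be changed on a live index. The primary shard count is fixed when the index is created, which is another reason to prefer time-based indices when you expect growth.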