I have one index with around 10K documents, which may grow to at most 20K documents in a few years,
and another index with more than 20 million documents, which is likely to double or triple within a year.
How should I pick the number of shards per index? This is for a 3-node cluster in which every node is a master-eligible data node [node.master: true and node.data: true].
For the 10K-document index, should I have 3 shards and 1 replica, and for the 20M-document index, 5 shards and 2 replicas? If so, what will the effect be when I want to scale horizontally and add more nodes? And what should the configuration be if I add, let's say, 3 or 4 more nodes?
If I use time-based indices, I end up with 365, 52, or 12 indices per year. Is having 365 indices a good thing?
And if I have to search or aggregate over the last 30 or 90 days using the Java Transport Client, do I have to specify all the indices when building the query? I'm a bit of a newbie here. Also, how do I create time-based indices?
As a rule of thumb, a shard shouldn't exceed 50 GB (beyond that the index is undersharded) and shouldn't be smaller than 5 GB (below that it is oversharded). In your case (in my experience, that amount of logs never exceeds 50 GB on average), I would go for 1 primary shard and 2 replicas, so that the other two nodes each hold a copy.
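To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical, the 50 GB ceiling is only the guideline quoted above (not a hard Elasticsearch limit), and the index sizes in `main` are made-up illustrations, not measurements.

```java
// Hypothetical helper: estimates a primary shard count from an expected index size,
// using the 50 GB-per-shard rule of thumb from this answer (an assumption, not an ES limit).
class ShardSizing {
    static final double MAX_SHARD_GB = 50.0;

    // Smallest number of primaries that keeps each shard at or below MAX_SHARD_GB.
    static int primaryShards(double expectedIndexSizeGb) {
        return Math.max(1, (int) Math.ceil(expectedIndexSizeGb / MAX_SHARD_GB));
    }

    // Total shard copies in the cluster: each primary plus its replicas.
    static int totalShardCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    public static void main(String[] args) {
        // Illustrative sizes only: a 20 GB index fits in a single primary shard.
        int primaries = primaryShards(20.0);
        // 1 primary + 2 replicas = 3 copies, i.e. one copy per node on a 3-node cluster.
        int copies = totalShardCopies(primaries, 2);
        System.out.println(primaries + " primary shard(s), " + copies + " shard copies in total");

        // If the index later grew to 120 GB, the same rule would suggest 3 primaries.
        System.out.println(primaryShards(120.0) + " primary shard(s) for 120 GB");
    }
}
```

Keep in mind that the number of primary shards is fixed when the index is created, so it is worth sizing for the growth you expect rather than for today's volume; the replica count can be changed later.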
Remember that replicas are not backups, but they do give you high availability (HA).
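On the time-based part of the question: you don't have to hard-code every index name. A search request accepts several index names at once, and also wildcards such as `logs-2024.03.*`. Below is a minimal sketch, in plain Java, of building the names for the last N days. The `logs-` prefix and the `yyyy.MM.dd` suffix are assumptions for illustration (use whatever naming scheme you index with), and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds daily index names such as "logs-2024.03.02".
class DailyIndexNames {
    // Assumed naming scheme: <prefix><year>.<month>.<day>
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Returns the index names for `days` consecutive days, newest first, starting at `today`.
    static List<String> lastDays(String prefix, LocalDate today, int days) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            names.add(prefix + today.minusDays(i).format(SUFFIX));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> indices = lastDays("logs-", LocalDate.of(2024, 3, 2), 3);
        // prints [logs-2024.03.02, logs-2024.03.01, logs-2024.02.29]
        System.out.println(indices);
    }
}
```

With the Transport Client you could then pass the result as `client.prepareSearch(indices.toArray(new String[0]))`. Creating the indices needs no special step if automatic index creation is enabled: indexing a document into `logs-<today's date>` creates that day's index on first write.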