I know we use a quorum to work out the minimum number of master nodes we should have, but I'm still confused about how to decide how many nodes we need, how many shards each index needs, and when we need dedicated master nodes. Is there any way to know those things? I mean, is there a limit such as "we need 3 nodes for 500 GB of data"? I hope you understand my question, and sorry for my broken English.
This has been asked and answered many times before, and the answer is almost always "it depends". A few rules of thumb though:
- Avoid overly large shards (keeping each shard under about 50 GB is a commonly quoted guideline).
- Since there's a fixed cost per shard, having too many shards per node isn't a good idea. Low hundreds should be okay, but you don't want thousands.
- Ideally you'll want to have one shard per node for each index.
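To make the shard-size rule concrete, here is a rough back-of-the-envelope sketch. It only turns the 50 GB guideline above into arithmetic; the function name and the 500 GB figure (borrowed from the question) are just for illustration, and the result is a starting point, not a recommendation.

```python
import math

# Rule-of-thumb ceiling from the list above; a guideline, not a hard limit.
MAX_SHARD_GB = 50


def primary_shards_for(index_size_gb, max_shard_gb=MAX_SHARD_GB):
    """Smallest primary shard count that keeps each shard at or below max_shard_gb.

    Hypothetical helper for illustration; it ignores replicas, growth,
    and query load, all of which matter in practice.
    """
    return max(1, math.ceil(index_size_gb / max_shard_gb))


if __name__ == "__main__":
    # A single 500 GB index would need at least this many primary shards.
    print(primary_shards_for(500))
```

If you want headroom for growth, pass a smaller `max_shard_gb` so the shards start out well below the ceiling.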
If you're going to have time-series indexes (logstash-2015.08.19 etc.) it won't necessarily make sense to have more than one shard per index, since the data will be naturally sharded by date, so to speak. Also, if you plan on keeping the data for a long time, the number of indexes (and therefore shards) will start to add up.
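A quick sketch of how daily indexes add up, to compare against the "low hundreds of shards per node" rule of thumb. The retention period, replica count, and node count below are assumptions picked for illustration (the 3 nodes come from the question), and the helper names are made up.

```python
import math


def total_shards(retention_days, primaries_per_index=1, replicas=1):
    """Shards held in the cluster with one index per day.

    Assumes every daily index has the same settings; `replicas` is the
    number of replica copies per primary (an assumption, not from the thread).
    """
    return retention_days * primaries_per_index * (1 + replicas)


def shards_per_node(total, nodes):
    """Average shards per node, rounded up, assuming an even spread."""
    return math.ceil(total / nodes)


if __name__ == "__main__":
    nodes = 3  # hypothetical cluster size
    for primaries in (1, 5):
        total = total_shards(retention_days=365, primaries_per_index=primaries)
        print(primaries, total, shards_per_node(total, nodes))
```

With a year of retention on three nodes, one primary per daily index stays in the low hundreds of shards per node, while five primaries per index pushes it past a thousand.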