I am currently sizing my production cluster and have some questions. Please help me out with some pointers. Based on my research, a shard should not exceed 50 GB for optimal performance. Below is my scenario:
- I have 5 nodes, each with a 100 GB SSD and 16 GB of RAM
- We will have about 450 GB of logs to process each month
- We don't want to keep these documents for long, because the raw files will be available in cold storage if we ever need them, and we can re-index them ad hoc if needed
Based on these criteria, I am considering the following:
- Create an index with 10 primary shards and 1 replica. This means I need at least 900 GB of storage (450 GB × 2), so roughly 1 TB, correct?
- Allow 8 GB of heap on each node, for a total heap of 40 GB across the cluster. Is this enough, or will it cause problems for GC?
- Create an ILM policy to roll over the index after 1 month and delete the old index
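The arithmetic behind the plan above can be checked with a small sketch. It plugs in the numbers from the scenario (5 nodes, 100 GB disk and 16 GB RAM per node, 450 GB per month, 10 primary shards, 1 replica, 8 GB heap). The `ClusterPlan` class is purely hypothetical, and the sketch assumes the indexed size on disk equals the raw log volume. In practice, mappings and compression can make it larger or smaller, and it ignores disk overhead such as merges and watermarks.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical helper; assumes indexed size == raw log size."""
    nodes: int
    disk_gb_per_node: float
    ram_gb_per_node: float
    heap_gb_per_node: float
    monthly_ingest_gb: float
    primary_shards: int
    replicas: int

    def total_disk_gb(self) -> float:
        # Raw disk across all nodes, with no allowance for overhead.
        return self.nodes * self.disk_gb_per_node

    def stored_gb(self) -> float:
        # Each replica is a full extra copy of the primary data.
        return self.monthly_ingest_gb * (1 + self.replicas)

    def primary_shard_gb(self) -> float:
        # One month of data spread evenly over the primary shards.
        return self.monthly_ingest_gb / self.primary_shards

    def total_heap_gb(self) -> float:
        return self.nodes * self.heap_gb_per_node

    def heap_fraction(self) -> float:
        # Share of each node's RAM given to the JVM heap.
        return self.heap_gb_per_node / self.ram_gb_per_node


plan = ClusterPlan(
    nodes=5,
    disk_gb_per_node=100,
    ram_gb_per_node=16,
    heap_gb_per_node=8,
    monthly_ingest_gb=450,
    primary_shards=10,
    replicas=1,
)

print(f"primary shard size: {plan.primary_shard_gb():.0f} GB (target <= 50 GB)")
print(f"storage needed:     {plan.stored_gb():.0f} GB")
print(f"disk available:     {plan.total_disk_gb():.0f} GB")
print(f"total heap:         {plan.total_heap_gb():.0f} GB "
      f"({plan.heap_fraction():.0%} of RAM per node)")
```

With these inputs, each primary shard comes out at 45 GB and one month with a replica needs 900 GB. The sketch only prints these figures next to the 500 GB of raw disk in the cluster. It does not decide whether that is acceptable.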
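An ILM policy for the monthly rollover and delete could be built like this. It is a sketch that assembles the request body as a Python dict and prints it as JSON for `PUT _ilm/policy/<name>`. The policy name `logs-monthly` is hypothetical. The `max_primary_shard_size` rollover condition is an assumption that you are on Elasticsearch 7.13 or later. Rollover also only applies to indices written through a rollover alias or a data stream.

```python
import json

# Hypothetical policy name; the body follows the Elasticsearch ILM policy format.
POLICY_NAME = "logs-monthly"

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        # Roll over to a new index roughly once a month...
                        "max_age": "30d",
                        # ...or earlier if a primary shard hits ~50 GB (assumes 7.13+).
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                # For rolled-over indices, min_age counts from the rollover time,
                # so the old index is deleted 30 days after it stops receiving writes.
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

body = json.dumps(policy, indent=2)
print(f"PUT _ilm/policy/{POLICY_NAME}")
print(body)
```

Setting the delete phase's `min_age` to `"0d"` would delete the old index as soon as it rolls over, leaving only the current month searchable. With `"30d"`, the previous month stays available, which also means up to two months of data on disk at once.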
Please let me know whether this is a good starting point, or whether there are other things I need to consider.