Hi, we use ES 1.4.4. Our cluster consists of 10 data nodes, each with 16 GB of RAM in total and a 10 GB heap.
We have 174 indices and 1,732 shards (5 primary and 5 replica shards per index); we create a new index every month.
Overall we have 818,876,914 docs (not counting replicas), but we want to expand the system and will then be storing 10 billion docs per year.
We have 3 master nodes, 2 client nodes, and 3 proxy clients, and we use load balancing.
Our heap usage increases rapidly, and adding RAM or increasing the heap no longer brings lasting improvement.
So I am asking for advice on how to reduce heap utilization.
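For context, here is a minimal sketch of how the per-node heap and cache numbers can be pulled from the nodes stats API (the URL is a placeholder for one of our client nodes; field names are as in the 1.x nodes stats response):

```python
# Minimal sketch: read per-node heap, fielddata, filter cache and segment
# memory from the nodes stats API of an ES 1.x cluster.
import json
from urllib.request import urlopen

ES_URL = "http://localhost:9200"  # placeholder for a client node


def mb(n_bytes):
    return n_bytes // (1024 * 1024)


with urlopen(ES_URL + "/_nodes/stats/jvm,indices") as resp:
    stats = json.load(resp)

for node_id, node in stats["nodes"].items():
    idx = node["indices"]
    print("%-25s heap %3d%%  fielddata %6d MB  filter cache %6d MB  segments %6d MB" % (
        node["name"],
        node["jvm"]["mem"]["heap_used_percent"],
        mb(idx["fielddata"]["memory_size_in_bytes"]),
        mb(idx["filter_cache"]["memory_size_in_bytes"]),
        mb(idx["segments"]["memory_in_bytes"]),
    ))
```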
1.) Does heap usage grow linearly with the number of docs per node / index / shard?
2.) I have heard the rule of thumb that you need 5-10 GB of memory per 100 million docs (which for our ~819 million docs would be roughly 40-80 GB across the cluster) - is that right?
3.) If we generate 10 billion docs per year in the future, is that practical/possible with ES without a very large number of data nodes?
4.) What are the main consumers of the ES heap in general? I.e., which functions/services of ES typically use the heap?
5.) What measures/actions help to keep heap usage lower in general?
6.) Would you selectively disable specific caches to lower heap usage? (A sketch of the kind of settings I mean follows after this list.)
7.) Would you keep the caches persistent?
8.) Are there any best-practice settings for the garbage collector?
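To make question 6 more concrete, here is a sketch of the kind of change I mean: capping fielddata via the dynamic circuit-breaker setting through the cluster settings API. The 40% value is only an example, not something we have tested; a static cap such as indices.fielddata.cache.size would instead go into elasticsearch.yml and requires a node restart.

```python
# Sketch only: inspect and tentatively adjust the dynamic fielddata circuit
# breaker via the cluster settings API of an ES 1.x cluster.
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # placeholder for a client node

# Show the current transient/persistent cluster settings.
with urlopen(ES_URL + "/_cluster/settings") as resp:
    print(json.load(resp))

# Example value only: limit fielddata to 40% of the heap.
body = json.dumps({
    "transient": {"indices.breaker.fielddata.limit": "40%"}
}).encode("utf-8")
req = Request(ES_URL + "/_cluster/settings", data=body, method="PUT")
req.add_header("Content-Type", "application/json")
with urlopen(req) as resp:
    print(json.load(resp))
```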
THX for your help