I recently deployed two separate Elasticsearch 2.4 clusters, each with two nodes. The nodes are VMware virtual machines running Windows Server 2012 R2 Standard, each with 12 cores, 24 GB RAM and 1 TB of disk. I am using Java 1.8, update 102.
The clusters are index-heavy: they receive around 300-400 million log entries per day. Logs are shipped from Filebeat to Logstash, which forwards them to ES. The indexing rate is around 3000-7000 entries per second.
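As a quick sanity check on those numbers, the daily volume can be converted to an average per-second rate. This is only a sketch of the arithmetic: it assumes the load is spread evenly over 24 hours, which real log traffic is not, so peaks will be higher than the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(entries_per_day):
    """Average indexing rate, assuming an even spread over the whole day."""
    return entries_per_day / SECONDS_PER_DAY


low = average_rate_per_second(300_000_000)
high = average_rate_per_second(400_000_000)
print(f"{low:.0f} to {high:.0f} entries per second on average")
```

The average falls inside the observed 3000-7000 per second range, with the upper end of that range presumably reflecting peak hours.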
I have reduced the shard count per index from 5 to 2, and a new index is created every day.
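To illustrate that setup, here is a minimal sketch of how a daily index name and a matching index template body could be built. The `logstash-` prefix is an assumption (the actual index names are not given here), and the body follows the ES 2.x index template format, where it would be sent with a PUT to `/_template/<some name>`.

```python
import json
from datetime import date

# Assumption: indices use a "logstash-" prefix plus the date.
INDEX_PREFIX = "logstash-"


def daily_index_name(day, prefix=INDEX_PREFIX):
    """Name of the index for a given day, e.g. logstash-2016.10.01."""
    return prefix + day.strftime("%Y.%m.%d")


def template_body(shards=2, prefix=INDEX_PREFIX):
    """Index template body (ES 2.x format) that sets the primary shard count."""
    return {
        "template": prefix + "*",  # pattern of index names the template applies to
        "settings": {"number_of_shards": shards},
    }


print(daily_index_name(date(2016, 10, 1)))
print(json.dumps(template_body(), indent=2))
```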
My main question is: what is the correct approach to determining the ES heap size?
I started out with a heap size of around 8 GB. That let the cluster run for approximately one day before it ran into a long old-generation GC that took several minutes, and I got an error message saying that the master had left.
I am now at a heap size of 16 GB, which lets it run for a few days before it is hit by a long old-generation GC again. Last night I saw an old-generation GC take a whopping 42 minutes. Heap usage slowly increases over time, as if there were some kind of memory leak.
So I tried writing a script that monitors the heap and gracefully restarts the node when usage reaches 90% or more, but I don't think this is the correct approach.
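The heap check in such a script could look roughly like the sketch below. It assumes a node reachable at `http://localhost:9200` and reads `heap_used_percent` from the nodes stats API (`/_nodes/stats/jvm`). The function names and the threshold constant are made up for illustration, and the restart step itself is deliberately left out.

```python
import json
from urllib.request import urlopen

# Assumed values for illustration: node address and restart threshold.
ES_URL = "http://localhost:9200"
HEAP_THRESHOLD_PERCENT = 90


def nodes_over_threshold(stats, threshold=HEAP_THRESHOLD_PERCENT):
    """Return (node_name, heap_used_percent) for nodes at or above the threshold."""
    hot = []
    for node in stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold:
            hot.append((node.get("name", "unknown"), used))
    return hot


def fetch_jvm_stats(base_url=ES_URL):
    """Fetch JVM statistics for all nodes from the nodes stats API."""
    with urlopen(base_url + "/_nodes/stats/jvm", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def check_once():
    """Print every node whose heap usage is at or above the threshold."""
    for name, used in nodes_over_threshold(fetch_jvm_stats()):
        print(f"{name}: heap at {used}%, restart candidate")
```

Calling `check_once()` from a scheduled task every minute or so would give the monitoring half of the script. It only reports the symptom and does nothing about the cause of the heap growth.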
I would like to think that ES is mature enough not to suffer from this problem.
So my question is: what is the correct way to size the heap to avoid long GCs? What other settings should I be concerned about in an index-heavy cluster? I have done a lot of googling, of course, but I need a little help.
We would like to send more logs to the cluster and add more nodes, but I need the clusters to be more stable before that happens.