I have a problem with coordinator node memory usage on one of our Elasticsearch 5.6.3 clusters. We have 3 x coordinator nodes sitting in front of 6 data nodes. Normally I size our coordinators with 8GB RAM and a 5-6GB heap...
However, in this case I've been seeing OOM errors, and even after raising the heap to 13GB and then 26GB, heap usage sits at over 90% most of the time. Can someone help me understand what's causing this and how to fix it?
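For reference, these are the kinds of requests I've been using to watch the heap (host/port are the defaults and the node name coordinator-1 is just an example; adjust for your setup):

```shell
# Per-node heap usage and roles across the cluster
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max'

# JVM and circuit-breaker stats for one coordinator, to see
# where the heap is going (old gen, breakers, in-flight requests)
curl -s 'localhost:9200/_nodes/coordinator-1/stats/jvm,breaker?pretty'
```

These need a live cluster to run against, so treat them as a sketch rather than copy-paste commands.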
Workload is as follows:
- client search requests from Kibana (Kibana can't get a response from the coordinators)
- bulk indexing from multiple Fluentd indexers running in 4 x Kubernetes clusters
- direct searches to the API (probably low volume)
We're ingesting roughly 600 million to 1 billion log lines per day, with daily indices in the 250GB to 350GB range. The data nodes are under load and I'm planning to add more, but it's the coordinator node behaviour that has me confused - they're normally pretty quiet. Any help would be greatly appreciated...