Cache not freed in a timely manner on Elasticsearch nodes

Issue: low-memory alerts on Elasticsearch nodes.

We have an 8-node Elasticsearch cluster with 3 master-eligible nodes.

Each server has 70GB of RAM, of which 30GB is allocated to the ES heap,
and Elasticsearch uses 30-40% of that heap most of the time. The field
data cache and filter cache are well within bounds.

The remaining 40GB is available to the kernel; no other process is running
on the machine.

Some time back we started getting low-memory alerts on all these nodes:
the dentry cache was taking up more than 30GB and was not being freed for
other system processes in a timely manner. I understand the system using
all available RAM is desired behavior, but it is not releasing that memory
quickly enough when it is needed.
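For anyone wanting to reproduce the observation, a minimal sketch like the
one below (assuming a kernel that exposes MemAvailable and SReclaimable in
/proc/meminfo; the field names are standard on recent kernels) shows how
the reclaimable slab, which includes the dentry cache, compares with the
memory actually available:

def read_meminfo():
    """Parse /proc/meminfo into a dict of values in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])
    return info

if __name__ == "__main__":
    mem = read_meminfo()
    # SReclaimable covers slab objects the kernel can reclaim, dentries included.
    print("MemAvailable:", mem["MemAvailable"] // 1024, "MB")
    print("SReclaimable (slab, incl. dentries):", mem["SReclaimable"] // 1024, "MB")

In our case the reclaimable slab number is what keeps growing into the
tens of gigabytes.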

We initially thought it was a system issue, so we increased
vm.vfs_cache_pressure, which helped a little but not much. We then added a
script that writes to drop_caches whenever free memory falls below a
certain limit. This workaround works OK, but it causes some latency each
time drop_caches is executed.
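For reference, a minimal sketch of that kind of workaround (not our exact
script; the threshold and check interval are illustrative, and it has to
run as root) would be:

import time

THRESHOLD_KB = 4 * 1024 * 1024   # trigger when free memory drops below 4 GB (illustrative)
CHECK_INTERVAL_SECS = 60         # how often to re-check (illustrative)

def mem_free_kb():
    """Return the MemFree value from /proc/meminfo in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemFree:"):
                return int(line.split()[1])
    return 0

if __name__ == "__main__":
    while True:
        if mem_free_kb() < THRESHOLD_KB:
            # "2" asks the kernel to reclaim dentries and inodes only,
            # leaving the page cache alone; requires root.
            with open("/proc/sys/vm/drop_caches", "w") as f:
                f.write("2\n")
        time.sleep(CHECK_INTERVAL_SECS)

Writing 2 rather than 3 drops only dentries and inodes and keeps the page
cache intact, which is the part that matters for this problem, but the
reclaim pass itself is where we see the latency spike.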

But the main reason for posting this question here is that we have several
other applications with much higher load, yet only the two services running
the Elasticsearch cluster are hitting this memory issue on their servers.

mlockall, file descriptor limits, mmap count, etc. are set up as the
Elasticsearch documentation suggests. We don't have a swap partition, so
swappiness is left at its default of 60. Garbage collection is not running
very often and there are no OOMs for ES.
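For completeness, here is a rough sketch of how we can re-verify those
settings, assuming ES is reachable on localhost:9200 and that the nodes
info API exposes mlockall and max_file_descriptors under "process" (field
layout may differ slightly between versions):

import json
import urllib.request

ES_HOST = "http://localhost:9200"   # assumption: ES listening locally

def es_process_info(host=ES_HOST):
    """Fetch per-node process info (mlockall, max_file_descriptors)."""
    with urllib.request.urlopen(host + "/_nodes/process") as resp:
        return json.loads(resp.read().decode("utf-8"))

def sysctl(name):
    """Read a sysctl value straight from /proc/sys."""
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

if __name__ == "__main__":
    for node_id, node in es_process_info()["nodes"].items():
        proc = node.get("process", {})
        print(node.get("name", node_id),
              "mlockall:", proc.get("mlockall"),
              "max_file_descriptors:", proc.get("max_file_descriptors"))
    print("vm.swappiness:", sysctl("vm.swappiness"))
    print("vm.max_map_count:", sysctl("vm.max_map_count"))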

I was wondering if someone else has faced a similar issue with
Elasticsearch and has a suggestion.

Thanks,

Nidhi


Hi Nidhi,

Have you managed to solve this issue?