Oct 29 16:25:48 cms-zulu-datastore-none-1551817 kernel: [103329.931375] Out of memory: Kill process 18823 (java) score 830 or sacrifice child
Oct 29 16:25:48 cms-zulu-datastore-none-1551817 kernel: [103329.932556] Killed process 18823 (java) total-vm:241920696kB, anon-rss:18411644kB, file-rss:26660112kB, shmem-rss:0kB
The system killed the ES process because it was using too much memory. You'll have to either reduce the amount of memory used by ES (set a smaller heap size via jvm.options), free up memory on the system by other means, or (unlikely to be the cause unless you changed them) adjust your system's settings to make the OOM killer less trigger-happy.
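For example, a smaller heap can be set in jvm.options via the standard -Xms/-Xmx flags (8g here is purely illustrative; the right value depends on your workload and available RAM):
-Xms8g
-Xmx8g
Keep -Xms and -Xmx equal so the heap doesn't resize at runtime.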
This is, however, not necessarily going to stop the OOM killer from killing the ES process. Given that your system has significantly more memory available than is strictly needed to run ES with a 16GB heap, you could try fixing the issue by allowing the system to overcommit memory more aggressively via
echo 1 > /proc/sys/vm/overcommit_memory
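Note that writing to /proc only lasts until the next reboot. To make the setting persistent, you could apply it via sysctl and record it in /etc/sysctl.conf (assuming a standard sysctl setup):
sysctl -w vm.overcommit_memory=1
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf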
EDIT:
One other thing you should look into is this line:
Oct 29 16:25:48 cms-zulu-datastore-none-1551817 kernel: [103314.853172] INFO: task java:29057 blocked for more than 120 seconds.
It seems your java process became completely blocked on "something" here, and that "something" is most likely disk IO. Could disk IO resources be getting almost completely exhausted on this machine, even temporarily? What kind of storage hardware do you have backing your cluster?
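One way to check would be to watch device utilization while the problem occurs, e.g. with iostat from the sysstat package (a %util near 100 together with high await values would point at saturated storage):
iostat -x 1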
Yes, unrelated to this problem, it's still a good idea to turn down swappiness even with mlock in place, as mlock does not extend to ES's off-heap memory use.
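A minimal sketch of turning swappiness down (1 is the value commonly recommended for Elasticsearch nodes if you can't disable swap outright):
sysctl -w vm.swappiness=1
echo 'vm.swappiness = 1' >> /etc/sysctl.conf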