I am having stability issues with a modest single-node Elasticsearch deployment. The node runs on a beefy server with 500GB of RAM and 72 hyperthreads. Several other data-intensive applications run on the same server, so I am aware that resource contention is a potential issue.
According to the _stats API, the "store" size is 947428199 bytes (<1GB). Currently, I have a single producer that uses the bulk API to post fewer than 10 documents per second. The producer frequently hits connection errors due to timeouts on the bulk API. At the moment, no search queries are being executed, so caches should not be a factor.
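For reference, the producer builds and posts batches roughly like this (index name, document shape, and batch size are simplified placeholders, not my real schema):

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint:
    alternating action lines and document lines, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# POSTed to http://localhost:9200/_bulk with
# Content-Type: application/x-ndjson. On a timeout the producer
# currently just retries, which is where the connection errors show up.
body = build_bulk_body("my-index", [{"msg": "hello"}, {"msg": "world"}])
```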
I see plenty of warnings in the Elasticsearch logs about GC taking too long. I enabled GC logging, and indeed there is frequent garbage collection that sometimes takes several seconds to complete. The JVM heap size is set to 5GB (with min = max). Swap is disabled on the system.
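The relevant JVM settings look roughly like this (from jvm.options; the -Xlog line is the JDK 9+ unified-logging syntax, paths abbreviated):

```
# Heap pinned: min = max, as recommended
-Xms5g
-Xmx5g

# GC logging enabled to a rotating file
-Xlog:gc*:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
```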
I have attached a graph of heap usage over time. GC seems quite effective at reducing heap usage to below 50%, but usage then rises extremely quickly again.
Does anyone have any insight as to why the heap size grows so quickly, with such a small dataset?