We use Elasticsearch 1.6.0 and run two data nodes on two servers, each with 128 GB of RAM and a 24-core CPU. The ES Java heap size is set to 30 GB, and the index is configured with 5 shards and 1 replica.
Unlike typical log data, our documents are fairly complex, and the average document size is about 500 KB. After bulk indexing around 12 million documents, the index takes up about 5 TB on disk on each node. Since then, the ES servers have become unstable: the slave sometimes loses its connection to the primary. Another problem is that even after an old-generation GC, Java heap usage stays at around 20 GB. Monitoring with Marvel, I found that the value under "Index Statistics->memory->LUCENE MEMORY" keeps increasing. It is now about 30 GB.

Is this related to the Java heap usage? What is LUCENE MEMORY used for? And can anybody suggest how to mitigate the memory issue?
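
In case it helps to compare the two numbers outside Marvel, here is a minimal sketch of how heap usage and segment memory could be read side by side from a parsed node stats response (`GET /_nodes/stats`). It assumes the Marvel "LUCENE MEMORY" chart is derived from `indices.segments.memory_in_bytes`, which I have not confirmed. The function name `summarize_node_memory` and the sample values are hypothetical and are not taken from our cluster.

```python
def summarize_node_memory(node_stats):
    """Summarize heap and segment memory per node, in GB.

    `node_stats` is the parsed JSON body of GET /_nodes/stats.
    Assumption: segment memory is what Marvel plots as LUCENE MEMORY.
    """
    gb = 1024 ** 3
    summary = {}
    for node_id, node in node_stats["nodes"].items():
        heap = node["jvm"]["mem"]
        segments = node["indices"]["segments"]
        # Fall back to the node id when the node has no name field.
        summary[node.get("name", node_id)] = {
            "heap_used_gb": round(heap["heap_used_in_bytes"] / gb, 1),
            "heap_max_gb": round(heap["heap_max_in_bytes"] / gb, 1),
            "segments_memory_gb": round(segments["memory_in_bytes"] / gb, 1),
        }
    return summary


# Hypothetical sample response with made-up values, for illustration only.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_in_bytes": 20 * 1024 ** 3,
                            "heap_max_in_bytes": 30 * 1024 ** 3}},
            "indices": {"segments": {"count": 100,
                                     "memory_in_bytes": 15 * 1024 ** 3}},
        }
    }
}

for name, stats in summarize_node_memory(sample).items():
    print(name, stats)
```

Marvel may be summing the value across both nodes, so a per-node breakdown like this should show how much of each 30 GB heap the segments actually account for.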