I would like some help with memory usage issues I've encountered when loading a large number of percolators into an Elasticsearch index. I'm using Elasticsearch 2.3.5.
I have a program that creates 1.5 million+ percolators, and I have found that after it has been running for a while, ES becomes unresponsive and the program never runs to completion. The cluster stats in Marvel show that JVM heap usage on individual nodes increases linearly as more percolators are indexed.
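For context, in ES 2.x a percolator is just a document indexed into the special .percolator type of an index, with the query DSL stored in its query field. My registration code is roughly along these lines (a minimal sketch only: the index name, ids, field names and queries here are placeholders, not my actual mapping or data):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

def percolator_actions(queries):
    # In ES 2.x a percolator is registered by indexing a document into the
    # special ".percolator" type, with the query DSL under the "query" field.
    for i, query in enumerate(queries):
        yield {
            "_index": "my-index",        # placeholder index name
            "_type": ".percolator",
            "_id": "query-%d" % i,       # placeholder percolator id
            "_source": {"query": query},
        }

# Placeholder queries; the real program generates 1.5 million+ of these.
queries = [{"match": {"message": "error %d" % i}} for i in range(1000)]
helpers.bulk(es, percolator_actions(queries))
```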
The usual threshold for JVM garbage collection is around 70%, but memory usage climbs past this point and never drops back to its original level after the percolators are indexed. In my case, heap usage stays above 80% after indexing around 300,000 percolators on a cluster with 12GB of memory in total.
If my understanding that percolators reside in memory is correct, and my cluster with 12GB of RAM is hitting a wall at 300,000 percolators, I'm guessing I will need at least 5 times the total RAM (1.5 million / 300,000 = 5, so roughly 60GB) to index 1.5 million percolators. I have looked at the structure of the percolator queries being generated and tried to remove all but the essential clauses.
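To give an idea of what I mean by trimming, a pared-down query body now looks roughly like this (again just a sketch with made-up field names, not my real mapping):

```python
# Illustrative only: a trimmed percolator query that keeps just the clauses
# I actually need to match on; all the optional clauses have been removed.
slim_query = {
    "bool": {
        "must": [
            {"term": {"status": "active"}},   # made-up filter clause
            {"match": {"message": "error"}},  # made-up match clause
        ]
    }
}
```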
So am I correct in thinking that my only way out of this problem is to throw more memory at the cluster, either by adding more nodes or by adding RAM to the existing ones?
Any advice would be appreciated, thanks.