We are running a 3-node cluster with 25GB each; the index size is 3TB at the moment.
We are only indexing. No searches have been made, yet heap usage moves between 75% and 95%.
We are running two parallel indexers, each sending bulk requests of 1 to 10,000 docs, where each doc ranges from a few bytes to a few KB.
Are 3 nodes with 25GB each not enough for 3TB of data?
What is ES using all this memory for? I checked, and fielddata stands at 0 (no searches were made).
One of the nodes is always using 10% more heap than the rest. What could cause this?
We are on ES version 1.7.3; we were previously using 1.5.2 and saw the same behavior.
10,000 docs are unlikely to squeeze into a few KB, even if the docs are only 10 bytes each (that alone is already about 100 KB).
The real question is "are you seeing full GCs?" Elasticsearch should have a stat for that, which you can check next to the memory stats. If you aren't seeing many full GCs, then you have nothing to worry about. Just ignore the memory usage: it'll get cleaned up eventually, and it'll be done in the background.
BTW, I inspected one of the heap dumps a few days ago, when the cluster had 20GB of heap per node. Out of 20GB, 13GB was used by byte arrays. Does that make sense? Is there a way to find out what is in these 13GB?
We've been able to resolve the issue with some help from an ES professional.
Most of the memory was used by the segments area, which must be the doc values, and there is not much that can be done about it other than adding more nodes or memory.
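For anyone who wants to see how much heap the segments take on their own cluster, here is a minimal Python sketch. It assumes a parsed nodes stats response (for example from `GET /_nodes/stats`) in which each node reports `indices.segments.memory_in_bytes` and `jvm.mem.heap_max_in_bytes`. The function name and the sample figures are invented for the example and are not our actual numbers.

```python
def segments_heap_share(stats):
    """Return {node_name: segment memory as a fraction of max heap}."""
    shares = {}
    for node_id, node in stats["nodes"].items():
        seg_bytes = node["indices"]["segments"]["memory_in_bytes"]
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        shares[node.get("name", node_id)] = seg_bytes / heap_max
    return shares


GIB = 1024 ** 3

# Made-up sample: 8 GiB of segment memory on a node with a 25 GiB heap.
sample_stats = {
    "nodes": {
        "abc": {
            "name": "node-1",
            "indices": {"segments": {"memory_in_bytes": 8 * GIB}},
            "jvm": {"mem": {"heap_max_in_bytes": 25 * GIB}},
        }
    }
}

for name, share in segments_heap_share(sample_stats).items():
    print(f"{name}: {share:.0%} of max heap held by segments")
```

Comparing this share across nodes is also a quick way to see whether one node simply holds more or larger segments than the others.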
We had some aggressive ngram mappings, so we reduced or removed most of them, and this reduced memory and disk usage significantly.
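To give a feel for why wide ngram settings get expensive, here is a small Python sketch that counts how many grams a plain (non-edge) ngram tokenizer would emit for a single token of a given length. The `min_gram` and `max_gram` values below are hypothetical and are not the settings we actually had.

```python
def ngram_count(token_len, min_gram, max_gram):
    """Number of ngrams produced from one token of length token_len.

    For each gram size n there are (token_len - n + 1) sliding windows;
    sizes larger than the token itself produce nothing.
    """
    total = 0
    for n in range(min_gram, min(max_gram, token_len) + 1):
        total += token_len - n + 1
    return total


# A 10-character token with hypothetical settings:
print(ngram_count(10, 1, 10))  # 55 terms with min_gram=1, max_gram=10
print(ngram_count(10, 2, 3))   # 17 terms with min_gram=2, max_gram=3
```

Every one of those grams becomes a term in the index, so narrowing the gram range (or dropping ngrams on fields that don't need them) shrinks the term dictionary and the on-disk index along with it.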