Ever since we moved to keeping all indices open on our ES 1.7 ELK cluster, we've seen a large increase in GC runs.
- 1000+ indices, mostly with 3 shards each
- 16,408 segments in total
- Index buffer configured to 20%
- Field data cache fixed at 2GB (a very questionable setting, we know)
- Shards get optimized down to 2 segments after 1 day (sketched right after this list)
- 14 machines running 14 ES instances (one per machine), each with a 30GB heap
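For completeness, here's roughly what that looks like on our side. The two cache settings are the standard ES 1.x node settings; the index name in the optimize call is just a placeholder for whatever yesterday's indices happen to be called:

```python
# Minimal sketch of the nightly force-merge we run, plus the node-level
# settings mentioned in the list above. The index name is a placeholder.
import requests

ES = "http://localhost:9200"

# Set in elasticsearch.yml (not dynamically updatable in 1.x):
#   indices.memory.index_buffer_size: 20%
#   indices.fielddata.cache.size: 2gb

def optimize(index, max_segments=2):
    # ES 1.x _optimize: force-merges every shard of `index` down to
    # `max_segments` segments.
    r = requests.post("%s/%s/_optimize" % (ES, index),
                      params={"max_num_segments": max_segments})
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    optimize("logstash-2015.09.01")  # placeholder daily index name
```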
It seems that most of the heap is consumed by segment-related memory.
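For reference, a minimal sketch of how we're reading those numbers (assuming the cluster answers on localhost:9200). Note that 1.7 only reports a single memory_in_bytes total for segments, without the per-structure breakdown newer versions have:

```python
# Minimal sketch: per-node segment counts and segment memory from the node
# stats API. In 1.7 the segments section reports one memory_in_bytes figure
# per node, with no terms/norms/stored-fields breakdown.
import requests

ES = "http://localhost:9200"

stats = requests.get("%s/_nodes/stats/indices" % ES).json()
for node in stats["nodes"].values():
    seg = node["indices"]["segments"]
    print("%s: %d segments, %.1f MB segment memory"
          % (node["name"], seg["count"], seg["memory_in_bytes"] / 1024.0 / 1024.0))
```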
Output from 1 node:
The mappings we use are fairly standard with a fixed number of fields. We assume disabling norms will help us a bit - is that worth considering? Anything else we could look at?
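For reference, this is the kind of mapping change we have in mind for norms. Index, type, and field names below are made up; as far as we understand, in 1.x norms can only be disabled (not re-enabled) on an existing string field, and the change only affects newly written segments:

```python
# Sketch: disable norms on an existing analyzed string field via the
# put-mapping API (ES 1.x syntax; index, type and field names are made up).
import requests

ES = "http://localhost:9200"

body = {
    "logs": {                      # mapping type
        "properties": {
            "message": {
                "type": "string",
                "norms": {"enabled": False},
            }
        }
    }
}

r = requests.put("%s/logstash-2015.09.01/_mapping/logs" % ES, json=body)
r.raise_for_status()
print(r.json())
```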
Can't verify that using the API because we're on 1.7. Unless, of course, the *.tim files on disk are fully loaded into the heap - then I could do the calculation from the file sizes.
Would you mind verifying that on your cluster (du *.tim == terms_memory_in_bytes)?
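In case it's clearer, this is the comparison I mean. The data path is an assumption, and terms_memory_in_bytes only shows up in the segments stats of versions newer than ours, which is exactly why I can't check it myself:

```python
# Sketch of the comparison: total size of *.tim (Lucene term dictionary)
# files on disk vs. terms_memory_in_bytes from the segments stats (the
# latter only exists on versions newer than 1.7). Data path is assumed.
import fnmatch
import os
import requests

ES = "http://localhost:9200"
DATA_DIR = "/var/lib/elasticsearch"  # hypothetical data path

def tim_bytes_on_disk(root):
    # Rough equivalent of `du` over all *.tim files under the data path.
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "*.tim"):
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def terms_memory_from_api():
    # Cluster-wide total; for a per-node comparison you'd run the du on
    # every data node (or look at a single node's stats instead).
    stats = requests.get("%s/_stats/segments" % ES).json()
    return stats["_all"]["total"]["segments"]["terms_memory_in_bytes"]

if __name__ == "__main__":
    print("du *.tim        :", tim_bytes_on_disk(DATA_DIR))
    print("terms_memory api:", terms_memory_from_api())
```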
Fields per index: anywhere between 20 and 60.
Docs per index: anywhere between a few and 250 million.
These ranges are fairly broad because we split the various types of logging into separate indices (for various reasons).
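If it helps, this is roughly how we pull those per-index numbers - just the stock stats and mapping APIs, counting only top-level properties per type:

```python
# Sketch: per-index doc counts and a rough top-level field count, using the
# indices stats and get-mapping APIs.
import requests

ES = "http://localhost:9200"

docs = requests.get("%s/_stats/docs" % ES).json()["indices"]
mappings = requests.get("%s/_mapping" % ES).json()

for index, stats in sorted(docs.items()):
    # Top-level properties only, summed across mapping types.
    field_count = sum(
        len(type_mapping.get("properties", {}))
        for type_mapping in mappings.get(index, {}).get("mappings", {}).values()
    )
    print(index, "docs:", stats["total"]["docs"]["count"],
          "top-level fields:", field_count)
```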