Recently, heap utilization went close to 98% on about 2-3 hosts in the cluster, which triggered GC. GC ran for quite a long duration (about 300 seconds, 10-12 times an hour), and even after that the heap utilization didn't come down. In the GC logs I can see that only a very small amount of memory is freed in each GC run. Here is how most of the logs look:
There are zero field data evictions but high filter cache evictions; we have given the default 10% of the heap to the filter cache.
During this event we had higher usage than on normal days, with terms queries (containing 6-7 fields) requesting close to 2,000,000 records. We also saw about 200 MB of response data going out every 5 minutes.
Our index size is a few TB, with 2 replicas. Writes come in at 100 bulk requests per second, with each doc being ~10 KB in size. Our reads include 3-4 level aggregations and terms queries on multiple fields, at a rate of about 15 req/sec. Shards are ~15 GB each, with roughly 6 such shards on each host.
I was just wondering what other major components make up the ES heap, besides field data and the filter cache, that could pose a threat in such scenarios. I'm not considering the filter cache as the culprit for now because it's just 10% of the heap, i.e. 3 GB.
Field data by default doesn't evict, but that doesn't mean it isn't using memory. You should check the nodes stats API to see how much memory fielddata is using (if any), and either set a limit or move to doc_values.
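As a rough sketch of the doc_values route (index, type, and field names here are made up, not from your setup): in an ES 1.x mapping you can enable doc_values on a not_analyzed field so its values are read from disk instead of being loaded into fielddata on the heap. It requires reindexing existing data; alternatively, a fielddata limit can be set with the `indices.fielddata.cache.size` setting.

```python
# Hypothetical example: create a new index whose field uses doc_values (ES 1.x).
# Host, index, type and field names are placeholders.
import requests

mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "user_id": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True   # store values on disk, not in heap fielddata
                }
            }
        }
    }
}
requests.put("http://localhost:9200/my_index_v2", json=mapping)
```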
One use of the JVM heap is memory used by Lucene itself; you can check it in the nodes stats API under the "segments" heading. This will tell you the amount of memory that Lucene is using for the different parts of the index.
Another use is the in-memory FST that ES loads for the completion suggester; you can see how much memory that uses under the "completion" heading in the nodes stats.
Nodes stats is a good place to start; hopefully it gives you a clearer picture of what is using the JVM heap.
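Something like the following (the host is a placeholder for one of your data nodes) pulls those headings from the nodes stats API and prints per-node memory for fielddata, Lucene segments, and the completion suggester FSTs:

```python
# Minimal sketch: per-node memory from the nodes stats API.
import requests

stats = requests.get(
    "http://localhost:9200/_nodes/stats/indices/fielddata,segments,completion").json()
for node in stats["nodes"].values():
    idx = node["indices"]
    print(node["name"],
          "fielddata bytes:", idx["fielddata"]["memory_size_in_bytes"],
          "segments bytes:", idx["segments"]["memory_in_bytes"],
          "completion bytes:", idx["completion"]["size_in_bytes"])
```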
Other than the memory used by Lucene and the caches on each node, you also need to pay attention to terms aggregations, especially when there are multiple levels as in your scenario.
For high-cardinality fields, multiple levels of terms aggregation can generate too many buckets, which consumes a lot of memory. This is the so-called combinatorial explosion problem. If that's the case, breadth-first collection mode could alleviate it; refer to Preventing Combinatorial Explosions for a detailed explanation.
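For illustration only (the index and field names are made up, not from your mappings), a two-level terms aggregation with breadth-first collect mode on the outer level defers building the inner buckets until the top outer terms are known:

```python
# Sketch of a nested terms aggregation using breadth_first collect mode.
import requests

body = {
    "size": 0,  # we only want aggregation results, not hits
    "aggs": {
        "by_user": {
            "terms": {
                "field": "user_id",            # high-cardinality outer field
                "size": 10,
                "collect_mode": "breadth_first"
            },
            "aggs": {
                "by_action": {
                    "terms": {"field": "action", "size": 5}
                }
            }
        }
    }
}
resp = requests.post("http://localhost:9200/my_index/_search", json=body).json()
print(resp["aggregations"]["by_user"]["buckets"])
```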
Also avoid returning too many records in a single query (or deep pagination), especially when that single query hits many shards. The coordinating node may suffer from combining too much data from each shard and collecting too many documents, which is memory intensive.
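If you really need millions of records, a scan/scroll loop keeps each response small instead of asking for everything in one search. A rough sketch against the 1.x API (host, index, filter field, and batch size are all assumptions):

```python
# Pull a large result set in small batches with scan/scroll (ES 1.x).
import requests

ES = "http://localhost:9200"

# Initial scan request: returns a scroll id but no hits yet.
resp = requests.post(
    ES + "/my_index/_search",
    params={"search_type": "scan", "scroll": "1m", "size": 500},  # 500 docs per shard per batch
    json={"query": {"filtered": {"filter": {"terms": {"status": ["active"]}}}}},
).json()
scroll_id = resp["_scroll_id"]

total = 0
while True:
    page = requests.get(ES + "/_search/scroll",
                        params={"scroll": "1m", "scroll_id": scroll_id}).json()
    hits = page["hits"]["hits"]
    if not hits:
        break
    scroll_id = page["_scroll_id"]
    total += len(hits)        # replace with real per-document processing
print("fetched", total, "documents")
```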
Lastly, I was once bothered by a couple of memory leak issues (#22013, #21568) which have been fixed in the latest version. The bugs are likely to be triggered in a large-scale cluster. It's probably worth upgrading in case you are using one of the buggy versions.
I was checking the metrics around the outage period and saw that filter cache evictions suddenly went very high for the 2-3 days before the GC was triggered. After we restarted all the data nodes, filter cache evictions became much lower (almost 7 times less), although the filter cache quickly grew back to 800 GB.
Can high filter cache evictions cause such high GC (up to 300 seconds)? Is the filter cache even as costly as the field data cache? We see close to 200,000,000 evictions when sampled every 5 minutes on a normal day, and on the day of the event it was about 1,600,000,000. Our queries heavily use filters in the hope of reducing stress on the cluster. While searching on the internet I found an issue with the filter cache that could be the potential reason; we use ES 1.7.4.
GC was high and it was not able to free up much heap, as can be seen in the GC logs above, but the filter cache is only 10% of the heap, i.e. 3 GB, so I'm still not sure whether it can trigger such high GCs.
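For reference, this is roughly how we sample the eviction numbers (a sketch, not our actual monitoring; the host is a placeholder and the 5-minute interval matches the sampling mentioned above):

```python
# Sample per-node filter cache size and eviction deltas every 5 minutes.
import time
import requests

def snapshot():
    stats = requests.get(
        "http://localhost:9200/_nodes/stats/indices/filter_cache").json()
    return {n["name"]: n["indices"]["filter_cache"] for n in stats["nodes"].values()}

prev = snapshot()
while True:
    time.sleep(300)                      # 5-minute sampling interval
    cur = snapshot()
    for name, fc in cur.items():
        delta = fc["evictions"] - prev.get(name, {}).get("evictions", 0)
        print(name,
              "filter cache bytes:", fc["memory_size_in_bytes"],
              "evictions in last 5m:", delta)
    prev = cur
```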
I also saw that after GC started to go high, there were ERROR logs of "org.elasticsearch.indices.ttl bulk deletion failures for [2376]/[2376] items". We have a TTL set for each document. Can such errors add to heap usage, and what could be the reason behind this kind of error? Can it be because of the high GC?