I'm running Elasticsearch 1.5.1 and my nodes have 60GB of memory, 30GB of which is allocated to the Java heap (half the machine's RAM, as the documentation suggests).
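For reference, this is roughly how the heap is configured on each node (a sketch of my service config; the exact file path depends on the install method):

```sh
# /etc/default/elasticsearch (or /etc/sysconfig/elasticsearch, depending on distro)
# Half of the 60GB of RAM goes to the JVM heap, per the docs' guidance.
ES_HEAP_SIZE=30g
```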
After a full cluster restart, things are good for a while and I can see the filter cache growing as expected, until it reaches around 3GB. Evictions also kick in at some stage, but their count stays low and they only affect the filter cache. All good.
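For context, I'm watching the cache with something like the following (assuming I'm reading the right stats; `localhost:9200` stands in for one of my nodes):

```sh
# Per-node filter cache size and eviction count
curl -s 'localhost:9200/_nodes/stats/indices/filter_cache?pretty'
# -> "filter_cache": { "memory_size_in_bytes": ..., "evictions": ... }
```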
After a few days, though, nodes start to experience memory pressure: GC runs continuously on those nodes, consuming roughly twice the normal CPU on the affected machines without reclaiming any memory. Ultimately, a node climbs above 90% heap usage and I have to perform a rolling cluster restart to relieve the memory pressure.
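This is how I spot the affected nodes, in case it matters:

```sh
# Quick heap overview across the cluster
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent'

# GC detail for the suspect nodes: old-gen collection counts and times under jvm.gc
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'
```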
As suggested in this thread, I analyzed a heap dump from an affected machine, and the results are interesting:
- `FixedBitSetFilterCache` - 23.6GB
- `IndicesFilterCache` - 2.5GB (matching the filter cache size reported by the stats API)
- `LocalCache$LocalManualCache` - 1.5GB
It seems to me that, quite clearly, the problem is caused by `FixedBitSetFilterCache`. Am I right?
Looking into the object, I can see that the memory occupation is dominated by `LocalCache$StrongEntry` entries. What does this mean? How is the bitset filter cache different from the indices filter cache? What can I do to solve the issue?
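In case it helps with the diagnosis, I can also cross-check the heap dump against the live stats; if I'm not mistaken, 1.x exposes the bitset memory under the segments stats:

```sh
# "fixed_bit_set_memory_in_bytes" should roughly match the 23.6GB seen in the dump
curl -s 'localhost:9200/_nodes/stats/indices/segments?pretty'
```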