Hello, we are experiencing anomalous memory usage in a memory-constrained three-node cluster running Elasticsearch 7.9. We set a 2 GB heap with 4 GB of overall available RAM. After some days of normal operation, consisting mainly of update operations on existing documents, we observe unrecoverable heap growth that causes many circuit breaker activations and log messages like:
{"type": "server", "timestamp": "2021-01-07T17:22:41,593Z", "level": "INFO", "component": "o.e.i.b.HierarchyCircuitBreakerService", "cluster.name": "...", "node.name": "...", "message": "attempting to trigger G1GC due to high heap usage [2119996816]", "cluster.uuid": "2qNRqn54QH2hKUMshPFGVQ", "node.id": "6c0BA-2QSxK2VqiMPLsqFA" }
{"type": "server", "timestamp": "2021-01-07T17:22:41,654Z", "level": "INFO", "component": "o.e.i.b.HierarchyCircuitBreakerService", "cluster.name": "...", "node.name": "...", "message": "GC did bring memory usage down, before [2119996816], after [2107467792], allocations [18], duration [60]", "cluster.uuid": "2qNRqn54QH2hKUMshPFGVQ", "node.id": "6c0BA-2QSxK2VqiMPLsqFA" }
Inspecting the heap using `jcmd GC.class_histogram`, we can see that most of the memory is used by `DocumentsWriterDeleteQueue`:
```
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       6609561      317258928  java.util.HashMap (java.base@14.0.1)
   2:       8719989      209279736  java.util.concurrent.atomic.AtomicLong (java.base@14.0.1)
   3:       2179745      156941640  org.apache.lucene.index.DocumentsWriterDeleteQueue
   4:       1907499      154508144  [Ljava.util.HashMap$Node; (java.base@14.0.1)
   5:       2144747      124195352  [B (java.base@14.0.1)
   6:       2179746      122065776  org.apache.lucene.index.BufferedUpdates
```
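For reproducibility, the histogram above was captured roughly like this (the `pgrep` pattern is just one way to find the Elasticsearch PID; `jcmd` must run as the same user as the ES process):

```
# find the Elasticsearch PID and dump a class histogram of the live heap
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
jcmd "$ES_PID" GC.class_histogram | head -n 20
```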
However, the cluster statistics do not account for this usage:
"segments" : {
"count" : 151,
"memory_in_bytes" : 668324,
"terms_memory_in_bytes" : 435992,
"stored_fields_memory_in_bytes" : 100792,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 32128,
"points_memory_in_bytes" : 0,
"doc_values_memory_in_bytes" : 99412,
"index_writer_memory_in_bytes" : 2023540,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 1016,
"max_unsafe_auto_id_timestamp" : 1610012968298,
"file_sizes" : { }
}
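(For context, the block above is the `indices.segments` section of the cluster stats response, fetched roughly like this, assuming the default host and port:)

```
# cluster-wide segment memory counters (the "segments" block above)
curl -s 'http://localhost:9200/_cluster/stats?pretty'
# per-node breakdown of the same counters
curl -s 'http://localhost:9200/_nodes/stats/indices/segments?pretty'
```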
We already tried `_forcemerge`, both with `only_expunge_deletes=true` and with `max_num_segments=1` (exact calls shown after the stats below), with no result: neither the memory usage nor the count of deleted documents decreases:
"indices" : {
"docs" : {
"count" : 15124203,
"deleted" : 499938
},
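For completeness, these are roughly the `_forcemerge` calls we issued (`my-index` stands in for our actual index names; the two parameters cannot be combined in a single call, so we tried them separately):

```
# only rewrite segments that contain deleted documents
curl -s -X POST 'http://localhost:9200/my-index/_forcemerge?only_expunge_deletes=true'
# merge everything down to a single segment
curl -s -X POST 'http://localhost:9200/my-index/_forcemerge?max_num_segments=1'
```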
Memory usage normalizes only after a node reboot.
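(We observe the growth over time via the JVM section of the nodes stats API; a minimal sketch of the check, assuming `jq` is available:)

```
# print per-node heap usage once a minute
while true; do
  curl -s 'http://localhost:9200/_nodes/stats/jvm' |
    jq -r '.nodes[] | "\(.name): \(.jvm.mem.heap_used_in_bytes) / \(.jvm.mem.heap_max_in_bytes)"'
  sleep 60
done
```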
Regards