Weird memory growth (ES 7.9)

Hello, we are experiencing anomalous memory usage in a memory-constrained cluster of 3 nodes running ES 7.9. We set a 2 GB heap size with 4 GB of RAM available overall. After some days of normal operation, consisting mainly of update operations on existing documents, we observe unrecoverable heap growth that causes many circuit breaker activations and log messages like:

{"type": "server", "timestamp": "2021-01-07T17:22:41,593Z", "level": "INFO", "component": "o.e.i.b.HierarchyCircuitBreakerService", "cluster.name": "...", "node.name": "...", "message": "attempting to trigger G1GC due to high heap usage [2119996816]", "cluster.uuid": "2qNRqn54QH2hKUMshPFGVQ", "node.id": "6c0BA-2QSxK2VqiMPLsqFA"  }
{"type": "server", "timestamp": "2021-01-07T17:22:41,654Z", "level": "INFO", "component": "o.e.i.b.HierarchyCircuitBreakerService", "cluster.name": "...", "node.name": "...", "message": "GC did bring memory usage down, before [2119996816], after [2107467792], allocations [18], duration [60]", "cluster.uuid": "2qNRqn54QH2hKUMshPFGVQ", "node.id": "6c0BA-2QSxK2VqiMPLsqFA"  }
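To put numbers on these messages, here is a minimal Python sketch that extracts the before/after values from the second log line and reports how much the GC actually freed. The helper name gc_reclaimed is hypothetical, and the heap size passed in is assumed to be exactly 2 GiB, which may differ slightly from what the JVM really reports.

```python
import json
import re

# Matches the "before [...], after [...]" part of the GC message shown above.
GC_MESSAGE = re.compile(r"before \[(\d+)\], after \[(\d+)\]")


def gc_reclaimed(log_line, heap_bytes):
    """Return (bytes_freed, heap_fraction_after) for one JSON log line, or None."""
    message = json.loads(log_line)["message"]
    match = GC_MESSAGE.search(message)
    if match is None:
        return None
    before, after = int(match.group(1)), int(match.group(2))
    return before - after, after / heap_bytes


# Log line trimmed to the fields this sketch needs.
line = (
    '{"level": "INFO", "message": "GC did bring memory usage down, '
    'before [2119996816], after [2107467792], allocations [18], duration [60]"}'
)
# Assumption: the 2 GB heap is exactly 2 GiB.
freed, fraction = gc_reclaimed(line, heap_bytes=2 * 1024**3)
print(f"freed {freed / 1024**2:.1f} MiB, heap still {fraction:.1%} full")
```

With the values from the log above, the collection frees roughly 12 MiB and leaves the heap about 98% full, which is consistent with almost all of the heap being held by live objects.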

Inspecting the heap using jcmd GC.class_histogram, we can see that most of the memory is used by DocumentsWriterDeleteQueue:

 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       6609561      317258928  java.util.HashMap (java.base@14.0.1)
   2:       8719989      209279736  java.util.concurrent.atomic.AtomicLong (java.base@14.0.1)
   3:       2179745      156941640  org.apache.lucene.index.DocumentsWriterDeleteQueue
   4:       1907499      154508144  [Ljava.util.HashMap$Node; (java.base@14.0.1)
   5:       2144747      124195352  [B (java.base@14.0.1)
   6:       2179746      122065776  org.apache.lucene.index.BufferedUpdates
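
For comparing histograms taken at different times, a small parser can help. The following is a rough Python sketch, not an official tool: the function names parse_histogram and bytes_matching are hypothetical, and the sample is a subset of the rows above. Note that the histogram reports shallow sizes, so objects referenced by these instances are counted under their own classes.

```python
import re

# One histogram row: rank, instance count, byte count, class name.
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text):
    """Return (class_name, instances, bytes) tuples from GC.class_histogram output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and separator lines do not match and are skipped
            rows.append((match.group(4), int(match.group(2)), int(match.group(3))))
    return rows


def bytes_matching(rows, needle):
    """Sum the shallow bytes of every class whose name contains `needle`."""
    return sum(size for name, _, size in rows if needle in name)


sample = """\
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       6609561      317258928  java.util.HashMap (java.base@14.0.1)
   3:       2179745      156941640  org.apache.lucene.index.DocumentsWriterDeleteQueue
   6:       2179746      122065776  org.apache.lucene.index.BufferedUpdates
"""
rows = parse_histogram(sample)
lucene_bytes = bytes_matching(rows, "org.apache.lucene")
print(f"Lucene classes in sample: {lucene_bytes / 1024**2:.0f} MiB")
```

Running the same parser on two dumps taken some hours apart would show whether the DocumentsWriterDeleteQueue instance count keeps growing.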

However, the cluster statistics do not account for this usage:

        "segments" : {
          "count" : 151,
          "memory_in_bytes" : 668324,
          "terms_memory_in_bytes" : 435992,
          "stored_fields_memory_in_bytes" : 100792,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 32128,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 99412,
          "index_writer_memory_in_bytes" : 2023540,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 1016,
          "max_unsafe_auto_id_timestamp" : 1610012968298,
          "file_sizes" : { }
        }

We already tried _forcemerge (including only_expunge_deletes=true and max_num_segments=1) with no result: neither the memory usage nor the count of deleted documents decreases:

      "indices" : {
        "docs" : {
          "count" : 15124203,
          "deleted" : 499938
        },

Memory usage normalizes only after a node reboot.

Regards

Which exact version are you using? 7.9.0 has a memory leak, so if you are on that version you should upgrade to at least 7.9.1.
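
The exact version is reported in the version.number field of the response to GET / on any node. As a minimal sketch, assuming you have already fetched that response as JSON, the check could look like this in Python. The function names and the sample body are hypothetical; only the version.number field is taken from the real API.

```python
import json


def parse_version(number):
    """Turn '7.9.0' into (7, 9, 0); a suffix after '-' (e.g. '-SNAPSHOT') is dropped."""
    return tuple(int(part) for part in number.split("-")[0].split("."))


def needs_upgrade(root_response):
    """True if the node reports exactly 7.9.0, the version named above."""
    return parse_version(root_response["version"]["number"]) == (7, 9, 0)


# Trimmed, hypothetical body of a GET / response; only version.number is used.
sample = json.loads('{"name": "node-1", "version": {"number": "7.9.0"}}')
print(needs_upgrade(sample))
```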

Yes, thank you!