Constant Garbage Collecting

I am getting a constant stream of garbage collection logs in my cluster.log of the type shown below:

[2021-06-16T13:33:31,031][WARN] [o.e.m.j.JvmGcMonitorService]  [hot-data.domain.com]  [gc] [255] overhead, spent [1.8s] collecting in the last [1.8s]

Possibly as a result of this, the node keeps leaving the cluster and is not receiving data. It is most likely a case of oversharding, but I am unable to reindex to reduce the shard count. Is there anything else I can do to at least "calm down" the garbage collection enough to manage the shards?

Thank you!

What is the full output of the cluster stats API? Which version of Elasticsearch are you using?
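
For reference, that's:

GET _cluster/stats?human&pretty

run from the Kibana Dev Tools console or via curl against any node.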

If you have older indices that are no longer written to, you can reduce heap usage by force-merging them down to 1 segment. This can, however, take a while and be very I/O intensive. Another option might be to close some indices to give you more headroom to address the issue properly.
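As a rough sketch (the index pattern here is just a placeholder for whatever your old indices are called):

POST /old-logs-*/_forcemerge?max_num_segments=1
POST /old-logs-*/_close

Note that the force merge call blocks until the merge finishes, so running it against a few indices at a time is safer than hitting everything at once.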

I'm unable to post the full output, but I can retype a couple of sections for you if that helps:

{
  "_nodes" : {
    "total" : 4,
    "successful" : 4,
    "failed: : 0
  },
  ...
  ...
  ...
  "indices" : {
    "count" : 961,
    "shards" : {
      "total" : 961,
      "primaries" : 961,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
  },
  "docs" : {
    "count" : 488656241,
    "deleted" : 3664
  },
  "store" : {
    "size_in_bytes" : 317712712983,
    "reserved_in_bytes" : 0
  },
  "fielddata" : {
    "memory_size_in_bytes" : 7936,
    "evictions" : 0
  },
  "query_cache" : {
    "memory_size_in_bytes" : 1192,
    "total_count" : 715,
    "miss_count" : 710,
    "cache_size" : 9,
    "cache_count" : 10,
    "evictions" : 1
  },
  "completion" : {
    "size_in_bytes" : 0
  },
  "segments" : {
    "count" : 6895,
    "memory_in_bytes" : 326067558,
    "terms_memory_in_bytes" : 264478144,
    "stored_fields_memory_in_bytes" : 7127544,
    "term_vectors_memory_in_bytes" : 0,
    "norms_memory_in_bytes" : 38241536,
    "points_memory_in_bytes" : 0,
    "doc_values_memory_in_bytes" : 16220334,
    "index_writer_memory_in_bytes" : 13961208,
    "version_map_memory_in_bytes" : 0,
    "fixed_bit_set_memory_in_bytes" : 2296,
    "max_unsafe_auto_id_timestamp" : 1623839787128,
  },
  "mappings" : {
    "field_types" : [
    ...
    ...
    ...
    ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 4,
      "coordinating_only" : 0,
      "data" : 1,
      "ingest" : 1,
      "master" : 3,
      "ml" : 0,
      "remote_cluster_client" : 3,
      "transform" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.9.1"
    ],
    "os" : {
      ...
      ...
      ...
      "mem" : {
        "total_in_bytes" : 128846872576,
        "free_in_bytes" : 80813654016,
        "used_in_bytes" : 48033218560,
        "free_percent" : 63,
        "used_percent" : 37
      }
    },
    "process" : {...}
    "jvm" : {
      "max_uptime_in_millis" : 620712707,
      "versions" : [...],
      "mem" : {
        "heap_used_in_bytes" : 17336656096,
        "heap_max_in_bytes" : 31138512896
      },
      "threads" : 317
    },
...
...
...
  }
}

So that's obviously not everything, but hopefully it gives you some insight into the GC issue.

Thanks for the help!

Why do you have 3 master nodes and a single data node? It looks like you have a large number of very small shards, which is inefficient. I would recommend reducing the shard count significantly. How much heap does your data node have?
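(A quick way to check, if you haven't already:

GET _cat/nodes?v&h=name,node.role,heap.max,heap.percent

which shows the configured max heap and current heap usage per node.)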


That's what the stats are showing, but it's not actually the case. I have 2 hot data nodes and 1 cold node (on this cluster), but because of this issue they keep "leaving" the cluster, so the output you're seeing isn't accurate. As to the heap, I have been following the recommendation to use half the physical memory: on the three data nodes I upped the RAM to 48GB and set the min/max heap to 24GB in jvm.options.
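For reference, the two relevant lines in jvm.options now look like this:

-Xms24g
-Xmx24g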

-- I forgot to answer your question about the shards. I've been trying to reduce the shard count, but my other issue is that I can't use the reindex API. I have been forcemerging to at least reduce the segment counts, though, and that does work.
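To verify the merges are working, I've been checking segment counts per index with something like:

GET _cat/segments/my-index-000001?v

(the index name is just an example), and the segment counts do drop after each merge.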
