Getting GC overhead on Elasticsearch

Hi,

We have been seeing a lot of GC activity on our Elasticsearch cluster.

Please help; because of this issue, ES nodes are crashing suddenly.
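For reference, the JSON below is our _cluster/stats output. Something like the following requests (localhost:9200 is just a placeholder for one of our nodes) returns this summary plus the per-node heap and GC timings:

# cluster-wide summary (pasted below); localhost:9200 stands in for any node
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'

# per-node JVM heap usage and GC collection counts/times
curl -s 'http://localhost:9200/_nodes/stats/jvm?human&pretty'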

{
  "_nodes": {
    "total": 9,
    "successful": 9,
    "failed": 0
  },
  "cluster_name": "AWS-Pre-ProductionCluster",
  "timestamp": 1597817401045,
  "status": "green",
  "indices": {
    "count": 1397,
    "shards": {
      "total": 3228,
      "primaries": 1616,
      "replication": 0.9975247524752475,
      "index": {
        "shards": {
          "min": 2,
          "max": 8,
          "avg": 2.310665712240515
        },
        "primaries": {
          "min": 1,
          "max": 4,
          "avg": 1.1567644953471725
        },
        "replication": {
          "min": 0,
          "max": 1,
          "avg": 0.9992841803865425
        }
      }
    },
    "docs": {
      "count": 3027070521,
      "deleted": 51881
    },
    "store": {
      "size": "6.8tb",
      "size_in_bytes": 7509428714404,
      "throttle_time": "0s",
      "throttle_time_in_millis": 0
    },
    "fielddata": {
      "memory_size": "2gb",
      "memory_size_in_bytes": 2244719936,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "2.4gb",
      "memory_size_in_bytes": 2578006631,
      "total_count": 283449541,
      "hit_count": 91806355,
      "miss_count": 191643186,
      "cache_size": 2568217,
      "cache_count": 4614858,
      "evictions": 2046641
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 50603,
      "memory": "18.6gb",
      "memory_in_bytes": 20011702483,
      "terms_memory": "16.3gb",
      "terms_memory_in_bytes": 17532072509,
      "stored_fields_memory": "1.8gb",
      "stored_fields_memory_in_bytes": 1962231688,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "42mb",
      "norms_memory_in_bytes": 44056000,
      "points_memory": "248.3mb",
      "points_memory_in_bytes": 260397802,
      "doc_values_memory": "203mb",
      "doc_values_memory_in_bytes": 212944484,
      "index_writer_memory": "253.6mb",
      "index_writer_memory_in_bytes": 266018900,
      "version_map_memory": "7.7mb",
      "version_map_memory_in_bytes": 8087188,
      "fixed_bit_set": "0b",
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": 9223372036854776000,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 9,
      "data": 4,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 9
    },
    "versions": [
      "5.6.3"
    ],
    "os": {
      "available_processors": 52,
      "allocated_processors": 52,
      "names": [
        {
          "name": "Linux",
          "count": 9
        }
      ],
      "mem": {
        "total": "401.9gb",
        "total_in_bytes": 431596474368,
        "free": "57.8gb",
        "free_in_bytes": 62121213952,
        "used": "344.1gb",
        "used_in_bytes": 369475260416,
        "free_percent": 14,
        "used_percent": 86
      }
    },
    "process": {
      "cpu": {
        "percent": 111
      },
      "open_file_descriptors": {
        "min": 347,
        "max": 2112,
        "avg": 1139
      }
    },
    "jvm": {
      "max_uptime": "76d",
      "max_uptime_in_millis": 6573622637,
      "versions": [
        {
          "version": "1.8.0_144",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.144-b01",
          "vm_vendor": "Oracle Corporation",
          "count": 9
        }
      ],
      "mem": {
        "heap_used": "89.1gb",
        "heap_used_in_bytes": 95734454912,
        "heap_max": "194.5gb",
        "heap_max_in_bytes": 208926408704
      },
      "threads": 674
    },
    "fs": {
      "total": "10.2tb",
      "total_in_bytes": 11270843568128,
      "free": "3.3tb",
      "free_in_bytes": 3737975033856,
      "available": "3.3tb",
      "available_in_bytes": 3737975033856
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "netty4": 9
      },
      "http_types": {
        "netty4": 9
      }
    }
  }
}

Are you really on 5.6.3?
How much heap does each of your nodes have assigned to it?

Your shards are very small for the amount of data you hold, and you are wasting resources maintaining them. This is likely the main cause of the heap pressure, as you have roughly 800 shards per data node.
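If it helps, a quick way to check both the heap sizing and the shard distribution is the _cat APIs (a minimal sketch; localhost:9200 stands in for any node):

# heap limit and current usage per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,heap.percent'

# shard count and disk usage per data node
curl -s 'http://localhost:9200/_cat/allocation?v'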

Dear Warkolm,

Thanks for your reply,

Yes, we are running version 5.6.3.

Please advise: what is the recommended maximum shard count per node?
What is the recommended hardware for each node role (master/ingest/data)?
How should we tune this cluster?

Do I have to change the node role from ingest to coordinating-only?


You really need to upgrade urgently; 5.X has been EOL for quite some time now. Upgrading will also bring performance improvements that help with this.

You also need to reduce your shard count. See "How do I increase or reduce the shard count of an existing index?" for some ideas.
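One common route on 5.x is the shrink API. A minimal sketch, assuming placeholder names (my-index, my-index-shrunk, data-node-1) and a target primary count that is a factor of the source index's:

# 1) relocate all copies of the index onto one node (data-node-1 is a
#    placeholder node name) and block writes
curl -XPUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.require._name": "data-node-1",
  "index.blocks.write": true
}'

# 2) once the index is green, shrink it into a new index with fewer primaries
curl -XPOST 'http://localhost:9200/my-index/_shrink/my-index-shrunk' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'

Reindexing several small indices into one larger index with _reindex is another option if the shard counts do not divide evenly.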

Dear Warkolm,

Can you please recommend hardware specifications per node role?

If you are going to have dedicated master nodes, these should ideally not serve traffic and therefore not also be ingest nodes. This means they can probably also be one size smaller than what you currently have. It may also be useful to avoid having the data nodes double as ingest nodes, as this should reduce heap pressure. You could instead add one additional ingest/coordinating node (if needed) and send all indexing and query requests through it.
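In elasticsearch.yml terms, that split would look roughly like this (a sketch of the 5.x role settings, one block per node type; adjust to your own topology):

# dedicated master nodes
node.master: true
node.data: false
node.ingest: false

# data-only nodes
node.master: false
node.data: true
node.ingest: false

# ingest/coordinating nodes that receive all indexing and query traffic
node.master: false
node.data: false
node.ingest: true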

I also agree with the recommendations to upgrade and reduce the shard count.

Dear Christian,

So I need to change the existing node roles?

For example, master/ingest to master-only?

And data/ingest nodes to data-only?
