Elasticsearch 7.10 durability permanent

Hello team,

My Elasticsearch cluster v7.10 has the following error:

reason" : "[parent] Data too large, data for [<http_request>] would be [17110531128/15.9gb], which is larger than the limit of [16320875724/15.1gb], real usage: [17110531128/15.9gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, model_inference=0/0b, accounting=669569972/638.5mb]",
    "bytes_wanted" : 17110531128,
    "bytes_limit" : 16320875724,
    "durability" : "PERMANENT"
  },
  "status" : 429
}

I haven't been able to execute a single query. This server has 8 cores and 32 GB of RAM, and the heap size is set to 16g. Can anyone provide any hints on this, please? Thanks in advance.

Elasticsearch 7.10 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

It looks like you do not have enough heap space. What is the full output of the cluster stats API?
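For reference, the `bytes_limit` in the error lines up with the parent circuit breaker (`indices.breaker.total.limit`), which in 7.x defaults to 95% of the JVM heap when the real-memory breaker is enabled. A quick sanity check in Python, using the 16g heap from your jvm.options:

```python
# Parent circuit breaker default in ES 7.x with the real-memory
# breaker enabled: 95% of the configured JVM heap.
heap_bytes = 16 * 1024**3        # -Xmx16g -> 17179869184 bytes
limit = int(heap_bytes * 0.95)   # parent breaker threshold

print(heap_bytes)  # 17179869184 (matches heap_max_in_bytes)
print(limit)       # 16320875724 (matches bytes_limit in the error)
```

So the breaker is behaving as configured: real heap usage (15.9gb) is already above 95% of the 16g heap, which is why every request is rejected with 429.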

Thanks @Christian_Dahlqvist for your response:

Here's the heap size:

heap.current heap.percent heap.max
      15.2gb           95     16gb

Also, the server has 32 GB of RAM, and the heap size in jvm.options is -Xms16g -Xmx16g.

The cluster has more than 200 active shards, and this is a single node with no replicas. Does that have an influence?

Here's the output of the stats API:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "qis-ep",
  "cluster_uuid" : "JZF8_1tGRe2mN9JubP1u_A",
  "timestamp" : 1663788253589,
  "status" : "red",
  "indices" : {
    "count" : 18,
    "shards" : {
      "total" : 18,
      "primaries" : 18,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 32073266,
      "deleted" : 2571
    },
    "store" : {
      "size_in_bytes" : 12300364879,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 253,
      "memory_in_bytes" : 207333756,
      "terms_memory_in_bytes" : 174175296,
      "stored_fields_memory_in_bytes" : 130136,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 25046272,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 7982052,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 296,
      "max_unsafe_auto_id_timestamp" : 1663778615179,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 8,
          "index_count" : 1
        },
        {
          "name" : "boolean",
          "count" : 59,
          "index_count" : 18
        },
        {
          "name" : "date",
          "count" : 2008,
          "index_count" : 273
        },
        {
          "name" : "flattened",
          "count" : 9,
          "index_count" : 1
        },
        {
          "name" : "float",
          "count" : 2728991,
          "index_count" : 13
        },
        {
          "name" : "integer",
          "count" : 22,
          "index_count" : 5
        },
        {
          "name" : "ip",
          "count" : 10,
          "index_count" : 10
        },
        {
          "name" : "keyword",
          "count" : 372311,
          "index_count" : 277
        },
        {
          "name" : "long",
          "count" : 6966,
          "index_count" : 32
        },
        {
          "name" : "nested",
          "count" : 11,
          "index_count" : 6
        },
        {
          "name" : "object",
          "count" : 275449,
          "index_count" : 276
        },
        {
          "name" : "text",
          "count" : 372090,
          "index_count" : 277
        },
        {
          "name" : "unsigned_long",
          "count" : 1764,
          "index_count" : 12
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [ ],
      "analyzer_types" : [ ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [ ],
      "built_in_filters" : [ ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "coordinating_only" : 0,
      "data" : 1,
      "data_cold" : 1,
      "data_content" : 1,
      "data_hot" : 1,
      "data_warm" : 1,
      "ingest" : 1,
      "master" : 1,
      "ml" : 1,
      "remote_cluster_client" : 1,
      "transform" : 1,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.2"
    ],
    "os" : {
      "available_processors" : 8,
      "allocated_processors" : 8,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Debian GNU/Linux 10 (buster)",
          "count" : 1
        }
      ],
      "mem" : {
        "total_in_bytes" : 32893620224,
        "free_in_bytes" : 3265601536,
        "used_in_bytes" : 29628018688,
        "free_percent" : 10,
        "used_percent" : 90
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 66
      },
      "open_file_descriptors" : {
        "min" : 537,
        "max" : 537,
        "avg" : 537
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 49787,
      "versions" : [
        {
          "version" : "11.0.14",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "11.0.14+9-post-Debian-1deb10u1",
          "vm_vendor" : "Debian",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 13734673448,
        "heap_max_in_bytes" : 17179869184
      },
      "threads" : 72
    },
    "fs" : {
      "total_in_bytes" : 1055813427200,
      "free_in_bytes" : 971787694080,
      "available_in_bytes" : 918083932160
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    },
    "discovery_types" : {
      "single-node" : 1
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 1
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 1,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}

Also, sometimes when I restart Elasticsearch I'm able to execute some queries for a couple of seconds, then the heap fills up very fast and I get the 429 PERMANENT error again.

It looks like you have 18 indices, each with a single primary shard - not 200 active shards.

These seem to take up a limited amount of memory (about 207 MB). Nothing here that seems to be causing problems.

One thing that stands out is the mappings, which seem quite large for that number of indices.

What type of data is this? How are you querying it?
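If it helps to narrow down the mapping size, a filtered cluster stats call along these lines (Dev Tools syntax) returns just the per-field-type counts shown in the output above:

```
GET _cluster/stats?filter_path=indices.mappings.field_types&human
```

A field count in the millions (the `float` type above reports 2,728,991 fields across 13 indices) usually points at a mapping explosion from dynamically mapped nested JSON, which by itself can keep the heap under constant pressure.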

@Christian_Dahlqvist

I know for sure there are 200+ shards. That output was from the stats API, executed right after a service restart, before the heap was consumed and queries stopped working. I ran the command immediately so I could at least get some info to show you. This is log data, by the way.

@Christian_Dahlqvist

This is system information data from endpoints. The format is nested JSON. The backend server application is written in Python.

There are other indices in the Elasticsearch server. There are logs, alerts, commands, batches, and devices (i.e. system info). I think the devices index is what's causing the problem, due to its large documents (3-4 MB each).

I don't know where to start troubleshooting this cluster. The physical RAM is 32 GB and the heap size is 16g, but whenever I start Elasticsearch the heap fills up in less than a minute and I get the 429 durability PERMANENT error.

I'm not sure if this will solve what you are seeing, but I know that a lot of optimizations have been done since 7.10. So you should upgrade to the latest 8.x :wink:

@dadoonet Thanks so much. I know that upgrading to 8.x is a good path to solving a lot of issues from 7.x, but that's not an option for me right now. Is there anything else I could do instead of upgrading? Any workarounds that you know of? Thanks in advance.

I would stop all queries and indexing to the cluster and start it up. That way you should be able to get complete output from the cluster stats API. That will give us more accurate information about the state of the cluster.
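Something along these lines (Dev Tools syntax) should capture what we need right after startup; the second is the same _cat command used earlier in the thread:

```
GET _cluster/stats?human&pretty
GET _cat/nodes?v&h=heap.current,heap.percent,heap.max
```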

@Christian_Dahlqvist Thanks for the reply. I will do that now. Are there any specific stats you would like to see?

At least upgrade to the latest 7.17.

Hey @dadoonet, I will explore that option with the team. Also, to provide more context: before entering this state, the cluster was at least operational. I changed the number of replicas on all the indices from 1 to 0 to resolve the YELLOW state, and then all of this began to happen.
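For reference, the replica change described above can be applied with a settings update along these lines (Dev Tools syntax; `*` targets all indices):

```
PUT */_settings
{
  "index" : {
    "number_of_replicas" : 0
  }
}
```

Note that on a single node this only changes shard accounting and cluster health; it doesn't free heap, since replica shards were never allocated anywhere to begin with.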