Data too large: indices:data/read/search[phase/query]

Hi,
How can I increase this limit, to avoid any disruption when reading data in Kibana?

    [parent] Data too large, data for [indices:data/read/search[phase/query]] would be [4093997030/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4093995640/3.8gb], new bytes reserved: [1390/1.3kb], usages [inflight_requests=1390/1.3kb, model_inference=0/0b, eql_sequence=0/0b, fielddata=36900747/35.1mb, request=0/0b]

What does this show?

GET _nodes/stats/jvm/

Under the JVM section for your hot nodes....

     "jvm": {
        "timestamp": 1674514330238,
        "uptime_in_millis": 3982594145,
        "mem": {
          "heap_used_in_bytes": 1267712272,
          "heap_used_percent": 30,
          "heap_committed_in_bytes": 4202692608,
          "heap_max_in_bytes": 4202692608,
          "non_heap_used_in_bytes": 387954520,
          "non_heap_committed_in_bytes": 400424960,

Stats for the above query.

I'd rather say that I have a problem with the warm nodes when reading data in Kibana.

I need to understand what I'm doing wrong. It's natural that the hot tier has much more memory than the warm tier. So after the data is moved by the ILM policy, what should we do with it (resize it / change the shard count)? What should be done during this process in ILM, and how can I limit the query from Kibana (it was just a simple browse from the Discover tab)?

On the hot-tier nodes:

    "wgCiK7OqTbG6XUPTtv-_gg" : {
      "timestamp" : 1674516876515,
      "name" : "es_data_ssd_2_2",
      "transport_address" : "10.0.9.14:9300",
      "host" : "10.0.9.14",
      "ip" : "10.0.9.14:9300",
      "roles" : [
        "data_content",
        "data_hot"
      ],
      "attributes" : {
        "rack_id" : "rack_two",
        "xpack.installed" : "true"
      },
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 8160437862,
          "limit_size" : "7.5gb",
          "estimated_size_in_bytes" : 1310720,
          "estimated_size" : "1.2mb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "inflight_requests" : {
          "limit_size_in_bytes" : 8589934592,
          "limit_size" : "8gb",
          "estimated_size_in_bytes" : 376915,
          "estimated_size" : "368kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "model_inference" : {
          "limit_size_in_bytes" : 4294967296,
          "limit_size" : "4gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "eql_sequence" : {
          "limit_size_in_bytes" : 4294967296,
          "limit_size" : "4gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 3435973836,
          "limit_size" : "3.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 8160437862,
          "limit_size" : "7.5gb",
          "estimated_size_in_bytes" : 4106377312,
          "estimated_size" : "3.8gb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }

On the warm-tier nodes:

    "HU8aioFzTcmPVppkZUdxlw" : {
      "timestamp" : 1674516876515,
      "name" : "es_data_hdd_3_3",
      "transport_address" : "10.0.9.31:9300",
      "host" : "10.0.9.31",
      "ip" : "10.0.9.31:9300",
      "roles" : [
        "data_warm"
      ],
      "attributes" : {
        "rack_id" : "rack_three",
        "xpack.installed" : "true"
      },
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 4080218931,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1717986918,
          "limit_size" : "1.5gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "eql_sequence" : {
          "limit_size_in_bytes" : 2147483648,
          "limit_size" : "2gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "model_inference" : {
          "limit_size_in_bytes" : 2147483648,
          "limit_size" : "2gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "inflight_requests" : {
          "limit_size_in_bytes" : 4294967296,
          "limit_size" : "4gb",
          "estimated_size_in_bytes" : 8408,
          "estimated_size" : "8.2kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 4080218931,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 1778009752,
          "estimated_size" : "1.6gb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }
    },

Hi @INS

Apologies, what version are you on?

How do you know? But OK, let's assume.

There can be a few reasons... but in short, you are running out of heap.

First, I see that a warm node has 4GB of heap, which is not very large, especially if you have a lot of fields, indices, and shards... that can cause you to run out of heap during read or write operations.

    "DyMOMyeWQSSiMYHiY00GSg" : {
      "timestamp" : 1674515278756,
      "name" : "es_data_hdd_1_1",
      "transport_address" : "10.0.9.115:9300",
      "host" : "10.0.9.115",
      "ip" : "10.0.9.115:9300",
      "roles" : [
        "data_warm"
      ],
      "attributes" : {
        "rack_id" : "rack_one",
        "xpack.installed" : "true"
      },
      "jvm" : {
        "timestamp" : 1674515278756,
        "uptime_in_millis" : 660382257,
        "mem" : {
          "heap_used_in_bytes" : 3991299504,
          "heap_used_percent" : 92, <!--- Running Very Hot
          "heap_committed_in_bytes" : 4294967296,  <!--- 4GB 
          "heap_max_in_bytes" : 4294967296,
          "non_heap_used_in_bytes" : 238926808,
          "non_heap_committed_in_bytes" : 245956608,
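
For context on where the "3.7gb" limit in the error comes from: `[parent]` is the parent circuit breaker, and with real-memory circuit breaking enabled (`indices.breaker.total.use_real_memory: true`) its limit, `indices.breaker.total.limit`, defaults to 95% of the JVM heap. A quick sanity check against the numbers above (a sketch; the 95% default assumes these settings were not overridden):

```python
# Parent circuit breaker: with real-memory accounting enabled, the default
# limit (indices.breaker.total.limit) is 95% of the max JVM heap.
def parent_breaker_limit(heap_max_in_bytes: int, fraction: float = 0.95) -> int:
    return int(heap_max_in_bytes * fraction)

heap_max = 4_294_967_296  # 4GB heap_max_in_bytes reported by the warm node above
limit = parent_breaker_limit(heap_max)
print(limit)  # 4080218931 -> exactly the "3.7gb" limit in the error message
```

So raising the limit percentage only buys a little headroom at the cost of OOM risk; the durable fix is more heap or less heap pressure.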

So let's take a closer look at that node and see what we see. Please share the results of this command, plus the version of the stack.

GET /_nodes/DyMOMyeWQSSiMYHiY00GSg/stats/indices,jvm,breaker

Please hold on; yesterday I redeployed this cluster from scratch, so I need to wait for the issue to come back. I think it will come back soon.

@stephenb OK, so I caught the same case on another node, and this time I have the stats:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 0,
    "failed" : 1,
    "failures" : [
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [MZ-C2rmjTgC4C-WLXA_KFQ]",
        "node_id" : "MZ-C2rmjTgC4C-WLXA_KFQ",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [cluster:monitor/nodes/stats[n]] would be [4106389772/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4106389280/3.8gb], new bytes reserved: [492/492b], usages [model_inference=0/0b, eql_sequence=0/0b, fielddata=103148105/98.3mb, request=0/0b, inflight_requests=492/492b]",
          "bytes_wanted" : 4106389772,
          "bytes_limit" : 4080218931,
          "durability" : "PERMANENT"
        }
      }
    ]
  },
  "cluster_name" : "elk_cluster",
  "nodes" : { }
}
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elk_cluster",
  "nodes" : {
    "MZ-C2rmjTgC4C-WLXA_KFQ" : {
      "timestamp" : 1675155284377,
      "name" : "es_data_hdd_5_1",
      "transport_address" : "10.0.9.214:9300",
      "host" : "10.0.9.214",
      "ip" : "10.0.9.214:9300",
      "roles" : [
        "data_warm"
      ],
      "attributes" : {
        "rack_id" : "rack_one",
        "xpack.installed" : "true"
      },
      "indices" : {
        "docs" : {
          "count" : 4408698196,
          "deleted" : 0
        },
        "shard_stats" : {
          "total_count" : 528
        },
        "store" : {
          "size_in_bytes" : 1606382858178,
          "total_data_set_size_in_bytes" : 1606382858178,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 8085108,
          "index_time_in_millis" : 209702,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 7597,
          "query_time_in_millis" : 20782869,
          "query_current" : 0,
          "fetch_total" : 282,
          "fetch_time_in_millis" : 23437,
          "fetch_current" : 0,
          "scroll_total" : 75,
          "scroll_time_in_millis" : 2277788,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 20,
          "total_time_in_millis" : 88605,
          "total_docs" : 13483958,
          "total_size_in_bytes" : 748275470,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 19118,
          "total_auto_throttle_in_bytes" : 12811692218
        },
        "refresh" : {
          "total" : 1604,
          "total_time_in_millis" : 41281,
          "external_total" : 1337,
          "external_total_time_in_millis" : 41673,
          "listeners" : 0
        },
        "flush" : {
          "total" : 616,
          "periodic" : 616,
          "total_time_in_millis" : 2059
        },
        "warmer" : {
          "current" : 0,
          "total" : 651,
          "total_time_in_millis" : 133
        },
        "query_cache" : {
          "memory_size_in_bytes" : 54523743,
          "total_count" : 37784,
          "hit_count" : 12062,
          "miss_count" : 25722,
          "cache_size" : 503,
          "cache_count" : 503,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 100143792,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 5770,
          "memory_in_bytes" : 0,
          "terms_memory_in_bytes" : 0,
          "stored_fields_memory_in_bytes" : 0,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 0,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 0,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : 1674774061516,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 29040,
          "uncommitted_operations" : 0,
          "uncommitted_size_in_bytes" : 29040,
          "earliest_last_modified_age" : 7494558
        },
        "request_cache" : {
          "memory_size_in_bytes" : 1765752,
          "evictions" : 0,
          "hit_count" : 3428,
          "miss_count" : 1026
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 13637052
        },
        "bulk" : {
          "total_operations" : 1626,
          "total_time_in_millis" : 216141,
          "total_size_in_bytes" : 2692338928,
          "avg_time_in_millis" : 131,
          "avg_size_in_bytes" : 1623956
        }
      },
      "jvm" : {
        "timestamp" : 1675155283586,
        "uptime_in_millis" : 638731679,
        "mem" : {
          "heap_used_in_bytes" : 4052139984,
          "heap_used_percent" : 94,
          "heap_committed_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4294967296,
          "non_heap_used_in_bytes" : 237456848,
          "non_heap_committed_in_bytes" : 244252672,
          "pools" : {
            "young" : {
              "used_in_bytes" : 8388608,
              "max_in_bytes" : 0,
              "peak_used_in_bytes" : 2529165312,
              "peak_max_in_bytes" : 0
            },
            "old" : {
              "used_in_bytes" : 4043128320,
              "max_in_bytes" : 4294967296,
              "peak_used_in_bytes" : 4245606400,
              "peak_max_in_bytes" : 4294967296
            },
            "survivor" : {
              "used_in_bytes" : 623056,
              "max_in_bytes" : 0,
              "peak_used_in_bytes" : 322961408,
              "peak_max_in_bytes" : 0
            }
          }
        },
        "threads" : {
          "count" : 326,
          "peak_count" : 326
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 52432,
              "collection_time_in_millis" : 428935
            },
            "old" : {
              "collection_count" : 0,
              "collection_time_in_millis" : 0
            }
          }
        },
        "buffer_pools" : {
          "mapped" : {
            "count" : 13400,
            "used_in_bytes" : 776426767983,
            "total_capacity_in_bytes" : 776426767983
          },
          "direct" : {
            "count" : 368,
            "used_in_bytes" : 73475831,
            "total_capacity_in_bytes" : 73475829
          },
          "mapped - 'non-volatile memory'" : {
            "count" : 0,
            "used_in_bytes" : 0,
            "total_capacity_in_bytes" : 0
          }
        },
        "classes" : {
          "current_loaded_count" : 26176,
          "total_loaded_count" : 26699,
          "total_unloaded_count" : 523
        }
      },
      "breakers" : {
        "model_inference" : {
          "limit_size_in_bytes" : 2147483648,
          "limit_size" : "2gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "eql_sequence" : {
          "limit_size_in_bytes" : 2147483648,
          "limit_size" : "2gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1717986918,
          "limit_size" : "1.5gb",
          "estimated_size_in_bytes" : 100143792,
          "estimated_size" : "95.5mb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "request" : {
          "limit_size_in_bytes" : 4080218931,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "inflight_requests" : {
          "limit_size_in_bytes" : 4294967296,
          "limit_size" : "4gb",
          "estimated_size_in_bytes" : 246,
          "estimated_size" : "246b",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 4080218931,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 4064722896,
          "estimated_size" : "3.7gb",
          "overhead" : 1.0,
          "tripped" : 424233
        }
      }
    }
  }
}

Hi @INS

You still have not provided the version you are on... which is important...

In short, it looks like you are running out of JVM Heap.

You have a LOT of shards (500+) for a 4GB heap...

There are a number of factors that consume heap...

The number of field mappings, the number of shards, etc.
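
A quick way to see how many shards each node is carrying (Dev Tools; the column selection here is just an illustration, all of them are standard `_cat/allocation` columns):

    GET _cat/allocation?v&h=node,shards,disk.indices,disk.used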

Give us the version and perhaps we can help more, but a short-term fix would be to increase the JVM heap space or clean up your indices, shards, and mappings...
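
On the shard-cleanup side, one common pattern for your earlier ILM question is to let the warm phase shrink each index to fewer shards and force-merge its segments as it leaves the hot tier, which reduces per-shard heap overhead on the warm nodes. A minimal sketch only (the policy name `logs_policy`, the `min_age` of `7d`, and the targets of 1 shard / 1 segment are illustrative assumptions, not values from your cluster):

    PUT _ilm/policy/logs_policy
    {
      "policy": {
        "phases": {
          "warm": {
            "min_age": "7d",
            "actions": {
              "shrink": { "number_of_shards": 1 },
              "forcemerge": { "max_num_segments": 1 }
            }
          }
        }
      }
    }

And if you raise the heap, do it in `jvm.options` on the warm nodes (e.g. `-Xms8g` and `-Xmx8g`, kept equal and at no more than about half of the machine's RAM).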