Data too large, data for [<transport_request>]

With replicas disabled, index writes work fine; as soon as replicas are enabled, writes fail with the following error:

type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<transport_request>] would be [30770989726/28.6gb], which is larger than the limit of [30369601945/28.2gb], real usage: [30770988776/28.6gb], new bytes reserved: [950/950b], usages [request=0/0b, fielddata=22478/21.9kb, in_flight_requests=9844502/9.3mb, accounting=692585018/660.5mb]","bytes_wanted":30770989726,"bytes_limit":30369601945,"

Elasticsearch version: 7.3.1
JVM heap size: 30 GB
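For context: on 7.x the parent breaker uses real-memory accounting by default, with a limit of 95% of the heap, which is roughly the 28.2gb limit reported above for a ~30 GB heap. A minimal way to confirm the active limits and see which breakers are tripping, using the standard settings and node stats APIs:

GET _cluster/settings?include_defaults=true&flat_settings=true
# check indices.breaker.total.limit and indices.breaker.total.use_real_memory

GET _nodes/stats/breaker?human
# per-node limit, estimated usage and tripped count for each breaker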

Welcome to our community! :smiley:

Are you asking a question here or just posting an error? If you are asking a question it'd help to add more information about what you are doing when you see this, what Elasticsearch version you are using and things like that.

What is the full output of the cluster stats API?
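For reference, the request being asked for is the cluster stats API; a minimal form (the human and pretty flags just make the output easier to read):

GET _cluster/stats?human&pretty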

[2020-12-15T10:43:49,733][DEBUG][o.e.a.s.TransportSearchAction] [cli-16.213-1] All shards failed for phase: [query]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30449860338/28.3gb], which is larger than t
he limit of [30369601945/28.2gb], real usage: [30449859400/28.3gb], new bytes reserved: [938/938b], usages [request=98304/96kb, fielddata=2002/1.9kb, in_flight_requests
=49862914/47.5mb, accounting=561555100/535.5mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) [elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) [elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.3.1.jar:7.3.1]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1274) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at java.lang.Thread.run(Thread.java:835) [?:?]

Please read what we are asking for and provide it as it will help us help you.
Please don't just post unformatted logs; you're unlikely to get much response if you keep doing that.

{
  "_nodes" : {
    "total" : 252,
    "successful" : 252,
    "failed" : 0
  },
  "cluster_name" : "es_label_103",
  "cluster_uuid" : "QeG5AVVjQI-s5s8j_Xyl-w",
  "timestamp" : 1608015011933,
  "status" : "yellow",
  "indices" : {
    "count" : 183,
    "shards" : {
      "total" : 33169,
      "primaries" : 19569,
      "replication" : 0.6949767489396494,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 900,
          "avg" : 181.2513661202186
        },
        "primaries" : {
          "min" : 1,
          "max" : 450,
          "avg" : 106.93442622950819
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.6338069216757741
        }
      }
    },
    "docs" : {
      "count" : 2403429400234,
      "deleted" : 86934631234
    },
    "store" : {
      "size_in_bytes" : 629596270835385
    },
    "fielddata" : {
      "memory_size_in_bytes" : 1531968,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 208883396131,
      "total_count" : 7875663560,
      "hit_count" : 698624036,
      "miss_count" : 7177039524,
      "cache_size" : 52008,
      "cache_count" : 906539,
      "evictions" : 854531
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 137549,
      "memory_in_bytes" : 132931036836,
      "terms_memory_in_bytes" : 26049019168,
      "stored_fields_memory_in_bytes" : 59368704064,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 43804736,
      "points_memory_in_bytes" : 40808637406,
      "doc_values_memory_in_bytes" : 6660871462,
      "index_writer_memory_in_bytes" : 11222761790,
      "version_map_memory_in_bytes" : 297421015,
      "fixed_bit_set_memory_in_bytes" : 53580812632,
      "max_unsafe_auto_id_timestamp" : 1607990411680,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 252,
      "coordinating_only" : 0,
      "data" : 240,
      "ingest" : 252,
      "master" : 3,
      "voting_only" : 0
    },
    "versions" : [
      "7.3.1"
    ],
    "os" : {
      "available_processors" : 12048,
      "allocated_processors" : 12048,
      "names" : [
        {
          "name" : "Linux",
          "count" : 252
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.4 (Maipo)",
          "count" : 252
        }
      ],
      "mem" : {
        "total_in_bytes" : 69628506759168,
        "free_in_bytes" : 3035193581568,
        "used_in_bytes" : 66593313177600,
        "free_percent" : 4,
        "used_percent" : 96
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 232
      },
      "open_file_descriptors" : {
        "min" : 6719,
        "max" : 10788,
        "avg" : 7459
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 11212886978,
      "versions" : [
        {
          "version" : "12.0.2",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "12.0.2+10",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 252
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 3520870349160,
        "heap_max_in_bytes" : 8045207224320
      },
      "threads" : 96830
    },
    "fs" : {
      "total_in_bytes" : 1194655542632448,
      "free_in_bytes" : 868724110172160,
      "available_in_bytes" : 808506696572928
    },
    "plugins" : [
      {
        "name" : "analysis-pinyin",
        "version" : "7.3.1",
        "elasticsearch_version" : "7.3.1",
        "java_version" : "1.8",
        "description" : "Pinyin Analysis for Elasticsearch",
        "classname" : "org.elasticsearch.plugin.analysis.pinyin.AnalysisPinyinPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "repository-hdfs",
        "version" : "7.3.1",
        "elasticsearch_version" : "7.3.1",
        "java_version" : "1.8",
        "description" : "The HDFS repository plugin adds support for Hadoop Distributed File-System (HDFS) repositories.",
        "classname" : "org.elasticsearch.repositories.hdfs.HdfsPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "analysis-ik",
        "version" : "7.3.1",
        "elasticsearch_version" : "7.3.1",
        "java_version" : "1.8",
        "description" : "IK Analyzer for Elasticsearch",
        "classname" : "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 252
      },
      "http_types" : {
        "security4" : 252
      }
    },
    "discovery_types" : {
      "zen" : 252
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "tar",
        "count" : 252
      }
    ]
  }
}

This is the first time I've asked a question here, so I wasn't very familiar with the process.

GET _cluster/stats ?

OK, that is a large cluster with a lot of data. How is the cluster configured? Do you have any non-default settings, e.g. related to recovery?
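A quick way to gather this: dynamic overrides are visible through the cluster settings API, and the static elasticsearch.yml settings each node started with can be read back from the nodes info API. A sketch, both standard endpoints:

GET _cluster/settings?flat_settings=true
# persistent and transient (dynamic) overrides only

GET _nodes/settings
# includes the static settings from each node's elasticsearch.yml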

"transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "4",
          "enable" : "all"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "200mb"
      }
    }
  }

search.max_buckets: 200000
thread_pool.write.queue_size: 3000
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_retries: 6
discovery.zen.fd.ping_interval: 30s
cluster.routing.allocation.same_shard.host: true
transport.tcp.compress: true
http.max_content_length: 2000mb
bootstrap.memory_lock: true
indices.fielddata.cache.size: 20%
indices.memory.index_buffer_size: 30%

non-default settings

There are quite a few settings there that would have the potential to drive up heap usage. How did you arrive at these settings?

  "node_concurrent_recoveries" : "4"   :
            Because index sharding is more, recovery time is longer, speed up recovery

  http.max_content_length: 2000mb      
  search.max_buckets: 200000  
          The query returns a large result, which is adjusted according to the actual error information
  
  
  indices.memory.index_buffer_size: 30%   
 thread_pool.write.queue_size: 3000
           Speed up data loading and avoid the Reject thread

Right now we don't dare enable replicas while loading data; if I turn replicas on, nodes frequently drop out of and rejoin the cluster.

The issue here is that you've gone and increased these settings to solve one problem, and just caused another.

You should consider reducing the size of your cluster and splitting it into 2.

Increasing the queue size generally does not improve indexing throughput; it just stores more data in memory, which could very well be what is putting you over the limit. I would recommend lowering the index buffer size to the default value and bringing the queue size down. When you do so you may also need to reduce the indexing concurrency or adjust bulk sizes.
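One way to check whether the smaller queue is actually causing rejections after the change is to watch the write thread pool while indexing; a sketch using the cat thread pool API:

GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed
# a steadily growing rejected count means clients should back off or send smaller bulk requests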


OK, I will try.
Thanks.

There are a lot of questions like this (shard allocation stuck on circuit breakers). Is there a good step-by-step checklist for how to fix it?

Thanks!
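I'm not aware of an official checklist, but a rough triage order that tends to come up: look at per-node heap pressure first, then at the breaker stats request shown earlier in the thread to see which breakers are tripping, then at the long-lived heap consumers. A sketch with the cat nodes API (column names as in 7.x):

GET _cat/nodes?v&h=name,node.role,heap.percent,heap.max
# identify nodes running close to their heap limit

GET _cat/nodes?v&h=name,segments.memory,fielddata.memory_size,query_cache.memory_size
# long-lived heap consumers that leave less headroom for transport requests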

[elasticsearch.server][WARN] failing shard [failed shard, shard [uni_v202009_013][22], node[EI9qpLwbRYmLOu5MY6kFnA], [R], s[STARTED], a[id=83LfMw8zRjqqZXlbFUAZdA], message [failed to perform indices:data/write/bulk[s] on replica [uni_v202009_013][22], node[EI9qpLwbRYmLOu5MY6kFnA], [R], s[STARTED], a[id=83LfMw8zRjqqZXlbFUAZdA]], failure [RemoteTransportException[[nod-17.14-2][10.177.17.14:9301][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30382102546/28.2gb], which is larger than the limit of [30369601945/28.2gb], real usage: [30382085992/28.2gb], new bytes reserved: [16554/16.1kb], usages [request=0/0b, fielddata=3254/3.1kb, in_flight_requests=136952942/130.6mb, accounting=1179363849/1gb]]; ], markAsStale [true]]

After restarting the cluster it succeeded once, but the same error comes back later during bulk indexing.

write.queue_size: 1000
index_buffer_size: 15%
indices.recovery.max_bytes_per_sec: 40mb
es.batch.size.bytes: 3mb
indexing concurrency: 200

[elasticsearch.server][WARN] failing shard [failed shard, shard [busi_v202010][192], node[5lAJiB_6TIevyuvKwHvfWA], [R], s[STARTED], a[id=4887uAdfTV6BhbZF3sbH6g], message [failed to perform indices:data/write/bulk[s] on replica [busi_v202010][192], node[5lAJiB_6TIevyuvKwHvfWA], [R], s[STARTED], a[id=4887uAdfTV6BhbZF3sbH6g]], failure [RemoteTransportException[[nod-16.207-2][10.177.16.207:9301][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30668109502/28.5gb], which is larger than the limit of [30369601945/28.2gb], real usage: [30666952568/28.5gb], new bytes reserved: [1156934/1.1mb], usages [request=0/0b, fielddata=32679/31.9kb, in_flight_requests=1586882/1.5mb, accounting=706596702/673.8mb]]; ], markAsStale [true]]

@Christian_Dahlqvist
index_buffer_size: 10%
indices.recovery.max_bytes_per_sec:40M

The problem still persists.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.