Data too large, data for [<transport_request>] would be [10554893106/9.8gb], which is larger than the limit of [10092838912/9.3gb], real usage: [10523239224/9.8gb], new bytes reserved: [31653882/30.1mb]]]]

Hi,
We have the following Elasticsearch configuration:
version: 7.1.1
3 master nodes
3 data nodes
2 coordinating nodes

Memory configuration:
Data node: 64 GB machine
Java heap allocation: 10 GB

Coordinating node: 32 GB machine
Java heap allocation: 15 GB

Total indexed documents: 404639822
Total number of indices: 200

While indexing, it gives the above error.
Any suggestions?

Regards,
Rakesh

    [2019-07-28T09:30:09,326][WARN ][o.e.c.r.a.AllocationService] [master01.bj] failing shard [failed shard, shard [index-2019.07.28][31], node[dULld6ucSuWg
    _Xk1Y4-CkA], relocating [u7cQvUAzS_eClpZewpuvKQ], [P], recovery_source[peer recovery], s[INITIALIZING], a[id=dnUG7BdORaW6OvvV7Dddmw, rId=-cl71TKHRziD2Ilx6OlgiA], expected_shard_size[3944
    472890], message [failed recovery], failure [RecoveryFailedException[[index-2019.07.28][31]: Recovery failed from {node05.bj/9201}{u7cQvUAzS_eClpZe
    wpuvKQ}{DeeGD_GsTXCE9ZFfF2ES2g}{node05.bj}{10.152.157.29:9311}{server=node05.bj, ml.machine_memory=135083769856, ml.max_open_jobs=20, node_type=wa
    rm, xpack.installed=true, type=cold} into {node02.bj/9200}{dULld6ucSuWg_Xk1Y4-CkA}{UBz5HuvlRJmy3fQdHt48Tg}{node02.bj}{10.152.158.29:9310}{server=node02.bj, ml.machine_memory=135083769856, node_type=warm, xpack.installed=true, ml.max_open_jobs=20, type=cold}]; nested: RemoteTransportException[[node05.bj/9201][10.152.157.29:9311][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30768737168/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30767656448/28.6gb], new bytes reserved: [1080720/1mb]]; ], markAsStale [true]]
    ...
    Caused by: org.elasticsearch.transport.RemoteTransportException: [node05.bj/9201][10.152.157.29:9311][internal:index/shard/recovery/start_recovery]
    Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30768737168/28.6gb], which is larger than the limi
    t of [30601641984/28.5gb], real usage: [30767656448/28.6gb], new bytes reserved: [1080720/1mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478) ~[?:?]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227) ~[?:?]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1274) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        ... 1 more

Are you using G1GC? Can you share your node stats?

Thanks for the response.
We are using the default GC (ConcMarkSweepGC).
Regarding node stats, do you want stats from a specific node, since it is quite a large file?

Just sharing the full stats would be useful. This will give us an impression of how much load you have on the nodes (e.g. circuit breaker or thread pool stats). Are you getting similar breaker exceptions when the real-memory circuit breaker is disabled (i.e. indices.breaker.total.use_real_memory = false, see https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html)?
Can you describe the kind of indexing / search load that you have on the cluster? Note that breaker exceptions are not necessarily bad, as they prevent the nodes from running out of memory. Clients are expected to retry requests that fail with these exceptions.
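For context, the numbers in the quoted error line up with the 7.x defaults. A quick sketch of the arithmetic (heap size taken from the node stats shared later in this thread; the 95% and 70% figures are the documented defaults for `indices.breaker.total.limit`):

```shell
# With the real-memory breaker enabled (7.x default), the parent breaker
# limit defaults to 95% of the JVM heap. For a 10624040960-byte (9.8gb) heap:
heap_bytes=10624040960
limit=$(( heap_bytes * 95 / 100 ))
echo "$limit"   # 10092838912 — exactly the "limit of [10092838912/9.3gb]" in the error

# To disable the real-memory breaker for testing, add to elasticsearch.yml:
#   indices.breaker.total.use_real_memory: false
# (the parent limit then falls back to 70% of the heap)
```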

Hi,
sending it in two parts:
1.

nIarZtTaSvKeOQu9kK8uzg: {

		indices: {
			docs: {
				count: 758027349,
				deleted: 15498
			},
			store: {
				size_in_bytes: 2116493376066
			},
			indexing: {
				index_total: 2163556,
				index_time_in_millis: 47955592,
				index_current: 4,
				index_failed: 0,
				delete_total: 0,
				delete_time_in_millis: 0,
				delete_current: 0,
				noop_update_total: 0,
				is_throttled: false,
				throttle_time_in_millis: 0
			},
			get: {
				total: 13,
				time_in_millis: 3,
				exists_total: 9,
				exists_time_in_millis: 2,
				missing_total: 4,
				missing_time_in_millis: 1,
				current: 0
			},
			search: {
				open_contexts: 0,
				query_total: 1586,
				query_time_in_millis: 46107,
				query_current: 0,
				fetch_total: 329,
				fetch_time_in_millis: 758,
				fetch_current: 0,
				scroll_total: 0,
				scroll_time_in_millis: 0,
				scroll_current: 0,
				suggest_total: 0,
				suggest_time_in_millis: 0,
				suggest_current: 0
			},
			merges: {
				current: 0,
				current_docs: 0,
				current_size_in_bytes: 0,
				total: 197,
				total_time_in_millis: 5971429,
				total_docs: 9215790,
				total_size_in_bytes: 24454902051,
				total_stopped_time_in_millis: 33087,
				total_throttled_time_in_millis: 1739261,
				total_auto_throttle_in_bytes: 4199305982
			},
			refresh: {
				total: 653,
				total_time_in_millis: 310733,
				listeners: 0
			},
			flush: {
				total: 250,
				periodic: 69,
				total_time_in_millis: 663012
			},
			warmer: {
				current: 0,
				total: 211,
				total_time_in_millis: 5543
			},
			query_cache: {
				memory_size_in_bytes: 0,
				total_count: 0,
				hit_count: 0,
				miss_count: 0,
				cache_size: 0,
				cache_count: 0,
				evictions: 0
			},
			fielddata: {
				memory_size_in_bytes: 0,
				evictions: 0
			},
			completion: {
				size_in_bytes: 0
			},
			segments: {
				count: 3526,
				memory_in_bytes: 7462701818,
				terms_memory_in_bytes: 6586791550,
				stored_fields_memory_in_bytes: 844839304,
				term_vectors_memory_in_bytes: 0,
				norms_memory_in_bytes: 5231104,
				points_memory_in_bytes: 24656302,
				doc_values_memory_in_bytes: 1183558,
				index_writer_memory_in_bytes: 0,
				version_map_memory_in_bytes: 0,
				fixed_bit_set_memory_in_bytes: 102489232,
				max_unsafe_auto_id_timestamp: -1,
				file_sizes: {}
			},
			translog: {
				operations: 298452,
				size_in_bytes: 3603369718,
				uncommitted_operations: 3400,
				uncommitted_size_in_bytes: 35667673,
				earliest_last_modified_age: 0
			},
			request_cache: {
				memory_size_in_bytes: 118505,
				evictions: 0,
				hit_count: 60,
				miss_count: 157
			},
			recovery: {
				current_as_source: 0,
				current_as_target: 0,
				throttle_time_in_millis: 226792
			}
		},

os: {
timestamp: 1564054673125,
cpu: {
percent: 2,
load_average: {
1m: 0.77,
5m: 0.58,
15m: 0.47
}
},
mem: {
total_in_bytes: 66016198656,
free_in_bytes: 2365132800,
used_in_bytes: 63651065856,
free_percent: 4,
used_percent: 96
},
swap: {
total_in_bytes: 0,
free_in_bytes: 0,
used_in_bytes: 0
},
cgroup: {
cpuacct: {
control_group: "/system.slice/elasticsearch.service",
usage_nanos: 15509087613496
},
cpu: {
control_group: "/system.slice/elasticsearch.service",
cfs_period_micros: 100000,
cfs_quota_micros: -1,
stat: {
number_of_elapsed_periods: 0,
number_of_times_throttled: 0,
time_throttled_nanos: 0
}
},
memory: {
control_group: "/system.slice/elasticsearch.service",
limit_in_bytes: "9223372036854771712",
usage_in_bytes: "62100361216"
}
}
},
process: {
timestamp: 1564054673125,
open_file_descriptors: 10731,
max_file_descriptors: 65535,
cpu: {
percent: 2,
total_in_millis: 15508520
},
mem: {
total_virtual_in_bytes: 883942322176
}
},
jvm: {
timestamp: 1564054673130,
uptime_in_millis: 8608691,
mem: {
heap_used_in_bytes: 9885317584,
heap_used_percent: 93,
heap_committed_in_bytes: 10624040960,
heap_max_in_bytes: 10624040960,
non_heap_used_in_bytes: 156533784,
non_heap_committed_in_bytes: 169385984,
pools: {
young: {
used_in_bytes: 701222496,
max_in_bytes: 907345920,
peak_used_in_bytes: 907345920,
peak_max_in_bytes: 907345920
},
survivor: {
used_in_bytes: 71339224,
max_in_bytes: 113377280,
peak_used_in_bytes: 113377280,
peak_max_in_bytes: 113377280
},
old: {
used_in_bytes: 9112755864,
max_in_bytes: 9603317760,
peak_used_in_bytes: 9603317760,
peak_max_in_bytes: 9603317760
}
}
},
threads: {
count: 209,
peak_count: 220
},
gc: {
collectors: {
young: {
collection_count: 1942,
collection_time_in_millis: 24093
},
old: {
collection_count: 2079,
collection_time_in_millis: 2789580
}
}
},
buffer_pools: {
mapped: {
count: 5190,
used_in_bytes: 862020801480,
total_capacity_in_bytes: 862020801480
},
direct: {
count: 213,
used_in_bytes: 540871074,
total_capacity_in_bytes: 540871073
}
},
classes: {
current_loaded_count: 16926,
total_loaded_count: 17232,
total_unloaded_count: 306
}
},
thread_pool: {
analyze: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
ccr: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
fetch_shard_started: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 32,
completed: 194
},
fetch_shard_store: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 32,
completed: 369
},
flush: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 5,
completed: 682
},
force_merge: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
generic: {
threads: 91,
queue: 0,
active: 0,
rejected: 0,
largest: 107,
completed: 69565
},
get: {
threads: 13,
queue: 0,
active: 0,
rejected: 0,
largest: 13,
completed: 13
},
listener: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 1,
completed: 1
},
management: {
threads: 5,
queue: 0,
active: 1,
rejected: 0,
largest: 5,
completed: 103006
},
refresh: {
threads: 8,
queue: 0,
active: 0,
rejected: 0,
largest: 8,
completed: 57178
},
rollup_indexing: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
search: {
threads: 25,
queue: 0,
active: 0,
rejected: 0,
largest: 25,
completed: 1596
},
search_throttled: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
security-token-key: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
snapshot: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
warmer: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 5,
completed: 65510
},
watcher: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
write: {
threads: 16,
queue: 0,
active: 0,
rejected: 0,
largest: 16,
completed: 3423
}
},
fs: {
timestamp: 1564054673130,
total: {
total_in_bytes: 4908285886464,
free_in_bytes: 2781784522752,
available_in_bytes: 2534397763584
},
data: [{
path: "/usr/share/elasticsearch/data/nodes/0",
mount: "/usr/share/elasticsearch/data (/dev/md0)",
type: "ext4",
total_in_bytes: 4908285886464,
free_in_bytes: 2781784522752,
available_in_bytes: 2534397763584
}],
io_stats: {
devices: [{
device_name: "md0",
operations: 802973,
read_operations: 249694,
write_operations: 553279,
read_kilobytes: 49917524,
write_kilobytes: 78680540
}],
total: {
operations: 802973,
read_operations: 249694,
write_operations: 553279,
read_kilobytes: 49917524,
write_kilobytes: 78680540
}
}
},
transport: {
server_open: 77,
rx_count: 103795,
rx_size_in_bytes: 59328301118,
tx_count: 103822,
tx_size_in_bytes: 11407673247
}

,
http: {
current_open: 0,
total_opened: 0
},
breakers: {
request: {
limit_size_in_bytes: 6374424576,
limit_size: "5.9gb",
estimated_size_in_bytes: 0,
estimated_size: "0b",
overhead: 1,
tripped: 0
},
fielddata: {
limit_size_in_bytes: 4249616384,
limit_size: "3.9gb",
estimated_size_in_bytes: 0,
estimated_size: "0b",
overhead: 1.03,
tripped: 0
},
in_flight_requests: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 1358,
estimated_size: "1.3kb",
overhead: 2,
tripped: 0
},
accounting: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 7462701306,
estimated_size: "6.9gb",
overhead: 1,
tripped: 0
},
parent: {
limit_size_in_bytes: 10092838912,
limit_size: "9.3gb",
estimated_size_in_bytes: 9885457888,
estimated_size: "9.2gb",
overhead: 1,
tripped: 7194
}
},
script: {
compilations: 0,
cache_evictions: 0,
compilation_limit_triggered: 0
},
discovery: {
cluster_state_queue: {
total: 0,
pending: 0,
committed: 0
},
published_cluster_states: {
full_states: 1,
incompatible_diffs: 0,
compatible_diffs: 329
}
},
ingest: {
total: {
count: 0,
time_in_millis: 0,
current: 0,
failed: 0
},
pipelines: {}
},
adaptive_selection: {}
}

node stats: https://del.dog/omihuvavuc
Maybe it is related to this issue.

Hi,

A few queries regarding replicas.
I have 1 data node and 2 replicas.

  1. While indexing, replication starts in the background; how does ES know the bulk size to transfer from the primary shards to the replica shards?
  2. Our ES runs on OpenJDK; does that have any impact on GC?
  3. What is the difference between setting replicas to 0 before indexing and then raising it to 2, versus starting with 2 replicas and indexing?

@Rakesh4 It looks like you're running into the limits of what can be stored on these nodes. From the node stats I see

heap_used_percent: 93

store: {
   size_in_bytes: 2116493376066
},

accounting: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 7462701306,
estimated_size: "6.9gb",
overhead: 1,
tripped: 0
},

which indicates that you're storing 2TB on this node, which has a 10GB heap. 6.9GB (see the accounting breaker) is used just to keep the in-memory shard structures.

You should increase the heap size or move towards using the frozen indices feature.
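For reference, freezing an index in 7.x is a single API call; a frozen index drops its per-shard heap structures and rebuilds them transiently when searched. A sketch (the index name is just an example taken from the logs above, and `localhost:9200` is an assumed endpoint):

```shell
# Freeze an (example) index so its shard structures are dropped from heap;
# searches against it will transiently rebuild what they need (7.x only).
curl -X POST "localhost:9200/index-2019.07.28/_freeze"

# Unfreeze later if the index needs regular indexing/search again:
curl -X POST "localhost:9200/index-2019.07.28/_unfreeze"
```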

@LoadingZhang your problem seems to be a different one. Can you open a fresh issue? In particular it looks like the heap memory consumed by your node is nowhere near the limit.

of course.

Hi,
Thanks for the reply.
A few doubts here.
I have a clean new environment with the following config:
3 master nodes (15 GB heap, 2 TB HDD, 16 GB machine)
3 data nodes (10 GB heap, 2 TB HDD, 64 GB machine)
2 coordinating nodes (15 GB heap, 2 TB HDD, 32 GB machine)

Currently there are 5 indices.
When I am indexing, it is not running out of the limit; it consumes only 5 GB out of the 10 GB.
But the previous case (for which I shared the node stats) was running out of the limit.

Questions:

  1. How does replica creation decide the chunk size (held in memory) while indexing?
  2. When I change replicas to 0 and then back to 2, it creates the replicas properly without running out of memory.

Let me explain my process.

With the above configuration, my indexing process runs with a bulk size of 15 MB and replicas set to 2.
Then I was getting the circuit breaker exception.
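For what it's worth, the replica toggle described in this thread is a dynamic index settings update. A sketch (`my-index` and the endpoint are illustrative):

```shell
# Drop replicas before a bulk load...
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'

# ...and restore them after indexing. Replicas added this way are built by
# copying segment files from the primary (peer recovery), rather than by
# re-indexing every document a second time during the bulk load.
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 2}}'
```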

  1. Can you please explain what these in-memory shard structures are?
     Do they have any relation to the mappings that I have put in each index? For now all my indices have the same mapping.
  2. I can also see that if I use the same mapping and keep creating empty indices, the heap occupied keeps increasing and is never freed. Note that I am not putting any documents into the indices while doing #2.

Can you please explain what these in-memory shard structures are?

While Lucene accesses a lot of data in an off-heap fashion, certain helper structures are loaded on-heap to provide fast access. Memory usage of these structures is tracked by the accounting breaker.
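These figures can be inspected directly. A sketch of the relevant APIs (endpoint assumed to be `localhost:9200`):

```shell
# Per-node breaker usage, including the accounting breaker that tracks
# these on-heap segment structures:
curl "localhost:9200/_nodes/stats/breaker?pretty"

# Per-segment on-heap memory, broken down by index and shard:
curl "localhost:9200/_cat/segments?v"
```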

Do they have any relation to the mappings that I have put in each index? For now all my indices have the same mapping.

No, the mappings determine how documents are indexed into Lucene, but they do not create overhead on the Lucene side. Mappings are managed by Elasticsearch. Each mapping does come with a certain on-heap overhead, however, as the Elasticsearch node that holds a shard of the given index creates in-memory structures to efficiently parse and validate documents under the specified mapping.

I can also see that if I use the same mapping and keep creating empty indices, the heap occupied keeps increasing and is never freed. Note that I am not putting any documents into the indices while doing #2.

These in-memory mapping structures are created on a per-index basis, i.e. they are not shared between indices even if the indices have identical mappings.

Thanks for the detailed explanation, Yannick. My comment about memory increasing was based on a very small number of index creations. Kindly ignore my mistake.
In a more extensive test I created 500 empty indices with the same mapping used earlier in this thread. My cluster state was happily GREEN, and ES handled it very well, GCing the growing memory away.