Data too large, data for [<transport_request>] would be [10554893106/9.8gb], which is larger than the limit of [10092838912/9.3gb], real usage: [10523239224/9.8gb], new bytes reserved: [31653882/30.1mb]]]]

Hi,
We have the following Elasticsearch configuration:
version: 7.1.1
3 master nodes
3 data nodes
2 coordinating nodes

Memory configuration:
Data node: 64 GB machine
Java heap allocation: 10 GB

Coordinating node: 32 GB machine
Java heap allocation: 15 GB

Total indexed documents: 404639822
Total number of indices: 200

While indexing, it gives the above error.
Any suggestions?

Regards,
Rakesh

    [2019-07-28T09:30:09,326][WARN ][o.e.c.r.a.AllocationService] [master01.bj] failing shard [failed shard, shard [index-2019.07.28][31], node[dULld6ucSuWg
    _Xk1Y4-CkA], relocating [u7cQvUAzS_eClpZewpuvKQ], [P], recovery_source[peer recovery], s[INITIALIZING], a[id=dnUG7BdORaW6OvvV7Dddmw, rId=-cl71TKHRziD2Ilx6OlgiA], expected_shard_size[3944
    472890], message [failed recovery], failure [RecoveryFailedException[[index-2019.07.28][31]: Recovery failed from {node05.bj/9201}{u7cQvUAzS_eClpZe
    wpuvKQ}{DeeGD_GsTXCE9ZFfF2ES2g}{node05.bj}{10.152.157.29:9311}{server=node05.bj, ml.machine_memory=135083769856, ml.max_open_jobs=20, node_type=wa
    rm, xpack.installed=true, type=cold} into {node02.bj/9200}{dULld6ucSuWg_Xk1Y4-CkA}{UBz5HuvlRJmy3fQdHt48Tg}{node02.bj}{10.152.158.29:9310}{server=node02.bj, ml.machine_memory=135083769856, node_type=warm, xpack.installed=true, ml.max_open_jobs=20, type=cold}]; nested: RemoteTransportException[[node05.bj/9201][10.152.157.29:9311][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30768737168/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30767656448/28.6gb], new bytes reserved: [1080720/1mb]]; ], markAsStale [true]]
    ...
    Caused by: org.elasticsearch.transport.RemoteTransportException: [node05.bj/9201][10.152.157.29:9311][internal:index/shard/recovery/start_recovery]
    Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30768737168/28.6gb], which is larger than the limi
    t of [30601641984/28.5gb], real usage: [30767656448/28.6gb], new bytes reserved: [1080720/1mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478) ~[?:?]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227) ~[?:?]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1274) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        ... 1 more

Are you using G1GC? Can you share your node stats?

Thanks for the response.
We are using the default GC (ConcMarkSweepGC).
Regarding node stats, do you want stats from a specific node, since it is quite a large file?

Just sharing the full stats would be useful. This will give us an impression of how much load you have on the nodes (e.g. circuit breaker or thread pool stats). Are you getting similar breaker exceptions when the real-memory circuit breaker is disabled (i.e. indices.breaker.total.use_real_memory = false, see https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html)?
Can you describe the kind of indexing / search load that you have on the cluster? Note that breaker exceptions are not necessarily bad, as they prevent the nodes from running out of memory. Clients are expected to retry requests that fail with these exceptions.
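For context, the numbers in the quoted error line up with the 7.x defaults. A quick sketch of the arithmetic (heap size taken from the node stats shared later in this thread; the 95% and 70% figures are the documented defaults for `indices.breaker.total.limit`):

```shell
# With the real-memory breaker enabled (7.x default), the parent breaker
# limit defaults to 95% of the JVM heap. For a 10624040960-byte (9.8gb) heap:
heap_bytes=10624040960
limit=$(( heap_bytes * 95 / 100 ))
echo "$limit"   # 10092838912 — exactly the "limit of [10092838912/9.3gb]" in the error

# To disable the real-memory breaker for testing, add to elasticsearch.yml:
#   indices.breaker.total.use_real_memory: false
# (the parent limit then falls back to 70% of the heap)
```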

Hi,
sending it in two parts:
1.

nIarZtTaSvKeOQu9kK8uzg: {

		indices: {
			docs: {
				count: 758027349,
				deleted: 15498
			},
			store: {
				size_in_bytes: 2116493376066
			},
			indexing: {
				index_total: 2163556,
				index_time_in_millis: 47955592,
				index_current: 4,
				index_failed: 0,
				delete_total: 0,
				delete_time_in_millis: 0,
				delete_current: 0,
				noop_update_total: 0,
				is_throttled: false,
				throttle_time_in_millis: 0
			},
			get: {
				total: 13,
				time_in_millis: 3,
				exists_total: 9,
				exists_time_in_millis: 2,
				missing_total: 4,
				missing_time_in_millis: 1,
				current: 0
			},
			search: {
				open_contexts: 0,
				query_total: 1586,
				query_time_in_millis: 46107,
				query_current: 0,
				fetch_total: 329,
				fetch_time_in_millis: 758,
				fetch_current: 0,
				scroll_total: 0,
				scroll_time_in_millis: 0,
				scroll_current: 0,
				suggest_total: 0,
				suggest_time_in_millis: 0,
				suggest_current: 0
			},
			merges: {
				current: 0,
				current_docs: 0,
				current_size_in_bytes: 0,
				total: 197,
				total_time_in_millis: 5971429,
				total_docs: 9215790,
				total_size_in_bytes: 24454902051,
				total_stopped_time_in_millis: 33087,
				total_throttled_time_in_millis: 1739261,
				total_auto_throttle_in_bytes: 4199305982
			},
			refresh: {
				total: 653,
				total_time_in_millis: 310733,
				listeners: 0
			},
			flush: {
				total: 250,
				periodic: 69,
				total_time_in_millis: 663012
			},
			warmer: {
				current: 0,
				total: 211,
				total_time_in_millis: 5543
			},
			query_cache: {
				memory_size_in_bytes: 0,
				total_count: 0,
				hit_count: 0,
				miss_count: 0,
				cache_size: 0,
				cache_count: 0,
				evictions: 0
			},
			fielddata: {
				memory_size_in_bytes: 0,
				evictions: 0
			},
			completion: {
				size_in_bytes: 0
			},
			segments: {
				count: 3526,
				memory_in_bytes: 7462701818,
				terms_memory_in_bytes: 6586791550,
				stored_fields_memory_in_bytes: 844839304,
				term_vectors_memory_in_bytes: 0,
				norms_memory_in_bytes: 5231104,
				points_memory_in_bytes: 24656302,
				doc_values_memory_in_bytes: 1183558,
				index_writer_memory_in_bytes: 0,
				version_map_memory_in_bytes: 0,
				fixed_bit_set_memory_in_bytes: 102489232,
				max_unsafe_auto_id_timestamp: -1,
				file_sizes: {}
			},
			translog: {
				operations: 298452,
				size_in_bytes: 3603369718,
				uncommitted_operations: 3400,
				uncommitted_size_in_bytes: 35667673,
				earliest_last_modified_age: 0
			},
			request_cache: {
				memory_size_in_bytes: 118505,
				evictions: 0,
				hit_count: 60,
				miss_count: 157
			},
			recovery: {
				current_as_source: 0,
				current_as_target: 0,
				throttle_time_in_millis: 226792
			}
		},

os: {
timestamp: 1564054673125,
cpu: {
percent: 2,
load_average: {
1m: 0.77,
5m: 0.58,
15m: 0.47
}
},
mem: {
total_in_bytes: 66016198656,
free_in_bytes: 2365132800,
used_in_bytes: 63651065856,
free_percent: 4,
used_percent: 96
},
swap: {
total_in_bytes: 0,
free_in_bytes: 0,
used_in_bytes: 0
},
cgroup: {
cpuacct: {
control_group: "/system.slice/elasticsearch.service",
usage_nanos: 15509087613496
},
cpu: {
control_group: "/system.slice/elasticsearch.service",
cfs_period_micros: 100000,
cfs_quota_micros: -1,
stat: {
number_of_elapsed_periods: 0,
number_of_times_throttled: 0,
time_throttled_nanos: 0
}
},
memory: {
control_group: "/system.slice/elasticsearch.service",
limit_in_bytes: "9223372036854771712",
usage_in_bytes: "62100361216"
}
}
},
process: {
timestamp: 1564054673125,
open_file_descriptors: 10731,
max_file_descriptors: 65535,
cpu: {
percent: 2,
total_in_millis: 15508520
},
mem: {
total_virtual_in_bytes: 883942322176
}
},
jvm: {
timestamp: 1564054673130,
uptime_in_millis: 8608691,
mem: {
heap_used_in_bytes: 9885317584,
heap_used_percent: 93,
heap_committed_in_bytes: 10624040960,
heap_max_in_bytes: 10624040960,
non_heap_used_in_bytes: 156533784,
non_heap_committed_in_bytes: 169385984,
pools: {
young: {
used_in_bytes: 701222496,
max_in_bytes: 907345920,
peak_used_in_bytes: 907345920,
peak_max_in_bytes: 907345920
},
survivor: {
used_in_bytes: 71339224,
max_in_bytes: 113377280,
peak_used_in_bytes: 113377280,
peak_max_in_bytes: 113377280
},
old: {
used_in_bytes: 9112755864,
max_in_bytes: 9603317760,
peak_used_in_bytes: 9603317760,
peak_max_in_bytes: 9603317760
}
}
},
threads: {
count: 209,
peak_count: 220
},
gc: {
collectors: {
young: {
collection_count: 1942,
collection_time_in_millis: 24093
},
old: {
collection_count: 2079,
collection_time_in_millis: 2789580
}
}
},
buffer_pools: {
mapped: {
count: 5190,
used_in_bytes: 862020801480,
total_capacity_in_bytes: 862020801480
},
direct: {
count: 213,
used_in_bytes: 540871074,
total_capacity_in_bytes: 540871073
}
},
classes: {
current_loaded_count: 16926,
total_loaded_count: 17232,
total_unloaded_count: 306
}
},
thread_pool: {
analyze: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
ccr: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
fetch_shard_started: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 32,
completed: 194
},
fetch_shard_store: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 32,
completed: 369
},
flush: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 5,
completed: 682
},
force_merge: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
generic: {
threads: 91,
queue: 0,
active: 0,
rejected: 0,
largest: 107,
completed: 69565
},
get: {
threads: 13,
queue: 0,
active: 0,
rejected: 0,
largest: 13,
completed: 13
},
listener: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 1,
completed: 1
},
management: {
threads: 5,
queue: 0,
active: 1,
rejected: 0,
largest: 5,
completed: 103006
},
refresh: {
threads: 8,
queue: 0,
active: 0,
rejected: 0,
largest: 8,
completed: 57178
},
rollup_indexing: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
search: {
threads: 25,
queue: 0,
active: 0,
rejected: 0,
largest: 25,
completed: 1596
},
search_throttled: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
security-token-key: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
snapshot: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
warmer: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 5,
completed: 65510
},
watcher: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
write: {
threads: 16,
queue: 0,
active: 0,
rejected: 0,
largest: 16,
completed: 3423
}
},
fs: {
timestamp: 1564054673130,
total: {
total_in_bytes: 4908285886464,
free_in_bytes: 2781784522752,
available_in_bytes: 2534397763584
},
data: [{
path: "/usr/share/elasticsearch/data/nodes/0",
mount: "/usr/share/elasticsearch/data (/dev/md0)",
type: "ext4",
total_in_bytes: 4908285886464,
free_in_bytes: 2781784522752,
available_in_bytes: 2534397763584
}],
io_stats: {
devices: [{
device_name: "md0",
operations: 802973,
read_operations: 249694,
write_operations: 553279,
read_kilobytes: 49917524,
write_kilobytes: 78680540
}],
total: {
operations: 802973,
read_operations: 249694,
write_operations: 553279,
read_kilobytes: 49917524,
write_kilobytes: 78680540
}
}
},
transport: {
server_open: 77,
rx_count: 103795,
rx_size_in_bytes: 59328301118,
tx_count: 103822,
tx_size_in_bytes: 11407673247
}

,
http: {
current_open: 0,
total_opened: 0
},
breakers: {
request: {
limit_size_in_bytes: 6374424576,
limit_size: "5.9gb",
estimated_size_in_bytes: 0,
estimated_size: "0b",
overhead: 1,
tripped: 0
},
fielddata: {
limit_size_in_bytes: 4249616384,
limit_size: "3.9gb",
estimated_size_in_bytes: 0,
estimated_size: "0b",
overhead: 1.03,
tripped: 0
},
in_flight_requests: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 1358,
estimated_size: "1.3kb",
overhead: 2,
tripped: 0
},
accounting: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 7462701306,
estimated_size: "6.9gb",
overhead: 1,
tripped: 0
},
parent: {
limit_size_in_bytes: 10092838912,
limit_size: "9.3gb",
estimated_size_in_bytes: 9885457888,
estimated_size: "9.2gb",
overhead: 1,
tripped: 7194
}
},
script: {
compilations: 0,
cache_evictions: 0,
compilation_limit_triggered: 0
},
discovery: {
cluster_state_queue: {
total: 0,
pending: 0,
committed: 0
},
published_cluster_states: {
full_states: 1,
incompatible_diffs: 0,
compatible_diffs: 329
}
},
ingest: {
total: {
count: 0,
time_in_millis: 0,
current: 0,
failed: 0
},
pipelines: {}
},
adaptive_selection: {}
}

node stats: https://del.dog/omihuvavuc
Maybe it is related to this issue.

Hi,

A few queries regarding replicas.
I have 1 data node and 2 replicas.

  1. While indexing, replication starts in the background; how does ES know the bulk size to transfer from the primary shards to the replica shards?
  2. Our ES runs on OpenJDK; does that have any impact on GC?
  3. What is the difference between setting replicas to 0 before indexing and then raising it to 2, versus starting with 2 replicas and indexing?

@Rakesh4 It looks like you're running into the limits of what can be stored on these nodes. From the node stats I see

heap_used_percent: 93

store: {
   size_in_bytes: 2116493376066
},

accounting: {
limit_size_in_bytes: 10624040960,
limit_size: "9.8gb",
estimated_size_in_bytes: 7462701306,
estimated_size: "6.9gb",
overhead: 1,
tripped: 0
},

which indicates that you're storing 2TB on this node, which has a 10GB heap. 6.9GB (see the accounting breaker) is used just to keep the in-memory shard structures.

You should increase the heap size or move towards using the frozen indices feature.
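For reference, freezing an index in 7.x is a single API call; a frozen index drops its per-shard heap structures and rebuilds them transiently when searched. A sketch (the index name is just an example taken from the logs above, and `localhost:9200` is an assumed endpoint):

```shell
# Freeze an (example) index so its shard structures are dropped from heap;
# searches against it will transiently rebuild what they need (7.x only).
curl -X POST "localhost:9200/index-2019.07.28/_freeze"

# Unfreeze later if the index needs regular indexing/search again:
curl -X POST "localhost:9200/index-2019.07.28/_unfreeze"
```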

@LoadingZhang your problem seems to be a different one. Can you open a fresh issue? In particular it looks like the heap memory consumed by your node is nowhere near the limit.

of course.

Hi,
Thanks for the reply.
A few doubts here.
I have a clean new environment with the following config:
3 master nodes (15 GB heap, 2 TB HDD, 16 GB machine)
3 data nodes (10 GB heap, 2 TB HDD, 64 GB machine)
2 coordinating nodes (15 GB heap, 2 TB HDD, 32 GB machine)

Currently there are 5 indices.
When I am indexing, it is not running out of the limit; it consumes only 5 GB out of the 10 GB.
But the previous case (for which I shared the node stats) was running out of the limit.

Questions:

  1. How does replica creation decide the chunk size (held in memory) while indexing?
  2. When I change replicas to 0 and then back to 2, it creates the replicas properly without running out of memory.

Let me explain my process.

With the above configuration, my indexing process runs with a bulk size of 15 MB and replicas set to 2.
Then I was getting the circuit breaker exception.
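For what it's worth, the replica toggle described in this thread is a dynamic index settings update. A sketch (`my-index` and the endpoint are illustrative):

```shell
# Drop replicas before a bulk load...
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'

# ...and restore them after indexing. Replicas added this way are built by
# copying segment files from the primary (peer recovery), rather than by
# re-indexing every document a second time during the bulk load.
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 2}}'
```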

  1. Can you please explain what these in-memory shard structures are?
     Do they have any relation to the mappings that I have put in each index? For now all my indices have the same mapping.
  2. I can also see that if I use the same mapping and keep creating empty indices, the heap occupied keeps increasing and is never freed. Note that I am not putting any documents into the indices while doing #2.

Can you please explain what these in-memory shard structures are?

While Lucene accesses a lot of data in an off-heap fashion, certain helper structures are loaded on-heap to provide fast access. Memory usage of these structures is tracked by the accounting breaker.
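These figures can be inspected directly. A sketch of the relevant APIs (endpoint assumed to be `localhost:9200`):

```shell
# Per-node breaker usage, including the accounting breaker that tracks
# these on-heap segment structures:
curl "localhost:9200/_nodes/stats/breaker?pretty"

# Per-segment on-heap memory, broken down by index and shard:
curl "localhost:9200/_cat/segments?v"
```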

Do they have any relation to the mappings that I have put in each index? For now all my indices have the same mapping.

No, the mappings determine how documents are indexed into Lucene, but they do not create overhead on the Lucene side. Mappings are managed by Elasticsearch. Each mapping does come with a certain on-heap overhead, however, as the Elasticsearch node that holds a shard of the given index creates in-memory structures to efficiently parse and validate documents under the specified mapping.

I can also see that if I use the same mapping and keep creating empty indices, the heap occupied keeps increasing and is never freed. Note that I am not putting any documents into the indices while doing #2.

These in-memory mapping structures are created on a per-index basis, i.e. they are not shared between indices even if the indices have identical mappings.

Thanks for the detailed explanation, Yannick. My comment about memory increasing was based on a very small number of index creations. Kindly ignore my mistake.
In a more extensive test I created 500 empty indices with the same mapping used earlier in this thread. My cluster state was happily GREEN, and ES handled it very well, GCing the growing memory away.