I have an ES stack running on AWS spot instances, so shard reallocation happens quite frequently.
Occasionally I receive the following error message, after which the shard remains unassigned and no further allocation attempts are made:
nested: RemoteTransportException[[ip-172-30-2-197.ec2.internal][][internal:index/shard/recovery/filesInfo]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [32428797638/30.2gb], which is larger than the limit of [31621696716/29.4gb], real usage: [32428793232/30.2gb], new bytes reserved: [4406/4.3kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=4406/4.3kb, accounting=708708/692kb]]; ","allocation_status":"no_attempt"}}
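For reference, the numbers in the message can be checked with a short script. This is only a sketch: the regular expressions are written for this one message, and the heap estimate assumes the parent breaker limit is 95% of the heap (the default when real-memory accounting is enabled), which I have not verified against my own configuration.

```python
import re

# Excerpt of the CircuitBreakingException text from the error above.
MESSAGE = (
    "CircuitBreakingException[[parent] Data too large, data for "
    "[<transport_request>] would be [32428797638/30.2gb], which is larger "
    "than the limit of [31621696716/29.4gb], real usage: "
    "[32428793232/30.2gb], new bytes reserved: [4406/4.3kb]"
)


def parse_breaker_message(message):
    """Extract the raw byte counts from a parent circuit breaker message."""
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_bytes": r"new bytes reserved: \[(\d+)/",
    }
    return {
        name: int(re.search(pattern, message).group(1))
        for name, pattern in patterns.items()
    }


values = parse_breaker_message(MESSAGE)

# How far the heap would exceed the limit if the request were accepted.
overshoot = values["would_be"] - values["limit"]

# Assumption: limit = 95% of heap, so heap is roughly limit / 0.95.
implied_heap_gib = values["limit"] / 0.95 / 2**30

print(f"request size:  {values['new_bytes']} bytes")
print(f"overshoot:     {overshoot / 2**20:.1f} MiB")
print(f"implied heap:  {implied_heap_gib:.1f} GiB (assuming a 95% limit)")
```

The request itself is only about 4.3 KB; the real heap usage reported by the node is already above the limit before the request arrives.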
- How can I avoid these errors?
- Do these errors count toward the number of allocation retries? I have set max_retries to 20.
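For context, once the retry limit is exhausted, a failed allocation can be retried manually through the cluster reroute API with `retry_failed=true`, and the allocation explain API reports why a shard is unassigned. Below is a minimal sketch that only builds the two HTTP requests. The base URL `http://localhost:9200`, the index name `my-large-index`, and both helper function names are hypothetical placeholders, not values from my cluster.

```python
import json
import urllib.request

# Hypothetical address of a node in the cluster; adjust as needed.
BASE_URL = "http://localhost:9200"


def build_retry_failed_request(base_url=BASE_URL):
    """Build a POST to the cluster reroute API that retries failed allocations."""
    return urllib.request.Request(
        f"{base_url}/_cluster/reroute?retry_failed=true",
        data=b"",
        method="POST",
    )


def build_explain_request(index, shard, primary, base_url=BASE_URL):
    """Build a request asking why a specific shard copy is unassigned."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


retry_request = build_retry_failed_request()
# "my-large-index" is a placeholder index name.
explain_request = build_explain_request("my-large-index", shard=0, primary=False)

# To actually send a request against a live cluster:
# with urllib.request.urlopen(retry_request) as response:
#     print(response.read().decode("utf-8"))
```

Retrying only helps if the target node has freed enough heap in the meantime; otherwise the recovery request trips the parent breaker again.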
Cluster info:
Version 7.7.1
9 data nodes
1 large index of 100 million docs (196 GB) with 21 shards (7 primaries, each with 2 replicas)
The other indices are very small
JVM settings:
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
## Expert settings
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
## GC configuration
## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
# 10-:-XX:-UseConcMarkSweepGC
# 10-:-XX:-UseCMSInitiatingOccupancyOnly
# 10-:-XX:+UseG1GC
# 10-:-XX:InitiatingHeapOccupancyPercent=75
## DNS cache policy
# cache ttl in seconds for positive DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.ttl; set to -1 to cache forever
# cache ttl in seconds for negative DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.negative.ttl; set to -1 to cache
# forever
## optimizations
# pre-touch memory pages used by the JVM during initialization
## basic
# explicitly set the stack size
# set to headless, just in case
# ensure UTF-8 encoding by default (e.g. filenames)
# use our provided JNA always versus the system one
# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
# flags to configure Netty
# log4j 2
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
# specify an alternative path for JVM fatal error logs
## JDK 8 GC logging
# JDK 9+ GC logging
# due to internationalization enhancements in JDK 9 Elasticsearch needs to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512