This looks similar to "OutOfMemoryError occurred in coordinating node" (elastic/elasticsearch Issue #49699 on GitHub) in that a significant amount of my heap is being consumed by byte arrays, apparently allocated by the transport_worker threads. It only seems to occur while I'm actively using the cluster; I've noticed it when I view monitoring data for extended periods and/or use the dev console to access the ES API.
I believe this started when I upgraded to 7.8. I use SearchGuard for HTTPS and authentication, which I know is a factor, but it's not something I can disable on just a single node; I'd have to turn it off for the whole cluster and spend a significant amount of time reconfiguring all of my beats and endpoints. If the culprit can be narrowed down without taking those steps, it would be very much appreciated.
From a 10 GB heap dump taken on OOM, the top class in the histogram:

Class    Objects              Shallow Size               Retained Size
byte[]   2,577,764 (32.1%)    8,564,854,966 B (96.9%)    n/a

That's roughly 8 GB of byte arrays, essentially the entire heap.
"elasticsearch[ES-WEB-01][transport_worker][T#1]" daemon prio=5 tid=29 RUNNABLE
at java.lang.OutOfMemoryError.&lt;init&gt;(OutOfMemoryError.java:49)
at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:281)
at io.netty.buffer.PoolArena$HeapArena.newByteArray(PoolArena.java:662)
at io.netty.buffer.PoolArena$HeapArena.newChunk(PoolArena.java:672)
local variable: io.netty.buffer.PoolChunk#225
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
local variable: io.netty.buffer.PoolThreadCache#2
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
local variable: io.netty.buffer.PoolArena$HeapArena#14
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
local variable: io.netty.buffer.PooledHeapByteBuf#107207
at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:339)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159)
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:137)
at io.netty.handler.ssl.SslHandler$SslEngineType$3.allocateWrapBuffer(SslHandler.java:312)
at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:2199)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:840)
local variable: io.netty.channel.DefaultChannelPromise#108542
local variable: io.netty.buffer.UnpooledSlicedByteBuf#11
at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
local variable: io.netty.handler.ssl.SslHandler#203
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#773
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#780
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
local variable: io.netty.channel.AbstractChannelHandlerContext$WriteTask#1
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
local variable: io.netty.channel.nio.NioEventLoop#1
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
local variable: io.netty.util.concurrent.SingleThreadEventExecutor$4#2
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
local variable: io.netty.util.internal.ThreadExecutorMap$2#2
at java.lang.Thread.run(Thread.java:832)
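For context on what the trace is allocating: Netty's pooled heap arenas carve buffers out of large chunks, and each heap PoolChunk is backed by a single big byte[] (16 MiB by default, if I've read the defaults right), with released buffers going back to the pool rather than to the GC. Below is a minimal sketch of that behavior, assuming Netty 4.1.x on the classpath; everything here is Netty's public API, nothing Elasticsearch-specific.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public class PoolChunkDemo {
    public static void main(String[] args) {
        // preferDirect = false: heap-only pooling, which is what Elasticsearch's
        // NettyAllocator$NoDirectBuffers (visible in the trace above) forces.
        PooledByteBufAllocator alloc = new PooledByteBufAllocator(false);

        // Even a small allocation reserves a whole chunk in the arena; each
        // heap PoolChunk is one large byte[].
        ByteBuf buf = alloc.heapBuffer(18 * 1024); // ~one TLS record of wrapped data

        PooledByteBufAllocatorMetric m = alloc.metric();
        System.out.printf("heapArenas=%d chunkSize=%d usedHeapMemory=%d%n",
                m.numHeapArenas(), m.chunkSize(), m.usedHeapMemory());

        // Releasing returns the buffer to the pool; the backing chunk generally
        // stays cached in the arena rather than becoming garbage.
        buf.release();
    }
}

If that's right, it would be consistent with the dump: a relatively small number of very large byte[] chunks retained by the arenas even after the buffers themselves have been released.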
I've tried increasing the heap all the way up to 24 GB, but the problem still eventually shows up.
Please let me know the next steps for troubleshooting; I've reached the end of what I can work out on my own.
Thanks,
Sam N.
elasticsearch.yml (SearchGuard items omitted)
bootstrap.memory_lock: true
cluster.initial_master_nodes:
- Elasticsearch-01
- Elasticsearch-02
- Elasticsearch-03
cluster.name: ELK
discovery.seed_hosts:
- Elasticsearch-01
- Elasticsearch-02
- Elasticsearch-03
http.port: 9200
network.host: Kibana.my.domain
node.data: false
node.ingest: false
node.master: false
#node.voting_only: true
node.transform: false
node.remote_cluster_client: false
node.max_local_storage_nodes: 1
node.name: Kibana
path.data: C:\ELK\Elasticsearch\Data
path.logs: C:\ELK\Elasticsearch\Logs
transport.tcp.port: 9300
xpack.license.self_generated.type: basic
xpack.security.enabled: false
gateway.recover_after_master_nodes: 2
# No default
#indices.fielddata.cache.size: 1%
# Defaults to 10%; 1% because no indexing occurs here
#indices.memory.index_buffer_size: 1%
#indices.queries.cache.size: 1%
indices.query.bool.max_clause_count: 8192
indices.recovery.max_bytes_per_sec: 500mb
#indices.requests.cache.size: 1%
#indices.breaker.total.limit: 95%
# Default 40%
indices.breaker.fielddata.limit: 20%
# Default 60%
#indices.breaker.request.limit: 65%
# Default 100%
#network.breaker.inflight_requests.limit: 75%
# Default 100%
#indices.breaker.accounting.limit: 75%
node.ml: false
#search.max_buckets: 100000
#thread_pool.write.queue_size: 1000
transport.port: 9300
xpack.graph.enabled: false
xpack.logstash.enabled: false
xpack.ml.enabled: false
xpack.watcher.enabled: false
cluster.fault_detection.follower_check.timeout: 15s
cluster.fault_detection.leader_check.timeout: 15s
cluster.fault_detection.leader_check.interval: 2s
cluster.publish.info_timeout: 15s
signals.enabled: false
Elasticsearch jvm.options
###
# Memory
###
#-Xmx6143m
#-Xms6143m

###
# My Changes
###
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote=true
-Djava.rmi.server.hostname=Kibana.my.domain
-Dcom.sun.management.jmxremote.port=8999
-Dcom.sun.management.jmxremote.rmi.port=8999
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-XX:ErrorFile=C:\ELK\Elasticsearch\Logs\Fatal_Error.log
#-Des.transport.cname_in_publish_address=true

###
# Garbage Collection
###
#-XX:+UseConcMarkSweepGC
#-XX:-UseConcMarkSweepGC
#-XX:-UseCMSInitiatingOccupancyOnly
# -XX:+UseG1GC
# -XX:InitiatingHeapOccupancyPercent=75
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

###
# OEM Settings
###
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
#-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
#${heap.dump.path}
# specify an alternative path for JVM fatal error logs
#${error.file}

## GC logging
-Xmx24g
-Xms24g
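Since jmxremote is enabled above, a rough client like the sketch below can sample heap usage remotely to see how quickly it climbs between OOMs. The hostname and port are taken from the flags above; this assumes the port is reachable and, as configured, plaintext and unauthenticated.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapWatch {
    public static void main(String[] args) throws Exception {
        // Host and port match the jmxremote settings in jvm.options above.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://Kibana.my.domain:8999/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            while (true) {
                MemoryUsage heap = mem.getHeapMemoryUsage();
                System.out.printf("heap used=%,d max=%,d%n",
                        heap.getUsed(), heap.getMax());
                Thread.sleep(10_000); // sample every 10 seconds
            }
        }
    }
}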
Kibana.yml
logging.dest: C:\ELK\Kibana\Logs\Kibana.log
logging.timezone: "America/Chicago"
logging.rotate.enabled: true
#logging.level: debug
#logging.verbose: true
server.host: "0.0.0.0"
server.name: "Kibana.my.domain"
monitoring.enabled: true
monitoring.ui.elasticsearch.hosts:
- https://Kibana.my.domain:9200
monitoring.kibana.collection.enabled: false
monitoring.elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Certificates\\es-index-chain.pem"
monitoring.ui.elasticsearch.ssl.verificationMode: none
monitoring.ui.elasticsearch.username: monitoring
monitoring.ui.elasticsearch.password: '[REMOVED]'
# 30 minute reporting timeout (in ms)
xpack.reporting.queue.timeout: 1800000
# ~305 MB
xpack.reporting.csv.maxSizeBytes: 304857600
xpack.encryptedSavedObjects.encryptionKey: '[REMOVED]'
elasticsearch.hosts:
- "https://Kibana.my.domain:9200"
elasticsearch.username: "kibanaserver"
elasticsearch.password: "[REMOVED]"
elasticsearch.ssl.verificationMode: none
elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Kibana\\Config\\CA.pem"
kibana.autocompleteTerminateAfter: 50000
elasticsearch.requestTimeout: 90000
elasticsearch.shardTimeout: 90000
# Default to session cookie
searchguard.cookie.ttl: 0
# Increase session TTL to 16 hours (in ms)
# 8 hours 28800000
# 16 hours 57600000
searchguard.session.ttl: 57600000
# choose a non-default encryption password for cookies
searchguard.cookie.password: '[REMOVED]'
# For updating settings
If you don't see an option above, assume it's the default.