OutOfMemoryError occurred on ES host accessed via Kibana

Kinda similar to https://github.com/elastic/elasticsearch/issues/49699 in that a significant amount of my heap is being consumed by byte arrays, seemingly held by the transport_worker threads. It only appears to happen while I'm actively using Kibana, and I've noticed it when I view monitoring data for extended periods and/or use the dev console to access the ES API.

I believe this started occurring when I upgraded to 7.8. I use Search Guard for HTTPS and authentication, which I know could be a factor, but it's not something I can disable on just a single node; I'd have to turn it off on the whole cluster and do a significant amount of work reconfiguring all of my Beats and endpoints. If the culprit can be narrowed down without taking those steps, it would be very much appreciated.

From a 10 GB heap dump taken on OOM, the dominant consumer is byte arrays:

byte[]   2,577,764 instances (32.1%)   8,564,854,966 B (96.9% of heap)   n/a
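For reference, that histogram row reads as: class, instance count, percentage of all instances, shallow size in bytes, and percentage of total heap. A quick sketch to pull the numbers apart (the field layout is assumed from the dump viewer's output format):

```python
# Parse a heap-histogram row of the form:
#   byte 2,577,764 (32.1%) 8,564,854,966 B (96.9%) n/a
# Assumed field layout: class, instance count, % of instances,
# shallow size in bytes, % of total heap.
import re

def parse_histogram_row(row: str) -> dict:
    m = re.match(
        r"(\S+)\s+([\d,]+)\s+\(([\d.]+)%\)\s+([\d,]+)\s*B\s+\(([\d.]+)%\)",
        row,
    )
    if m is None:
        raise ValueError(f"unrecognized row: {row!r}")
    cls, count, count_pct, size, size_pct = m.groups()
    return {
        "class": cls,
        "instances": int(count.replace(",", "")),
        "instance_pct": float(count_pct),
        "bytes": int(size.replace(",", "")),
        "heap_pct": float(size_pct),
    }

row = parse_histogram_row("byte 2,577,764 (32.1%) 8,564,854,966 B (96.9%) n/a")
print(row["bytes"] / 2**30)  # byte[] footprint in GiB, out of a 10 GiB heap
```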

"elasticsearch[ES-WEB-01][transport_worker][T#1]" daemon prio=5 tid=29 RUNNABLE
at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:49)
at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:281)
at io.netty.buffer.PoolArena$HeapArena.newByteArray(PoolArena.java:662)
at io.netty.buffer.PoolArena$HeapArena.newChunk(PoolArena.java:672)
local variable: io.netty.buffer.PoolChunk#225
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
local variable: io.netty.buffer.PoolThreadCache#2
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
local variable: io.netty.buffer.PoolArena$HeapArena#14
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
local variable: io.netty.buffer.PooledHeapByteBuf#107207
at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:339)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159)
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:137)
at io.netty.handler.ssl.SslHandler$SslEngineType$3.allocateWrapBuffer(SslHandler.java:312)
at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:2199)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:840)
local variable: io.netty.channel.DefaultChannelPromise#108542
local variable: io.netty.buffer.UnpooledSlicedByteBuf#11
at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
local variable: io.netty.handler.ssl.SslHandler#203
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#773
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#780
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
local variable: io.netty.channel.AbstractChannelHandlerContext$WriteTask#1
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
local variable: io.netty.channel.nio.NioEventLoop#1
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
local variable: io.netty.util.concurrent.SingleThreadEventExecutor$4#2
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
local variable: io.netty.util.internal.ThreadExecutorMap$2#2
at java.lang.Thread.run(Thread.java:832)
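The trace itself points at TLS: SslHandler.wrap is allocating its outbound network buffer through NettyAllocator$NoDirectBuffers, which routes all Netty pooled buffers onto the Java heap, so every encrypted flush draws from pooled heap chunks. As rough arithmetic only (assuming Netty 4.1's stock pooled-allocator defaults of 8 KiB pages and maxOrder 11; the effective values on this node may differ), the chunk sizing works out like this:

```python
# Rough pooled-allocator arithmetic (ASSUMED Netty 4.1 defaults:
# 8 KiB pages, maxOrder 11 -> 16 MiB chunks; actual values may differ).
PAGE_SIZE = 8 * 1024                 # io.netty.allocator.pageSize
MAX_ORDER = 11                       # io.netty.allocator.maxOrder
chunk_size = PAGE_SIZE << MAX_ORDER  # bytes per pooled chunk

heap_bytes = 10 * 2**30              # the 10 GiB heap from the dump
byte_array_bytes = 8_564_854_966     # byte[] footprint in the dump

print(chunk_size // 2**20)            # -> 16 (MiB per chunk)
print(heap_bytes // chunk_size)       # -> 640 (chunks that fit in the heap)
print(byte_array_bytes // chunk_size) # -> 510 (full chunks' worth of byte[])
```

Under those assumed defaults, the byte[] footprint in the dump is roughly the size of ~510 full 16 MiB pool chunks, i.e. most of the heap's chunk budget.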

I've tried increasing the heap all the way up to 24 GB, but the problem still eventually shows up.
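In the meantime I can watch heap growth in real time rather than waiting for the next dump. A minimal stdlib-only sketch (the URL, user, and password are placeholders for this Search Guard setup; the unverified TLS context mirrors the `verificationMode: none` used in kibana.yml):

```python
# Poll the local node's JVM heap usage via the nodes-stats API.
# URL and credentials below are PLACEHOLDERS for this setup.
import base64
import json
import ssl
import time
import urllib.request

def heap_used_percent(stats: dict) -> int:
    # nodes-stats payload shape: {"nodes": {"<node-id>": {"jvm": {"mem": {...}}}}}
    node = next(iter(stats["nodes"].values()))
    return node["jvm"]["mem"]["heap_used_percent"]

def poll(url="https://Kibana.my.domain:9200", user="monitoring", password="..."):
    req = urllib.request.Request(url + "/_nodes/_local/stats/jvm")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # mirrors verificationMode: none
    while True:
        with urllib.request.urlopen(req, context=ctx) as resp:
            stats = json.load(resp)
        print(time.strftime("%H:%M:%S"), heap_used_percent(stats), "%")
        time.sleep(30)
```

Logging `heap_used_percent` every 30 seconds while using the monitoring UI vs. leaving it idle should show whether the growth really tracks my Kibana activity.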

Please let me know the next steps for troubleshooting this; I've reached the end of my capabilities for troubleshooting this on my own.

Thanks,
Sam N.

elasticsearch.yml - Searchguard items omitted

bootstrap.memory_lock: true
cluster.initial_master_nodes:
  - Elasticsearch-01
  - Elasticsearch-02
  - Elasticsearch-03
cluster.name: ELK
discovery.seed_hosts:
  - Elasticsearch-01
  - Elasticsearch-02
  - Elasticsearch-03
http.port: 9200
network.host: Kibana.my.domain
node.data: false
node.ingest: false
node.master: false
#node.voting_only: true
node.transform: false
node.remote_cluster_client: false
node.max_local_storage_nodes: 1
node.name: Kibana
path.data: C:\ELK\Elasticsearch\Data
path.logs: C:\ELK\Elasticsearch\Logs
transport.tcp.port: 9300
xpack.license.self_generated.type: basic
xpack.security.enabled: false
gateway.recover_after_master_nodes: 2
# No default
#indices.fielddata.cache.size: 1%
# Default to 10%, 1% because no indexing occurs here
#indices.memory.index_buffer_size: 1%
#indices.queries.cache.size: 1%
indices.query.bool.max_clause_count: 8192
indices.recovery.max_bytes_per_sec: 500mb
#indices.requests.cache.size: 1%
#indices.breaker.total.limit: 95%
# Default 40%
indices.breaker.fielddata.limit: 20%
# Default 60%
#indices.breaker.request.limit: 65%
# Default 100%
#network.breaker.inflight_requests.limit: 75%
# Default 100%
#indices.breaker.accounting.limit: 75%
node.ml: false
#search.max_buckets: 100000
#thread_pool.write.queue_size: 1000
transport.port: 9300
xpack.graph.enabled: false
xpack.logstash.enabled: false
xpack.ml.enabled: false
xpack.watcher.enabled: false
cluster.fault_detection.follower_check.timeout: 15s
cluster.fault_detection.leader_check.timeout: 15s
cluster.fault_detection.leader_check.interval: 2s
cluster.publish.info_timeout: 15s
signals.enabled: false

Elasticsearch jvm.options

###
# Memory
###
#-Xmx6143m
#-Xms6143m
###
# My Changes
###
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote=true
-Djava.rmi.server.hostname=Kibana.my.domain
-Dcom.sun.management.jmxremote.port=8999
-Dcom.sun.management.jmxremote.rmi.port=8999
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-XX:ErrorFile=C:\ELK\Elasticsearch\Logs\Fatal_Error.log
#-Des.transport.cname_in_publish_address=true
###
# Garbage Collection
###
#-XX:+UseConcMarkSweepGC
#-XX:-UseConcMarkSweepGC
#-XX:-UseCMSInitiatingOccupancyOnly
# -XX:+UseG1GC
# -XX:InitiatingHeapOccupancyPercent=75
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
###
# OEM Settings
###
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
#-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
#${heap.dump.path}
# specify an alternative path for JVM fatal error logs
#${error.file}
## GC logging
-Xmx24g
-Xms24g

Kibana.yml

logging.dest: C:\ELK\Kibana\Logs\Kibana.log
logging.timezone: "America/Chicago"
logging.rotate.enabled: true
#logging.level: debug
#logging.verbose: true
server.host: "0.0.0.0"
server.name: "Kibana.my.domain"
monitoring.enabled: true
monitoring.ui.elasticsearch.hosts:
  - https://Kibana.my.domain:9200
monitoring.kibana.collection.enabled: false
monitoring.elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Certificates\\es-index-chain.pem"
monitoring.ui.elasticsearch.ssl.verificationMode: none
monitoring.ui.elasticsearch.username: monitoring
monitoring.ui.elasticsearch.password: '[REMOVED]'
# 30 minute reporting timeout (in ms)
xpack.reporting.queue.timeout: 1800000
# ~290MB
xpack.reporting.csv.maxSizeBytes: 304857600
xpack.encryptedSavedObjects.encryptionKey: '[REMOVED]'

elasticsearch.hosts: 
  - "https://Kibana.my.domain:9200"
elasticsearch.username: "kibanaserver"
elasticsearch.password: "[REMOVED]"
elasticsearch.ssl.verificationMode: none
elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Kibana\\Config\\CA.pem"
kibana.autocompleteTerminateAfter: 50000
elasticsearch.requestTimeout: 90000
elasticsearch.shardTimeout: 90000

# Default to session cookie
searchguard.cookie.ttl: 0
# Increase session TTL to 16 hours (in ms)
# 8 hours = 28800000
# 16 hours = 57600000
searchguard.session.ttl: 57600000
# choose a non-default encryption password for cookies
searchguard.cookie.password: '[REMOVED]'
# For updating settings

If you don't see an option above, assume it's at its default.
