OutOfMemoryError occurred on ES host accessed via Kibana

Kinda similar to https://github.com/elastic/elasticsearch/issues/49699 in that a significant amount of my heap is being consumed by byte arrays, seemingly held by the transport_worker threads. It only appears to happen while I'm actively using Kibana, and I've noticed it when I view monitoring data for extended periods and/or use the dev console to access the ES API.

I believe this started occurring when I upgraded to 7.8. I use Search Guard for HTTPS and authentication, which I know could be a factor, but it's not something I can disable on just a single node; I'd have to turn it off on the whole cluster and do a significant amount of work reconfiguring all of my Beats and endpoints. If the culprit can be narrowed down without taking those steps, it would be very much appreciated.

From a 10 GB heap dump taken on OOM, the dominant consumer is byte arrays:

byte[]   2,577,764 instances (32.1%)   8,564,854,966 B (96.9% of heap)   n/a
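For reference, that histogram row reads as: class, instance count, percentage of all instances, shallow size in bytes, and percentage of total heap. A quick sketch to pull the numbers apart (the field layout is assumed from the dump viewer's output format):

```python
# Parse a heap-histogram row of the form:
#   byte 2,577,764 (32.1%) 8,564,854,966 B (96.9%) n/a
# Assumed field layout: class, instance count, % of instances,
# shallow size in bytes, % of total heap.
import re

def parse_histogram_row(row: str) -> dict:
    m = re.match(
        r"(\S+)\s+([\d,]+)\s+\(([\d.]+)%\)\s+([\d,]+)\s*B\s+\(([\d.]+)%\)",
        row,
    )
    if m is None:
        raise ValueError(f"unrecognized row: {row!r}")
    cls, count, count_pct, size, size_pct = m.groups()
    return {
        "class": cls,
        "instances": int(count.replace(",", "")),
        "instance_pct": float(count_pct),
        "bytes": int(size.replace(",", "")),
        "heap_pct": float(size_pct),
    }

row = parse_histogram_row("byte 2,577,764 (32.1%) 8,564,854,966 B (96.9%) n/a")
print(row["bytes"] / 2**30)  # byte[] footprint in GiB, out of a 10 GiB heap
```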

"elasticsearch[ES-WEB-01][transport_worker][T#1]" daemon prio=5 tid=29 RUNNABLE
at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:49)
at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:281)
at io.netty.buffer.PoolArena$HeapArena.newByteArray(PoolArena.java:662)
at io.netty.buffer.PoolArena$HeapArena.newChunk(PoolArena.java:672)
local variable: io.netty.buffer.PoolChunk#225
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
local variable: io.netty.buffer.PoolThreadCache#2
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
local variable: io.netty.buffer.PoolArena$HeapArena#14
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
local variable: io.netty.buffer.PooledHeapByteBuf#107207
at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:339)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168)
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159)
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:137)
at io.netty.handler.ssl.SslHandler$SslEngineType$3.allocateWrapBuffer(SslHandler.java:312)
at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:2199)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:840)
local variable: io.netty.channel.DefaultChannelPromise#108542
local variable: io.netty.buffer.UnpooledSlicedByteBuf#11
at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
local variable: io.netty.handler.ssl.SslHandler#203
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#773
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
local variable: io.netty.channel.DefaultChannelHandlerContext#780
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
local variable: io.netty.channel.AbstractChannelHandlerContext$WriteTask#1
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
local variable: io.netty.channel.nio.NioEventLoop#1
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
local variable: io.netty.util.concurrent.SingleThreadEventExecutor$4#2
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
local variable: io.netty.util.internal.ThreadExecutorMap$2#2
at java.lang.Thread.run(Thread.java:832)
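The trace itself points at TLS: SslHandler.wrap is allocating its outbound network buffer through NettyAllocator$NoDirectBuffers, which routes all Netty pooled buffers onto the Java heap, so every encrypted flush draws from pooled heap chunks. As rough arithmetic only (assuming Netty 4.1's stock pooled-allocator defaults of 8 KiB pages and maxOrder 11; the effective values on this node may differ), the chunk sizing works out like this:

```python
# Rough pooled-allocator arithmetic (ASSUMED Netty 4.1 defaults:
# 8 KiB pages, maxOrder 11 -> 16 MiB chunks; actual values may differ).
PAGE_SIZE = 8 * 1024                 # io.netty.allocator.pageSize
MAX_ORDER = 11                       # io.netty.allocator.maxOrder
chunk_size = PAGE_SIZE << MAX_ORDER  # bytes per pooled chunk

heap_bytes = 10 * 2**30              # the 10 GiB heap from the dump
byte_array_bytes = 8_564_854_966     # byte[] footprint in the dump

print(chunk_size // 2**20)            # -> 16 (MiB per chunk)
print(heap_bytes // chunk_size)       # -> 640 (chunks that fit in the heap)
print(byte_array_bytes // chunk_size) # -> 510 (full chunks' worth of byte[])
```

Under those assumed defaults, the byte[] footprint in the dump is roughly the size of ~510 full 16 MiB pool chunks, i.e. most of the heap's chunk budget.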

I've tried increasing the heap all the way up to 24 GB, but the problem still eventually shows up.
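In the meantime I can watch heap growth in real time rather than waiting for the next dump. A minimal stdlib-only sketch (the URL, user, and password are placeholders for this Search Guard setup; the unverified TLS context mirrors the `verificationMode: none` used in kibana.yml):

```python
# Poll the local node's JVM heap usage via the nodes-stats API.
# URL and credentials below are PLACEHOLDERS for this setup.
import base64
import json
import ssl
import time
import urllib.request

def heap_used_percent(stats: dict) -> int:
    # nodes-stats payload shape: {"nodes": {"<node-id>": {"jvm": {"mem": {...}}}}}
    node = next(iter(stats["nodes"].values()))
    return node["jvm"]["mem"]["heap_used_percent"]

def poll(url="https://Kibana.my.domain:9200", user="monitoring", password="..."):
    req = urllib.request.Request(url + "/_nodes/_local/stats/jvm")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # mirrors verificationMode: none
    while True:
        with urllib.request.urlopen(req, context=ctx) as resp:
            stats = json.load(resp)
        print(time.strftime("%H:%M:%S"), heap_used_percent(stats), "%")
        time.sleep(30)
```

Logging `heap_used_percent` every 30 seconds while using the monitoring UI vs. leaving it idle should show whether the growth really tracks my Kibana activity.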

Please let me know the next steps for troubleshooting this; I've reached the end of my capabilities for troubleshooting this on my own.

Thanks,
Sam N.

elasticsearch.yml - Searchguard items omitted

bootstrap.memory_lock: true
cluster.initial_master_nodes:
  - Elasticsearch-01
  - Elasticsearch-02
  - Elasticsearch-03
cluster.name: ELK
discovery.seed_hosts:
  - Elasticsearch-01
  - Elasticsearch-02
  - Elasticsearch-03
http.port: 9200
network.host: Kibana.my.domain
node.data: false
node.ingest: false
node.master: false
#node.voting_only: true
node.transform: false
node.remote_cluster_client: false
node.max_local_storage_nodes: 1
node.name: Kibana
path.data: C:\ELK\Elasticsearch\Data
path.logs: C:\ELK\Elasticsearch\Logs
transport.tcp.port: 9300
xpack.license.self_generated.type: basic
xpack.security.enabled: false
gateway.recover_after_master_nodes: 2
# No default
#indices.fielddata.cache.size: 1%
# Default to 10%, 1% because no indexing occurs here
#indices.memory.index_buffer_size: 1%
#indices.queries.cache.size: 1%
indices.query.bool.max_clause_count: 8192
indices.recovery.max_bytes_per_sec: 500mb
#indices.requests.cache.size: 1%
#indices.breaker.total.limit: 95%
# Default 40%
indices.breaker.fielddata.limit: 20%
# Default 60%
#indices.breaker.request.limit: 65%
# Default 100%
#network.breaker.inflight_requests.limit: 75%
# Default 100%
#indices.breaker.accounting.limit: 75%
node.ml: false
#search.max_buckets: 100000
#thread_pool.write.queue_size: 1000
transport.port: 9300
xpack.graph.enabled: false
xpack.logstash.enabled: false
xpack.ml.enabled: false
xpack.watcher.enabled: false
cluster.fault_detection.follower_check.timeout: 15s
cluster.fault_detection.leader_check.timeout: 15s
cluster.fault_detection.leader_check.interval: 2s
cluster.publish.info_timeout: 15s
signals.enabled: false

Elasticsearch jvm.options

###
# Memory
###
#-Xmx6143m
#-Xms6143m
###
# My Changes
###
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote=true
-Djava.rmi.server.hostname=Kibana.my.domain
-Dcom.sun.management.jmxremote.port=8999
-Dcom.sun.management.jmxremote.rmi.port=8999
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-XX:ErrorFile=C:\ELK\Elasticsearch\Logs\Fatal_Error.log
#-Des.transport.cname_in_publish_address=true
###
# Garbage Collection
###
#-XX:+UseConcMarkSweepGC
#-XX:-UseConcMarkSweepGC
#-XX:-UseCMSInitiatingOccupancyOnly
# -XX:+UseG1GC
# -XX:InitiatingHeapOccupancyPercent=75
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
###
# OEM Settings
###
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
#-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
#${heap.dump.path}
# specify an alternative path for JVM fatal error logs
#${error.file}
## GC logging
-Xmx24g
-Xms24g

Kibana.yml

logging.dest: C:\ELK\Kibana\Logs\Kibana.log
logging.timezone: "America/Chicago"
logging.rotate.enabled: true
#logging.level: debug
#logging.verbose: true
server.host: "0.0.0.0"
server.name: "Kibana.my.domain"
monitoring.enabled: true
monitoring.ui.elasticsearch.hosts:
  - https://Kibana.my.domain:9200
monitoring.kibana.collection.enabled: false
monitoring.elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Certificates\\es-index-chain.pem"
monitoring.ui.elasticsearch.ssl.verificationMode: none
monitoring.ui.elasticsearch.username: monitoring
monitoring.ui.elasticsearch.password: '[REMOVED]'
# 30 minute reporting timeout (in ms)
xpack.reporting.queue.timeout: 1800000
# ~290MB
xpack.reporting.csv.maxSizeBytes: 304857600
xpack.encryptedSavedObjects.encryptionKey: '[REMOVED]'

elasticsearch.hosts: 
  - "https://Kibana.my.domain:9200"
elasticsearch.username: "kibanaserver"
elasticsearch.password: "[REMOVED]"
elasticsearch.ssl.verificationMode: none
elasticsearch.ssl.certificateAuthorities: "C:\\ELK\\Kibana\\Config\\CA.pem"
kibana.autocompleteTerminateAfter: 50000
elasticsearch.requestTimeout: 90000
elasticsearch.shardTimeout: 90000

# Default to session cookie
searchguard.cookie.ttl: 0
# Increase session TTL to 16 hours (in ms)
# 8 hours = 28800000
# 16 hours = 57600000
searchguard.session.ttl: 57600000
# choose a non-default encryption password for cookies
searchguard.cookie.password: '[REMOVED]'
# For updating settings

If you don't see an option above, assume it's at its default.
