I have a three-node ELK cluster with 500M documents in 900 indices. The cluster has been running without any memory-related issues for over a year.
I recently started using heartbeat and the Uptime app.
Version is 6.8.0.
I suddenly started getting OutOfMemoryError exceptions that crashed one of the Elasticsearch nodes. After some investigation, it is clear that this happens when I click a link in the Error list table. See screenshot. I get a crash every time I hit the top row in the table.
I have heartbeat data for only 15 days: 2,880 heartbeats per day in daily indices, with 1 primary and 1 replica shard per index.
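For reference, the heartbeat indices themselves are small. Assuming the default heartbeat-* index naming, something like this shows their footprint:

GET _cat/indices/heartbeat-*?v&h=index,pri,rep,docs.count,store.size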
[2019-06-17T14:30:21,198][WARN ][o.e.m.j.JvmGcMonitorService] [ow500logan02] [gc][884] overhead, spent [3.6s] collecting in the last [3.6s]
[2019-06-17T14:30:25,980][WARN ][o.e.m.j.JvmGcMonitorService] [ow500logan02] [gc][885] overhead, spent [4.7s] collecting in the last [4.7s]
[2019-06-17T14:31:44,943][ERROR][o.e.x.m.c.n.NodeStatsCollector] [ow500logan02] collector [node_stats] timed out when collecting data
[2019-06-17T14:31:45,151][WARN ][o.e.m.j.JvmGcMonitorService] [ow500logan02] [gc][886] overhead, spent [59s] collecting in the last [1.3m]
[2019-06-17T14:31:45,615][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ow500logan02] fatal error in thread [elasticsearch[ow500logan02][search][T#6]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.AbstractBigArray.newBytePage(AbstractBigArray.java:120) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.BigByteArray.<init>(BigByteArray.java:46) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:467) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:176) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.metrics.cardinality.InternalCardinality.doReduce(InternalCardinality.java:90) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:135) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:128) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:96) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogram$Bucket.reduce(InternalAutoDateHistogram.java:131) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogram.reduceBuckets(InternalAutoDateHistogram.java:338) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogram.doReduce(InternalAutoDateHistogram.java:500) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:135) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:128) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:497) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:412) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.SearchPhaseController$1.reduce(SearchPhaseController.java:699) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:101) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:44) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:86) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
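Judging from the stack trace, the node dies while reducing a cardinality aggregation nested inside an auto_date_histogram. A request along these lines should exercise the same code path against the heartbeat indices; the field names (@timestamp, monitor.id) and the bucket count are my guesses at what the Uptime app asks for, not taken from its actual query:

GET heartbeat-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-15d" } }
  },
  "aggs": {
    "timeline": {
      "auto_date_histogram": { "field": "@timestamp", "buckets": 25 },
      "aggs": {
        "monitors": {
          "cardinality": { "field": "monitor.id" }
        }
      }
    }
  }
}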
Thanks for the advice.
It is possible that increasing heap size would help, but again, this cluster has been running for months without any memory-related issues. I just find it strange that queries against the fairly small heartbeat indices would cause a general heap shortage.
I suspect that this is more of a memory leak that consumes whatever heap is available in no time.
I might try raising the heap size and see if I am right.
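If I do, it is just a matter of editing config/jvm.options on each node and restarting; the 8g below is only an example value, not what I currently run:

-Xms8g
-Xmx8g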
The screenshot below pretty much explains what I mean. At 09:47 I opened the Uptime app in Kibana. CPU usage and heap allocation go straight up, and then the node crashes.
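For anyone who wants to watch the same climb without the Monitoring UI, per-node heap usage can be polled with something like:

GET _cat/nodes?v&h=name,heap.percent,heap.current,heap.max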