upendra
(upendra pisupati)
October 12, 2016, 3:21pm
1
Hi,
Our Elasticsearch cluster and Kibana keep crashing when we execute reports. These are the product versions:
Logstash: 2.4.0
Elasticsearch: 2.4.0
Kibana: 4.6.1
Java: 1.8.0
The following is the error that we get:
Our cluster design is as follows:
Logstash Inputs: 4
Logstash output: 1
ES Master & Data: 5 ( Each one is both master and Data)
ES Client node (with Kibana): 1
The ELK cluster runs on CentOS 7, each node with 16 GB RAM, of which 4 GB is allotted to the ES_HEAP_SIZE parameter.
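For reference, the heap is set along the following lines (paths assumed for a standard RPM install of Elasticsearch 2.x on CentOS 7; a tarball install would export the variable instead):
# /etc/sysconfig/elasticsearch -- heap size picked up by the init/systemd scripts
ES_HEAP_SIZE=4g
# tarball install equivalent, before starting bin/elasticsearch:
# export ES_HEAP_SIZE=4g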
We have also tried setting the Node option in Kibana's startup script to:
exec "${NODE}" --max-old-space-size=100 "${DIR}/src/cli" ${@}
But still our Elasticsearch and Kibana keep crashing.
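As a sketch of what we tried, assuming the stock bin/kibana script still reads a NODE_OPTIONS environment variable, the same Node.js heap limit could also be passed without editing the script:
# hypothetical alternative: set the old-space limit (in MB) via the environment
NODE_OPTIONS="--max-old-space-size=100" bin/kibana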
Thanks,
Upendra
upendra
(upendra pisupati)
October 14, 2016, 10:55am
3
Hi Mark,
Running the command jmap -heap <pid> gives the following:
Attaching to process ID 15144, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.101-b13
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 8589934592 (8192.0MB)
NewSize = 348913664 (332.75MB)
MaxNewSize = 348913664 (332.75MB)
OldSize = 8241020928 (7859.25MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 314048512 (299.5MB)
used = 314048496 (299.49998474121094MB)
free = 16 (1.52587890625E-5MB)
99.99999490524573% used
Eden Space:
capacity = 279183360 (266.25MB)
used = 279183360 (266.25MB)
free = 0 (0.0MB)
100.0% used
From Space:
capacity = 34865152 (33.25MB)
used = 34865136 (33.24998474121094MB)
free = 16 (1.52587890625E-5MB)
99.99995410890507% used
To Space:
capacity = 34865152 (33.25MB)
used = 0 (0.0MB)
free = 34865152 (33.25MB)
0.0% used
concurrent mark-sweep generation:
capacity = 8241020928 (7859.25MB)
used = 8241020896 (7859.249969482422MB)
free = 32 (3.0517578125E-5MB)
99.9999996116986% used
15745 interned Strings occupying 2446000 bytes.
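If a full heap dump would help with the analysis, we can capture one for offline inspection along these lines (output path is just an example):
# dump live objects to a binary .hprof file for analysis in MAT or VisualVM
jmap -dump:live,format=b,file=/tmp/es-heap.hprof 15144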
Thanks,
Upendra
upendra
(upendra pisupati)
October 14, 2016, 11:01am
4
Hi Mark,
Please see the logs. I am unable to send you the complete logs due to the space constraints of this forum.
Thanks,
Upendra
spinscale
(Alexander Reelsen)
October 14, 2016, 11:21am
5
Hey,
please use gist or another pastebin to put the logs somewhere (also make sure they don't contain sensitive information), and keep the format as text. Thanks!
--Alex
upendra
(upendra pisupati)
October 14, 2016, 12:11pm
6
Thanks, Alex, for the help.
Please find the log entry here:
elasticsearch_error.txt
[2016-10-14 16:14:08,736][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392538]
[2016-10-14 16:14:08,740][WARN ][monitor.jvm ] [arlmselk02_M!D!] [gc][old][67284][32] duration [30s], collections [1]/[30.1s], total [30s]/[13.8m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [30.9mb]->[32.6mb]/[33.2mb]}{[old] [7.6gb]->[7.6gb]/[7.6gb]}
[2016-10-14 16:14:51,912][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392547]
[2016-10-14 16:15:22,077][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392532]
[2016-10-14 16:15:22,077][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392539]
[2016-10-14 16:15:51,794][WARN ][transport ] [arlmselk02_M!D!] Received response for a request that has timed out, sent [71197ms] ago, timed out [30355ms] ago, action [cluster:monitor/nodes/stats[n]], node [{arlmselk06_M$D$}{dbO25mC-SEq0zTAz8qC_2g}{192.168.xxx.xxx}{192.168.xxx.xxx:9300}], id [392528]
[2016-10-14 16:16:53,484][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392350]
[2016-10-14 16:16:53,485][WARN ][transport ] [arlmselk02_M!D!] Transport response handler not found of id [392533]
[2016-10-14 16:18:31,514][ERROR][watcher.input.http ] [arlmselk02_M!D!] failed to execute [http] input for [org.elasticsearch.watcher.watch.Watch@5efed351]
ElasticsearchTimeoutException[failed to execute http request. timeout expired]; nested: SocketTimeoutException[Read timed out];
(The attached log file has been truncated.)
Regards,
Upendra
spinscale
(Alexander Reelsen)
October 14, 2016, 12:29pm
7
If you read that log, you can spot an out-of-memory exception. This means you have to restart your node immediately, as the behaviour after such an exception is not specified (you simply don't know whether everything still works or not).
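On CentOS 7 that usually means something like the following, assuming a package install managed by systemd:
# restart the Elasticsearch service on the affected node
sudo systemctl restart elasticsearch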
However, in order to prevent these issues in the future, you should find out what triggers the exception. Is it a particular query?
You might want to read the following on that topic:
I have a 10 machine cluster where frequently (about once per day when indexing and querying is at its height) one elasticsearch node goes OOM... It usually recovers, but by this time the cluster is redistributing the lost shards, which causes more load, which often in turn causes an OOM on another machine. Each machine has 32GB memory of which I currently have 12GB allocated to Elasticsearch. I have logstash (max 500M) and redis (max 2GB) running on the machines too, and see that the rem…
You can use the cat APIs or monitoring to see if you have continuously rising memory usage or spikes that cause this behaviour.
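For example, a quick check along these lines (run against any node; column names as in the 2.x cat API):
# per-node heap and RAM usage
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'
# fielddata is a common cause of steadily growing heap
curl -s 'localhost:9200/_cat/fielddata?v'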
Hope this helps.
--Alex
warkolm
(Mark Walkom)
October 15, 2016, 2:30am
8
You should definitely be using Marvel as well.
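As a rough sketch, the usual install steps for Marvel on a 2.x cluster look like this (exact commands may differ slightly by version):
# on every Elasticsearch node:
bin/plugin install license
bin/plugin install marvel-agent
# on the Kibana node:
bin/kibana plugin --install elasticsearch/marvel/latest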