Kibana and Elasticsearch keep crashing

Hi,

Our Elasticsearch cluster and Kibana keep crashing when we execute reports. The product versions are as follows:

  1. Logstash: 2.4.0
  2. Elasticsearch: 2.4.0
  3. Kibana: 4.6.1
  4. Java: 1.8.0

The following is the error that we get:

Our cluster design is as follows:

  1. Logstash Inputs: 4
  2. Logstash output: 1
  3. ES Master & Data: 5 (each one is both master and data)
  4. ES Client node (with Kibana): 1

The ELK cluster runs on CentOS 7, and each node has 16 GB RAM, of which 4 GB is allotted to Elasticsearch through the ES_HEAP_SIZE parameter.

We have also tried setting the Node Option parameter to:

exec "${NODE}" --max-old-space-size=100 "${DIR}/src/cli" ${@}

But still our Elasticsearch and Kibana keep crashing.

Thanks,
Upendra

What's in your logs?

Hi Mark,

Running command jmap -heap pid gives the following:

Attaching to process ID 15144, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.101-b13

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 8589934592 (8192.0MB)
NewSize = 348913664 (332.75MB)
MaxNewSize = 348913664 (332.75MB)
OldSize = 8241020928 (7859.25MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 314048512 (299.5MB)
used = 314048496 (299.49998474121094MB)
free = 16 (1.52587890625E-5MB)
99.99999490524573% used
Eden Space:
capacity = 279183360 (266.25MB)
used = 279183360 (266.25MB)
free = 0 (0.0MB)
100.0% used
From Space:
capacity = 34865152 (33.25MB)
used = 34865136 (33.24998474121094MB)
free = 16 (1.52587890625E-5MB)
99.99995410890507% used
To Space:
capacity = 34865152 (33.25MB)
used = 0 (0.0MB)
free = 34865152 (33.25MB)
0.0% used
concurrent mark-sweep generation:
capacity = 8241020928 (7859.25MB)
used = 8241020896 (7859.249969482422MB)
free = 32 (3.0517578125E-5MB)
99.9999996116986% used

15745 interned Strings occupying 2446000 bytes.

Thanks,
Upendra

Hi Mark,

Please see the logs. I am unable to send you the complete logs due to the space constraints of this forum.

Thanks,
Upendra

Hey,

Please use a gist or another pastebin to share the logs (also make sure they don't contain sensitive information), and keep the format as text. Thanks!

--Alex

Thanks Alex for that help.

Please find the log entry here:

Regards,
Upendra

If you read that log, you can spot an out of memory exception. This means you have to restart your node immediately, as the behaviour after such an exception is not specified (you just don't know whether everything works or not).
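
If the log is large, a small script can point you at the relevant lines. This is only a minimal sketch: it assumes the exception shows up in the Elasticsearch log as the usual JVM marker text `OutOfMemoryError`, and the function name `find_oom_lines` is made up for this example.

```python
import re

# The JVM reports heap exhaustion as e.g. "java.lang.OutOfMemoryError: Java heap space".
OOM_PATTERN = re.compile(r"OutOfMemoryError")


def find_oom_lines(log_text):
    """Return (line_number, line) pairs for every line mentioning OutOfMemoryError."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if OOM_PATTERN.search(line)
    ]
```

To use it, read your log file into a string and pass it in, for example `find_oom_lines(open(path).read())`, where `path` is wherever your node writes its log. The entries just before the first hit usually show what the node was doing when memory ran out.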

However, in order to prevent those issues in the future, you should find out what triggers this exception. Is it a particular query?

You might want to read the following docs on that topic:

You can use the cat APIs or monitoring to see whether memory usage rises continuously or spikes in a way that causes this behaviour.
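
As a rough sketch of the cat API approach, the script below asks `_cat/nodes` for the `heap.percent` and `name` columns and lists the nodes above a threshold. The base URL `http://localhost:9200`, the 85% threshold, and the helper names are assumptions for illustration, so adjust them to your setup. The heap column is requested first because node names may contain spaces.

```python
import urllib.request


def parse_cat_nodes(text):
    """Parse `_cat/nodes?h=heap.percent,name` output into {node_name: heap_percent}."""
    usage = {}
    for line in text.splitlines():
        # Split only once so that node names containing spaces stay intact.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        percent, name = parts
        usage[name.strip()] = int(percent)
    return usage


def nodes_above(usage, threshold=85):
    """Return the sorted names of nodes whose heap usage is at or above threshold (assumed 85%)."""
    return sorted(name for name, percent in usage.items() if percent >= threshold)


def fetch_heap_usage(base_url="http://localhost:9200"):
    """Fetch current heap usage per node; base_url is an assumed address of any cluster node."""
    url = base_url + "/_cat/nodes?h=heap.percent,name"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_cat_nodes(response.read().decode("utf-8"))
```

Calling `nodes_above(fetch_heap_usage())` every minute or so while a report runs should show whether heap usage climbs steadily or jumps when a specific dashboard is opened.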

Hope this helps.

--Alex

You should definitely be using Marvel as well.