ES process dies after some minutes of searching


(Hans) #1

Dear Community,

I am not sure whether this is the right place for my question. Hopefully it is.

My environment: the ELK stack (Elasticsearch, Logstash and Kibana), all at version 6.6.1, running on Debian 3.16.43-2+deb8u5 with Java version "1.8.0_202". The server runs on VMware with 16 GB of memory and 4 cores.

I installed ElastiFlow 3.4.1 and am sending NetFlow datagrams from a Cisco perimeter device. Kibana shows me really nice diagrams and images. This works well as long as I use the dashboard with a time period of "last 15 minutes". If I change to "last 4 hours" and run some queries, timeout messages soon appear in the GUI. After waiting for one or two minutes, I find that the Elasticsearch process has died.

This is the end of the GC log file:

2019-03-12T13:06:26.373+0100: 13050.808: Total time for which application threads were stopped: 1.2375382 seconds, Stopping threads took: 0.0020663 seconds
2019-03-12T13:06:26.375+0100: 13050.810: [GC (CMS Initial Mark) [1 CMS-initial-mark: 707839K(707840K)] 1013663K(1014528K), 0.2185484 secs] [Times: user=0.23 sys=0.00, real=0.22 secs]
2019-03-12T13:06:26.594+0100: 13051.029: Total time for which application threads were stopped: 0.2191069 seconds, Stopping threads took: 0.0000805 seconds
2019-03-12T13:06:26.594+0100: 13051.029: [CMS-concurrent-mark-start]
Heap
par new generation total 306688K, used 306688K [0x00000000c0000000, 0x00000000d4cc0000, 0x00000000d4cc0000)
eden space 272640K, 100% used [0x00000000c0000000, 0x00000000d0a40000, 0x00000000d0a40000)
from space 34048K, 100% used [0x00000000d0a40000, 0x00000000d2b80000, 0x00000000d2b80000)
to space 34048K, 0% used [0x00000000d2b80000, 0x00000000d2b80000, 0x00000000d4cc0000)
concurrent mark-sweep generation total 707840K, used 707839K [0x00000000d4cc0000, 0x0000000100000000, 0x0000000100000000)
Metaspace used 85552K, capacity 92721K, committed 92844K, reserved 1128448K
class space used 11134K, capacity 13313K, committed 13376K, reserved 1048576K
2019-03-12T13:06:26.607+0100: 13051.042: [Full GC (Allocation Failure) 2019-03-12T13:06:26.607+0100: 13051.042: [CMS2019-03-12T13:06:27.119+0100: 13051.554: [CMS-concurrent-mark: 0.522/0.525 secs] [Times: user=0.56 sys=0.00, real=0.53 secs]
(concurrent mode failure): 707839K->707839K(707840K), 2.0573578 secs] 1014527K->1013692K(1014528K), [Metaspace: 85552K->85552K(1128448K)], 2.0574695 secs] [Times: user=2.06 sys=0.00, real=2.06 secs]
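
For reference, the numbers in that last log line can be read with a small script. This is only a minimal sketch in Python, assuming the JDK 8 GC log format shown above; the helper names (parse_triples, occupancy) are made up for illustration and are not part of any Elasticsearch or JVM tooling. It extracts the "before->after(capacity)" figures to show how full the heap still is after the Full GC.

```python
import re

# Final Full GC summary from the log above, copied verbatim.
LINE = (
    "(concurrent mode failure): 707839K->707839K(707840K), 2.0573578 secs] "
    "1014527K->1013692K(1014528K), [Metaspace: 85552K->85552K(1128448K)], "
    "2.0574695 secs]"
)

# Matches "<before>K-><after>K(<capacity>K)" as printed by the JVM.
TRIPLE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def parse_triples(line):
    """Return (before_kb, after_kb, capacity_kb) tuples in order of appearance."""
    return [tuple(int(x) for x in m.groups()) for m in TRIPLE.finditer(line)]


def occupancy(after_kb, capacity_kb):
    """Fraction of the capacity still in use after the collection."""
    return after_kb / capacity_kb


# In this line the order is: old (CMS) generation, whole heap, Metaspace.
old_gen, whole_heap, metaspace = parse_triples(LINE)

before_kb, after_kb, capacity_kb = whole_heap
print(f"heap capacity:   {capacity_kb / 1024:.2f} MiB")
print(f"freed by Full GC: {before_kb - after_kb} KiB")
print(f"heap still used: {occupancy(after_kb, capacity_kb):.2%}")
print(f"old gen used:    {occupancy(old_gen[1], old_gen[2]):.2%}")
```

If this reading is right, the whole heap is a little under 1 GiB, and a Full GC that paused the application for about two seconds freed only 835 KiB of it.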

Any ideas what I can do to get this nice tool running stably?

Kind regards
Hans