Performance issue on single node instance


My Kibana is quite slow. Coming from Kibana 3, the old version seemed faster than the new one.
We have a single-node instance with 9 CPUs and 24 GB RAM, running Logstash, Elasticsearch, and Kibana on the same machine.
Sporadically, but quite often, we get timeouts in Kibana. Retrying the query mostly returns a result, I think because of the ES cache.

top shows:

top - 11:04:48 up 16 days, 23:19,  1 user,  load average: 0.60, 1.63, 2.97
Tasks: 172 total,   2 running, 170 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.4 us,  0.6 sy,  0.5 ni, 92.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 28780884 total,   171788 free, 18373960 used, 10235136 buff/cache
KiB Swap: 10484732 total, 10031116 free,   453616 used.  9805796 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1711 elastic+  20   0  0.278t 0.016t 437456 S  54.5 59.0   3484:13 java
18601 logstash  39  19 7149564 1.036g   7684 S   3.7  3.8 349:00.59 java
17749 kibana    20   0 1337280 130332   5900 S   1.0  0.5  85:26.60 node
    1 root      20   0  194732   4684   2792 S   0.0  0.0   8:43.57 systemd

In the ES log I found the following:

[2017-03-17T10:51:51,182][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685485] overhead, spent [366ms] collecting in the last [1.1s]
[2017-03-17T10:51:58,263][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685492] overhead, spent [393ms] collecting in the last [1s]
[2017-03-17T10:52:45,892][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685538] overhead, spent [580ms] collecting in the last [1.1s]
[2017-03-17T10:52:47,009][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685539] overhead, spent [330ms] collecting in the last [1.1s]
[2017-03-17T10:52:48,009][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685540] overhead, spent [418ms] collecting in the last [1s]
[2017-03-17T10:52:58,011][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][685550] overhead, spent [271ms] collecting in the last [1s]

Also interesting: why are some of these log lines INFO and some WARN?
Do I need to increase RAM and/or heap?

Heap settings:
ES: 15 GB
Logstash: 1 GB
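(For reference, on Elasticsearch 5.x these values would be set via `jvm.options`; the paths below assume a default package install and may differ on your system:)

```
# /etc/elasticsearch/jvm.options (assumed default package location)
-Xms15g
-Xmx15g

# /etc/logstash/jvm.options
-Xms1g
-Xmx1g
```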

Did I badly configure the heap for ES, or do we just need more RAM and a bigger heap?



The difference between WARN and INFO is due to the amount of time spent doing GC. Spending more than half of the interval collecting seems to trigger a WARN message (just guessing here, I have not looked it up).
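For what it's worth, the `JvmGcMonitorService` classifies these lines by the fraction of the window spent collecting, with thresholds that (if I recall the `monitor.jvm.gc.overhead.*` defaults correctly, so treat them as an assumption) are 10% for DEBUG, 25% for INFO, and 50% for WARN. A small sketch of that logic against the log lines above:

```python
import re

# Classify a JvmGcMonitorService "overhead" log line by the fraction of the
# reported window spent collecting. Threshold values are assumed defaults
# for monitor.jvm.gc.overhead.{debug,info,warn}.
PATTERN = re.compile(r"spent \[(\d+)ms\] collecting in the last \[([\d.]+)s\]")

def gc_log_level(line):
    m = PATTERN.search(line)
    if not m:
        return None
    overhead = int(m.group(1)) / (float(m.group(2)) * 1000)
    if overhead >= 0.50:
        return "WARN"
    if overhead >= 0.25:
        return "INFO"
    return "DEBUG"

# The one WARN line above spent 580ms of a 1.1s window (~53%) in GC;
# the INFO lines are all between 25% and 50%.
print(gc_log_level("spent [580ms] collecting in the last [1.1s]"))  # WARN
print(gc_log_level("spent [366ms] collecting in the last [1.1s]"))  # INFO
```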

With all parts of the stack running on a single system, it is really hard to debug any individual part. Which process is stealing resources from the others? If a query is CPU-bound and a heavy Logstash pipeline is running at the same time, both processes will compete for resources.
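One low-effort way to see who is competing: sample per-process CPU over time while reproducing a slow query. A sketch with standard tools (process names taken from the `top` output above; adjust to your system):

```shell
# Snapshot: CPU and memory for the Java (ES/Logstash) and Node (Kibana) processes.
ps -o pid,user,%cpu,%mem,rss,comm -C java,node

# Sample in batch mode every 5 seconds for a minute while re-running the
# slow query, then compare which process spikes.
top -b -d 5 -n 12 | grep -E 'java|node'
```

If the Elasticsearch process alone pegs the CPU during the query, the contention theory weakens and the query/heap side becomes more interesting.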

What does "retrying the query mostly returns a result" mean — what happens in the cases where it doesn't? Less data returned? A timeout? And if you say everything was better before, are you comparing the exact same setup otherwise (number of indices, number of documents, query complexity, etc.)?

Without more information and isolation, this is very tough to answer and mostly guesswork. Also, you have not even mentioned which Elasticsearch/Logstash/Kibana versions you are running.
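To save a round trip, the versions are quick to pull (the Logstash/Kibana paths assume a default package install and may differ on your system):

```shell
# Elasticsearch reports its version on the root endpoint.
curl -s localhost:9200

# Logstash and Kibana print theirs on the command line.
/usr/share/logstash/bin/logstash --version
/usr/share/kibana/bin/kibana --version
```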


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.