Understanding Elasticsearch scaling

Hi there,

I have an ES instance that maxes out its CPU when running searches that span a long timeframe (> 12 hours):

top - 12:36:34 up 6 days,  5:00,  1 user,  load average: 2.41, 1.73, 0.89
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
%Cpu0  : 99.0 us,  0.7 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 98.0 us,  1.3 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 99.3 us,  0.0 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 99.0 us,  0.3 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65982164 total,  9197488 free, 18018484 used, 38766192 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 47095628 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23514 elastic+  20   0 77.844g 0.019t 2.772g S 395.7 31.1   3363:49 java
  573 root      20   0       0      0      0 S   0.3  0.0  13:24.44 jbd2/sdc1-8
25103 kibana    20   0 1280316 118080  21356 S   0.3  0.2  44:11.49 node
    1 root      20   0   57088   6776   5244 S   0.0  0.0   0:03.06 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.08 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:24.15 ksoftirqd/0

This is on an ESXi host, where I'm limited to 4 vCPUs per VM, so I think I need to horizontally scale the ES portion of my stack.

I'm fuzzy on how to do that based on my goals. Here's what I'm after:

  • ES is able to return results for these long queries. I get that the long queries may take some time to run, and that's fine. I don't need the results to be produced instantaneously.
  • I don't care about having one ES instance be able to take over for the other in the event of a failure, because I believe that doing so requires the data to be duplicated across both ES instances, and I don't want to consume 30GB/day * 2 of storage.
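
To put numbers on the storage concern, here is a rough sketch of the arithmetic. It assumes the 30GB/day figure is the on-disk size of the primary shards and that each replica is a full copy of them. `number_of_replicas` is the name of the Elasticsearch index setting; the function name is made up for illustration.

```python
def daily_storage_gb(primary_gb_per_day, number_of_replicas):
    """Approximate disk consumed per day across the whole cluster.

    Assumption: each replica is a full copy of the primary shards, so
    total storage is the primary size times (1 + number of replicas).
    """
    return primary_gb_per_day * (1 + number_of_replicas)


# No replicas: data is stored once, with no failover copy.
print(daily_storage_gb(30, 0))  # 30

# One replica: the duplicated-data case I want to avoid.
print(daily_storage_gb(30, 1))  # 60
```

If that reasoning holds, the second goal amounts to running with zero replicas, so adding a node would add CPU without doubling the storage.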

I currently have one Logstash instance feeding one Elasticsearch instance (each instance is on its own VM). My Kibana instance is on the same VM as the Elasticsearch instance.

I'm not sure if there's a doc that describes the different ways ELK can be scaled depending on what I'm after. If so, I'd love to be pointed to the relevant links. If not, hopefully someone here is willing to provide some insight/guidance.

FWIW, the stack is v6.2.4.

Thanks!
