Hey there,
I'm about four days new to ELK, so I'm very green to all of this. Elasticsearch, Kibana, and Logstash are all version 6.2.4, running together on one Debian 9.4 VM, so I guess that means my cluster contains only a single node.
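(I haven't actually verified that; I'm assuming something like this would confirm whether there's really just the one node, if I'm reading the cat APIs right:)

curl 'localhost:9200/_cat/nodes?v'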
I've been experimenting with Elastiflow, using this stack for NetFlow data collection and analysis. 24 hours in, I've collected over 57GB of flow data across more than 65 million flow records (documents?):
root@docker:/home/jlixfeld# curl 'localhost:9200/_cat/indices?v'
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana               GK1TSIs6SSen6HWo40SRTQ   1   0        218            4    141.9kb        141.9kb
yellow open   elastiflow-2018.04.22 NQkSZ44NQuKt5bw46KT1aw   2   1   40073930            0       20gb           20gb
yellow open   elastiflow-2018.04.21 L74S_otgTouE3lFnPW36Gg   2   1   29043943            0     14.5gb         14.5gb
root@docker:/home/jlixfeld#
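I notice the elastiflow indices are yellow; my guess is that's the replica shard that can't be assigned anywhere on a single node. Since I don't need this data replicated anyway (more on that below), I'm assuming I could drop the replicas with something like this, though I haven't actually tried it yet:

curl -XPUT 'localhost:9200/elastiflow-*/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0}}'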
Elastiflow ships with all sorts of dashboards, and I can pretty reliably look at a 4-hour time range in any of them. Selecting a 12-hour time range or longer, however, results in Kibana timeouts in Timelion and Visualize.
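For what it's worth, I haven't changed any Kibana timeouts; if I'm reading the docs right, the relevant knob would be elasticsearch.requestTimeout in kibana.yml (default 30000 ms), i.e. something like:

#/etc/kibana/kibana.yml: elasticsearch.requestTimeout: 30000

but I'd rather fix whatever is making the queries slow than just raise the timeout.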
The VM has 4 vCPUs (2 x E5-2630 v4 @ 2.20GHz) and 64GB of memory allocated to it, and /var/lib/elasticsearch is on SSDs.
Load average seems kind of high when I try to run these dashboards, despite all the memory and the SSDs that I've thrown at this.
top - 13:39:33 up  1:02,  2 users,  load average: 7.86, 4.75, 4.49
Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
%Cpu(s): 64.8 us,  0.2 sy, 34.8 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65982164 total, 18950704 free, 36122940 used, 10908520 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 29090220 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  667 elastic+  20   0 70.166g 0.034t 2.117g S 251.8 55.9  61:06.38 java
  618 logstash  39  19 5065348 1.339g 330020 S 146.8  2.1 115:01.27 java
  637 kibana    20   0 1263816  97728  21140 S   0.3  0.1   0:16.71 node
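I haven't really dug into what Elasticsearch is doing during those spikes yet. If I understand the docs right, the hot threads API is the thing to hit while a dashboard is loading, e.g.:

curl 'localhost:9200/_nodes/hot_threads'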
I've read some things on tuning ES, and as near as I can tell, the Debian package and systemd unit already set most of the recommended performance tuning defaults. For the rest, here's what I've got:
#/etc/default/elasticsearch: ES_JAVA_OPTS="-Xms32766m -Xmx32766m -Des.enforce.bootstrap.checks=true"
#/usr/lib/systemd/system/elasticsearch.service: LimitNOFILE=65536
#/etc/systemd/system/elasticsearch.service.d/override.conf: LimitMEMLOCK=infinity
#/etc/fstab: #/dev/mapper/docker--vg-swap_1 none            swap    sw              0       0
I've done no other tuning or configuration anywhere.
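In case it helps, I've been assuming I can sanity-check that the heap and memory-lock settings are actually taking effect via the node info APIs; if I understand them right, the heap shows up under jvm and mlockall under process:

curl 'localhost:9200/_nodes/jvm?pretty'
curl 'localhost:9200/_nodes/process?pretty'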
I'm not really sure where to go from here to get it to perform better. The end game right now is to collect as much data as I can until I run out of space. I have more spinny-disk capacity than SSD, so if I can get away with putting this on spinny disks I could keep about 1.5TB; if I/O turns out to be part of the bottleneck and I have to stay on SSD, I can only hold onto about 750GB. That said, I have experimented with both spinny disks and SSDs, and I still get the Kibana timeouts either way.
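If I do end up moving the data onto the spinning disks, I'm assuming it's just a matter of pointing path.data at the new mount (with Elasticsearch stopped and the existing data copied over), something like:

#/etc/elasticsearch/elasticsearch.yml: path.data: /path/to/spinny/elasticsearch

where /path/to/spinny/elasticsearch is just a placeholder for wherever I'd mount the larger disks.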
This data is mostly unimportant (no need for it to be replicated anywhere else) and it won't be looked at very often, maybe monthly. But it is important to be able to load the dashboards reliably (and relatively quickly) when required.
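One thought I had but haven't tried: since yesterday's daily indices are no longer being written to, my understanding is that force-merging them down to a single segment might make those occasional dashboard loads cheaper, something like:

curl -XPOST 'localhost:9200/elastiflow-2018.04.21/_forcemerge?max_num_segments=1'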
Any insights?