Hey there,
I'm about 4 days new to ELK, so I'm very green at all of this. Elasticsearch, Kibana and Logstash are all version 6.2.4, all running on one Debian 9.4 VM. I guess that means my cluster contains only a single node.
I've been experimenting with Elastiflow to use this stack for NetFlow data collection and analysis. 24 hours in, I've collected over 57GB of flow data across more than 65 million flow records (documents?).
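(In case it matters, I believe I can confirm that with the nodes and cluster health APIs; I haven't pasted the output here, but can if it's useful:)
root@docker:/home/jlixfeld# curl 'localhost:9200/_cat/nodes?v'
root@docker:/home/jlixfeld# curl 'localhost:9200/_cluster/health?pretty'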
root@docker:/home/jlixfeld# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana GK1TSIs6SSen6HWo40SRTQ 1 0 218 4 141.9kb 141.9kb
yellow open elastiflow-2018.04.22 NQkSZ44NQuKt5bw46KT1aw 2 1 40073930 0 20gb 20gb
yellow open elastiflow-2018.04.21 L74S_otgTouE3lFnPW36Gg 2 1 29043943 0 14.5gb 14.5gb
root@docker:/home/jlixfeld#
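I notice the elastiflow indices are yellow; I'm assuming that's just the replica shards that can never be allocated on a single node. If I'm reading the docs right, something like this would drop the replica count to 0 (I haven't actually run it yet):
root@docker:/home/jlixfeld# curl -XPUT 'localhost:9200/elastiflow-*/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0}}'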
Elastiflow ships with all sorts of dashboards, and I can pretty reliably look at a 4 hour time range in any of them. Trying to select a 12 hour time range or longer, however, results in Kibana timeouts from Timelion and Visualize.
The VM has 4 vCPUs (2 x E5-2630 v4 @ 2.20GHz) and 64GB of memory allocated to it, and /var/lib/elasticsearch is on SSDs.
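From what I've read, the relevant knob on the Kibana side is elasticsearch.requestTimeout in kibana.yml (which I gather defaults to 30000 ms), so presumably the queries are simply taking longer than that. Something like this would raise it, though it feels like treating the symptom rather than the cause, so I haven't tried it yet:
# /etc/kibana/kibana.yml (path assuming the Debian package)
elasticsearch.requestTimeout: 120000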
Load average seems kind of high when I try to run these dashboards, despite all the memory and the SSDs that I've thrown at this.
top - 13:39:33 up 1:02, 2 users, load average: 7.86, 4.75, 4.49
Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
%Cpu(s): 64.8 us, 0.2 sy, 34.8 ni, 0.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65982164 total, 18950704 free, 36122940 used, 10908520 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 29090220 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
667 elastic+ 20 0 70.166g 0.034t 2.117g S 251.8 55.9 61:06.38 java
618 logstash 39 19 5065348 1.339g 330020 S 146.8 2.1 115:01.27 java
637 kibana 20 0 1263816 97728 21140 S 0.3 0.1 0:16.71 node
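I haven't yet looked at what Elasticsearch is actually busy doing while one of the slow dashboards loads; my understanding is that the hot threads API would show that, so I can capture something like this during a slow query if it would help:
root@docker:/home/jlixfeld# curl 'localhost:9200/_nodes/hot_threads'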
I've read a few things on tuning ES, and as near as I can tell, the Debian package and its systemd unit already set most of the recommended performance defaults. For the rest, here's what I've got:
/etc/default/elasticsearch: ES_JAVA_OPTS="-Xms32766m -Xmx32766m -Des.enforce.bootstrap.checks=true"
/usr/lib/systemd/system/elasticsearch.service: LimitNOFILE=65536
/etc/systemd/system/elasticsearch.service.d/override.conf: LimitMEMLOCK=infinity
/etc/fstab: swap disabled ("#/dev/mapper/docker--vg-swap_1 none swap sw 0 0" is commented out)
I've done no other tuning or configuration anywhere.
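For what it's worth, my understanding is that I can verify those settings actually took effect with the node info APIs (heap size under jvm, mlockall under process); I can paste the output if it's useful:
root@docker:/home/jlixfeld# curl 'localhost:9200/_nodes/jvm?pretty'
root@docker:/home/jlixfeld# curl 'localhost:9200/_nodes/process?pretty'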
I'm not really sure where to go from here to get this performing better. The end game right now is to collect as much data as I can until I run out of space. I have more spinny-disk capacity than SSD, so if I can get away with putting this on spinny disks I could collect about 1.5TB; if I/O turns out to be part of the bottleneck and I have to stay on SSD, I can hold onto about 750GB. That said, I've experimented with both spinny disks and SSDs, and I get the Kibana timeouts regardless.
This data is mostly unimportant (it doesn't need to be replicated anywhere else) and it won't be looked at very often, maybe monthly. But it is important to be able to load the dashboards reliably (and relatively quickly) when required.
Any insights?