Hi everyone,
We are running an ELK 7.1 deployment on IKS (IBM Cloud Kubernetes Service).
Our setup is the following:
Infrastructure:
-IKS Cluster running Kubernetes 1.14.8_1536
-2x worker nodes of type Virtual Shared b2c.4x16 - 4 vCPUs, 16GB RAM
-3x 20GB BlockStorage with 1000 IOPS - for the elastic-master nodes
-2x1000GB BlockStorage with 1000 IOPS - for the elastic-data nodes
ELK 7.1 deployment with Helm charts, main resource settings (see also the values sketch after this list):
-3x elastic-master nodes
resources:
  requests:
    cpu: "100m"
    memory: "2Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
esJavaOpts: "-Xmx1500m -Xms1500m"
-2x elastic-data nodes
resources:
  requests:
    cpu: "100m"
    memory: "2Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
esJavaOpts: "-Xmx1500m -Xms1500m"
-1x kibana node
resources:
  requests:
    cpu: "100m"
    memory: "500m"
  limits:
    cpu: "1000m"
    memory: "1Gi"
-1x logstash node
no resources specified
logstashJavaOpts: "-Xmx1g -Xms1g"
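Not shown in the list above: the block-storage volumes from the infrastructure section are attached to the Elasticsearch nodes through the chart's volumeClaimTemplate value. A minimal sketch for the data nodes, assuming an IKS block-storage class (the storageClassName shown is illustrative, not necessarily our exact class):

  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "ibmc-block-custom"   # illustrative IKS block-storage class
    resources:
      requests:
        storage: 1000Gi

The master nodes use the same structure with 20Gi volumes.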
We are forwarding syslog-type logs with Fluentd from the IKS cluster to Logstash.
Our index size per day is approximately 15GB-20GB, and as you can see we have 1 primary and 1 replica shard for every index:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-logs-2019.11.09 zqaTt02OTHCNj01TxyJXxA 1 1 30352029 0 30.5gb 15.2gb
green open logstash-logs-2019.11.08 lhLtQrvQTJ-eaGDH6T-Ogg 1 1 19440417 0 19.4gb 9.7gb
green open logstash-logs-2019.11.07 m5jGrS4kQ5-6H_DySNbuXw 1 1 29759146 0 29.3gb 14.6gb
green open logstash-logs-2019.11.06 Se608LsyQbKWt7llaxXlWQ 1 1 21048425 0 21.3gb 10.6gb
green open logstash-logs-2019.11.05 LzqSTVxETnCGfUdcySuoIQ 1 1 26868532 0 26.7gb 13.3gb
We are forwarding logs from our application, which runs on the 3 app nodes, to the ELK stack, which runs on the 2 ELK nodes. The pods are pinned to specific nodes with the nodeSelector option at deployment level (sketched below).
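A sketch of how the pinning looks in the chart values, assuming a node label such as role=elk (the label key/value is illustrative, not our exact label):

  nodeSelector:
    role: elk   # nodes labelled for the ELK workload

The app deployments carry the equivalent selector for the app nodes.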
Currently, in our opinion, this is not working so well. For example, the current node load (kubectl top nodes) is:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
app_node1 1469m 37% 10077Mi 75%
app_node2 3317m 84% 10636Mi 79%
app_node3 1464m 37% 8937Mi 67%
elk_node1 1632m 41% 11317Mi 84%
elk_node2 1443m 36% 10703Mi 80%
At the above load, when I access Kibana, set the time range to the last 1 hour, and do a simple search for a keyword, it displays a "Discover: Request Timeout after 30000ms" error message on the dashboard. Kibana is practically unusable: very slow, frozen, you cannot do much with it.
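As far as we can tell, the 30000ms corresponds to Kibana's default elasticsearch.requestTimeout. We could raise it through the Kibana chart's kibanaConfig value, roughly as sketched below, but we suspect that would only hide the slowness rather than fix it:

  kibanaConfig:
    kibana.yml: |
      # raise the Elasticsearch request timeout from the 30s default (value in ms)
      elasticsearch.requestTimeout: 60000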
We tried to look at the JVM heap size settings/values:
/_cat/nodes?v&h=id,disk.total,disk.used,disk.avail,disk.used_percent,ram.current,ram.percent,ram.max,cpu
172.30.205.51 8 72 22 1.88 1.88 2.06 m - elasticsearch-master-2
172.30.19.79 6 93 26 3.25 2.60 2.52 m - elasticsearch-master-0
172.30.74.142 78 97 32 4.86 5.42 5.74 d - elasticsearch-data-1
172.30.19.80 75 93 26 3.25 2.60 2.52 d - elasticsearch-data-0
172.30.74.141 6 97 33 4.86 5.42 5.74 m * elasticsearch-master-1
_cat/nodes?h=heap.max
1.4gb
1.4gb
1.4gb
1.4gb
1.4gb
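If a combined view is more useful, heap usage and heap max can also be pulled in a single call (these are all standard _cat/nodes columns):
/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,ram.percent,cpu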
Please let us know whether the foundation blocks for this ELK setup are properly laid. Are these resources enough for our needs? Are the resource allocations in the deployment settings correct? What should we change at this level so that we can move on to the next step, tuning the ELK deployment itself? (But that only after the sizing is sorted out, I guess...)
Looking forward to your answer.
Thank you,
Zoltan