Hello everyone,
I have an ELK cluster that occasionally runs into performance problems. When it happens, most data is indexed about 10 minutes behind its actual time.
I would like to find the cause and fix it.
Here is the relevant information about my cluster:
There are nine nodes in total (node1 to node9), all with identical hardware: 36 cores, 256 GB RAM and 8 TB of SSD storage.
As far as I can tell, CPU and RAM are not the bottleneck.
elasticsearch.yml on node7:
```yaml
node.name: node7
node.master: true
node.data: true
cluster.initial_master_nodes: ["node1", "node2", "node3"]
network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5", "10.0.0.6", "10.0.0.7", "10.0.0.8", "10.0.0.9"]
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/test.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/test.p12
path.data: /mnt/es1, /mnt/es2, /mnt/es3, /mnt/es4, /mnt/es5, /mnt/es6, /mnt/es7
path.logs: /var/log/elasticsearch
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 80%
indices.fielddata.cache.size: 30%
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 60%
thread_pool:
  write:
    queue_size: 4000
```
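With the write queue raised to 4000, a node that cannot keep up holds bulk requests in its queue instead of rejecting them, and a bulk request only completes once all of its shard-level writes are done. A slow disk on one node could therefore show up as indexing delay rather than as errors. A minimal sketch for checking this, assuming the input is the text returned by `GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected` (for example copied from Kibana Dev Tools); `parse_cat_output` and `backlogged_nodes` are hypothetical helper names, and the sample in the demo is made-up data, not output from this cluster.

```python
from __future__ import annotations


def parse_cat_output(text: str) -> list[dict[str, str]]:
    """Parse whitespace-aligned _cat API output (requested with ?v) into dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def backlogged_nodes(rows: list[dict[str, str]], min_queue: int = 1) -> list[tuple[str, int]]:
    """Return (node_name, queue) for nodes with at least `min_queue` queued
    write requests, largest backlog first."""
    backlog = [(r["node_name"], int(r["queue"])) for r in rows if int(r["queue"]) >= min_queue]
    return sorted(backlog, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # HYPOTHETICAL sample for illustration only; replace it with the real response of
    # GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
    sample = """
node_name name  active queue rejected
node1     write      2     0        0
node7     write     36  1850        0
node8     write      1     0        0
"""
    for node, queue in backlogged_nodes(parse_cat_output(sample)):
        print(f"{node}: {queue} write requests queued")
```

Sampling this a few times while the delay is happening would show whether node7's queue grows while the other nodes stay near zero. The `rejected` column is cumulative since node start, so compare it between samples rather than reading it once.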
iostat output from node7:
```
avg-cpu: %user %nice %system %iowait %steal %idle
15.72 0.00 0.33 36.22 0.00 47.73
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 7.00 47.50 0.00 0.00 3.00 6.79 0.00 0.00 0.00 0.00 0.00 0.00 0.02 2.80
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 95.00 2821.00 0.00 0.00 18.85 29.69 0.00 0.00 0.00 0.00 0.00 0.00 1.79 104.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 4.00 57.00 0.00 0.00 40.75 14.25 0.00 0.00 0.00 0.00 0.00 0.00 0.16 8.80
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 30.00 625.00 0.00 0.00 26.03 20.83 0.00 0.00 0.00 0.00 0.00 0.00 0.78 20.80
```
Whenever the problem occurs, the write load on one of the SSDs in node7 becomes unusually high (in the output above, sdd is at 104% utilization).
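To put numbers on "unusually high", the iostat output can be parsed per device and the busy disk compared with its neighbours. A minimal sketch, assuming the sysstat column names shown above (`w/s`, `wkB/s`, `w_await`, `aqu-sz`, `%util`); the function names and the 90% threshold are illustrative assumptions, not sysstat or Elasticsearch defaults. The iostat man page notes that %util does not reflect the performance limits of devices that serve requests in parallel, such as SSDs, so `w_await` compared across the node's disks is often the more telling figure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeviceStats:
    name: str
    writes_per_s: float    # w/s
    write_kb_per_s: float  # wkB/s
    write_await_ms: float  # w_await: average time (ms) to serve a write request
    queue_size: float      # aqu-sz: average request queue length
    util_pct: float        # %util


def parse_iowait(text: str) -> float | None:
    """Return %iowait from the first avg-cpu section, or None if there is none."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "avg-cpu:" and i + 1 < len(lines):
            names = fields[1:]  # %user %nice %system %iowait %steal %idle
            values = lines[i + 1].split()
            return float(values[names.index("%iowait")])
    return None


def parse_devices(text: str) -> list[DeviceStats]:
    """Parse device rows, locating columns by name from the 'Device' header line."""
    header: list[str] | None = None
    devices: list[DeviceStats] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # not a device row (e.g. the avg-cpu lines)
        col = dict(zip(header, fields))
        devices.append(DeviceStats(
            name=col["Device"],
            writes_per_s=float(col["w/s"]),
            write_kb_per_s=float(col["wkB/s"]),
            write_await_ms=float(col["w_await"]),
            queue_size=float(col["aqu-sz"]),
            util_pct=float(col["%util"]),
        ))
    return devices


def busy_devices(devices: list[DeviceStats], util_threshold: float = 90.0) -> list[str]:
    """Names of devices at or above an illustrative %util threshold."""
    return [d.name for d in devices if d.util_pct >= util_threshold]
```

One way to use it is to capture `iostat -x 5 3` during an incident and pass the text to these functions; the first report covers the time since boot, so the later intervals are the ones that describe the incident.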
How can I confirm the cause? Is it a failing drive or a performance bottleneck?
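One way to test the failing-drive hypothesis is the drive's SMART self-assessment, compared against a disk that behaves normally. A minimal sketch, assuming smartmontools is installed and the check runs as root on node7; `smart_health`, `looks_unhealthy` and `check_device` are hypothetical helper names. If the disks sit behind a hardware RAID controller, smartctl may need a `-d` device-type option, and a PASSED verdict does not rule out a worn or degraded SSD, so the full attribute list from `smartctl -a` is still worth reviewing.

```python
from __future__ import annotations

import re
import subprocess

# Verdict lines printed by `smartctl -H`: ATA and NVMe devices report
# "...self-assessment test result: PASSED"; SCSI/SAS devices "SMART Health Status: OK".
_HEALTH_PATTERNS = (
    re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)"),
    re.compile(r"SMART Health Status:\s*(\S+)"),
)


def smart_health(output: str) -> str | None:
    """Extract the overall health verdict from `smartctl -H` output, if present."""
    for pattern in _HEALTH_PATTERNS:
        match = pattern.search(output)
        if match:
            return match.group(1)
    return None


def looks_unhealthy(output: str) -> bool:
    """True if smartctl printed a verdict other than PASSED or OK."""
    status = smart_health(output)
    return status is not None and status not in ("PASSED", "OK")


def check_device(device: str) -> str | None:
    """Run `smartctl -H` on one device and return its verdict.

    smartctl's exit status is a bit mask, so a non-zero code is not raised as an
    error here; the printed verdict is parsed instead.
    """
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return smart_health(result.stdout)
```

For example, `check_device("/dev/sdd")` next to `check_device("/dev/sdb")` would show whether the disk that spikes reports a different verdict from a quiet one.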