Elasticsearch Cluster Disk Write Performance

Hello everyone,

I have an ELK cluster that is running into a performance problem: most data is being indexed about 10 minutes later than its actual event time. The problem occurs intermittently.

I would like to find the cause of this problem and resolve it.

Below is the relevant information about my ELK setup.

The cluster has nine nodes (node1~node9), all with identical specifications: 36 CPU cores, 256 GB RAM, and 8 TB of SSD storage each.

From what I have observed, CPU and RAM are not the bottleneck.

elasticsearch.yml:

node.name: node7
node.master: true
node.data: true
cluster.initial_master_nodes: ["node1", "node2", "node3"]
network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5","10.0.0.6","10.0.0.7","10.0.0.8","10.0.0.9"]
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/test.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/test.p12
path.data: /mnt/es1, /mnt/es2, /mnt/es3, /mnt/es4, /mnt/es5, /mnt/es6, /mnt/es7
path.logs: /var/log/elasticsearch
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 80%
indices.fielddata.cache.size: 30%
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 60%
thread_pool:
  write:
    queue_size: 4000
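
For reference, whether the enlarged write queue is actually filling up or rejecting bulk requests can be checked from the write thread pool stats; the host, port, and credentials below are placeholders for my environment:

# show per-node write thread pool activity, queue depth, and rejections
curl -s -u elastic 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed'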


Performance-related information from node7:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.72    0.00    0.33   36.22    0.00   47.73

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    7.00     47.50     0.00   0.00    3.00     6.79    0.00      0.00     0.00   0.00    0.00     0.00    0.02   2.80
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00   95.00   2821.00     0.00   0.00   18.85    29.69    0.00      0.00     0.00   0.00    0.00     0.00    1.79 104.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdf              0.00      0.00     0.00   0.00    0.00     0.00    4.00     57.00     0.00   0.00   40.75    14.25    0.00      0.00     0.00   0.00    0.00     0.00    0.16   8.80
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdh              0.00      0.00     0.00   0.00    0.00     0.00   30.00    625.00     0.00   0.00   26.03    20.83    0.00      0.00     0.00   0.00    0.00     0.00    0.78  20.80
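
The table above is extended per-device statistics; on Linux it can be reproduced with iostat (assuming the sysstat package is installed), sampling once per second:

# extended per-device I/O statistics, one sample per second
iostat -x 1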

Whenever the problem occurs, the write activity on one particular SSD in node7 (sdd in the sample above) becomes extremely high.
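
To map the busy device back to a specific path.data mount and the shards stored on it, the per-path filesystem stats and shard allocation can be checked; host, port, and credentials are again placeholders:

# per-data-path usage and, on Linux, per-device I/O stats for node7
curl -s -u elastic 'http://localhost:9200/_nodes/node7/stats/fs?pretty'

# shards currently allocated on node7, with their on-disk size
curl -s -u elastic 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node' | grep node7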

How can I confirm the cause of this problem? Is it a failing drive or a performance bottleneck?

Hi,

You can check the Elasticsearch slow logs. They can help you identify slow queries or indexing operations that might be causing the high disk I/O. Slow logs are enabled by updating your index settings.
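
For example, warn/info thresholds for the indexing and search slow logs can be set per index like this (the index name and threshold values are placeholders; pick values that fit your workload):

curl -s -u elastic -H 'Content-Type: application/json' -X PUT \
  'http://localhost:9200/my-index/_settings' -d '
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}'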

Regards
