Elasticsearch Cluster Disk Write Performance

Hello everyone,

I have an ELK cluster that is running into a performance problem: most data is being indexed about 10 minutes later than its actual event time. The problem occurs intermittently.

I would like to find the cause of this problem and resolve it.

Below is the relevant information about my ELK setup.

The cluster has nine nodes (node1~node9), all with identical specifications: 36 CPU cores, 256 GB RAM, and 8 TB of SSD storage each.

From what I have observed, CPU and RAM are not the bottleneck.

elasticsearch.yml:

node.name: node7
node.master: true
node.data: true
cluster.initial_master_nodes: ["node1", "node2", "node3"]
network.host: 0.0.0.0
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5","10.0.0.6","10.0.0.7","10.0.0.8","10.0.0.9"]
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/test.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/test.p12
path.data: /mnt/es1, /mnt/es2, /mnt/es3, /mnt/es4, /mnt/es5, /mnt/es6, /mnt/es7
path.logs: /var/log/elasticsearch
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 80%
indices.fielddata.cache.size: 30%
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 60%
thread_pool:
  write:
    queue_size: 4000
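
For reference, whether the enlarged write queue is actually filling up or rejecting bulk requests can be checked from the write thread pool stats; the host, port, and credentials below are placeholders for my environment:

# show per-node write thread pool activity, queue depth, and rejections
curl -s -u elastic 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed'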


Performance-related information from node7:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.72    0.00    0.33   36.22    0.00   47.73

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    7.00     47.50     0.00   0.00    3.00     6.79    0.00      0.00     0.00   0.00    0.00     0.00    0.02   2.80
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00   95.00   2821.00     0.00   0.00   18.85    29.69    0.00      0.00     0.00   0.00    0.00     0.00    1.79 104.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdf              0.00      0.00     0.00   0.00    0.00     0.00    4.00     57.00     0.00   0.00   40.75    14.25    0.00      0.00     0.00   0.00    0.00     0.00    0.16   8.80
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdh              0.00      0.00     0.00   0.00    0.00     0.00   30.00    625.00     0.00   0.00   26.03    20.83    0.00      0.00     0.00   0.00    0.00     0.00    0.78  20.80
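
The table above is extended per-device statistics; on Linux it can be reproduced with iostat (assuming the sysstat package is installed), sampling once per second:

# extended per-device I/O statistics, one sample per second
iostat -x 1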

Whenever the problem occurs, the write activity on one particular SSD in node7 (sdd in the sample above) becomes extremely high.
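
To map the busy device back to a specific path.data mount and the shards stored on it, the per-path filesystem stats and shard allocation can be checked; host, port, and credentials are again placeholders:

# per-data-path usage and, on Linux, per-device I/O stats for node7
curl -s -u elastic 'http://localhost:9200/_nodes/node7/stats/fs?pretty'

# shards currently allocated on node7, with their on-disk size
curl -s -u elastic 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node' | grep node7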

How can I confirm the cause of this problem? Is it a failing drive or a performance bottleneck?

Hi,

You can check the Elasticsearch slow logs. They can help you identify slow queries or indexing operations that might be causing the high disk I/O. Slow logs are enabled by updating your index settings.
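
For example, warn/info thresholds for the indexing and search slow logs can be set per index like this (the index name and threshold values are placeholders; pick values that fit your workload):

curl -s -u elastic -H 'Content-Type: application/json' -X PUT \
  'http://localhost:9200/my-index/_settings' -d '
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}'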

Regards
