Problems with indices.fielddata.cache.size and high I/O on HDD

Good evening.
ES version 6.8.8 in Docker (docker.elastic.co/elasticsearch/elasticsearch:6.8.8)
heap_size: 31g
CPU Model on node 1: Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz
CPU Model on node 2: Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz
CPU Model on node 3: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Load average on node1 and node2: 1.
Load average on node3: 6.

We have a cluster of 3 nodes with SSD disks. The contents of elasticsearch.yml:

cluster.name: cluster-name
cluster.routing.allocation.disk.watermark.flood_stage: 96%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.routing.allocation.disk.watermark.low: 94%
cluster.routing.allocation.node_concurrent_recoveries: 5
cluster.routing.allocation.node_initial_primaries_recoveries: 1
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
- node1
- node2
- node3
indices.fielddata.cache.size: 2g
indices.memory.index_buffer_size: 30%
indices.queries.cache.size: 1500m
indices.recovery.max_bytes_per_sec: 40mb
network.host: # different IP on each node (e.g. 192.168.0.57, 192.168.0.93, etc.)
node.name: node3
reindex.remote.whitelist:
- example1.com:9200
- example2.com:9200
- example3.com:9200
script.allowed_types: inline,stored
xpack.security.enabled: true
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.certificate: <<cert/instance.crt>>
xpack.security.transport.ssl.certificate_authorities: <<cert/ca.crt>>
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.key: <<cert/crt.key>>
xpack.security.transport.ssl.key_passphrase: <<pass>>
xpack.security.transport.ssl.verification_mode: certificate
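
As a side note, one way to confirm that all three nodes actually picked up the same fielddata and query cache limits is to dump the effective settings; both requests below are standard 6.x APIs and assume nothing beyond the cluster shown above:

GET _cluster/settings?include_defaults=true&flat_settings=true
GET _nodes/settings?flat_settings=true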
GET _cat/indices/*?v&s=index

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1             6rr2MgPxQPeLa8BM-6sQzA   1   1        103            0      701kb        350.5kb
green  open   .kibana_2             h1elhP9uSv21VYVr1pMQDQ   1   1        433           16      2.6mb          1.3mb
green  open   .kibana_task_manager  iSgnMYnRRI2OBn0ygS1ypA   1   1          2            0     25.9kb         12.9kb
green  open   .reporting-2021.10.31 GiUwDe9-QHOnh0UJN_qvgQ   1   1          4            0      1.2mb        664.2kb
green  open   .reporting-2021.11.07 OcTOLONbQDuWYcC8FyWyXA   1   1         33            2     42.1mb           21mb
green  open   .security-6           YvPLEm1NQauIlGcqEl7hfA   1   1         35            3     85.5kb         42.7kb
green  open   .tasks                2uqrVWd1RXynMlkYqQbW6A   1   1          1            0     12.4kb          6.2kb
green  open   index-name             GWYxb-vQTJK_wOndK5vGEQ  12   1  109647228     37229985    359.8gb        179.1gb

GET _cat/shards?v&s=index

index                 shard prirep state      docs   store ip             node
.kibana_1             0     r      STARTED     103 350.5kb 192.168.0.93 node2
.kibana_1             0     p      STARTED     103 350.5kb 192.168.0.92 node1
.kibana_2             0     p      STARTED     433   1.3mb 192.168.0.93 node2
.kibana_2             0     r      STARTED     433   1.3mb 192.168.0.92 node1
.kibana_task_manager  0     p      STARTED       2  12.9kb 192.168.0.92 node1
.kibana_task_manager  0     r      STARTED       2  12.9kb 192.168.0.57 node3
.reporting-2021.10.31 0     p      STARTED       4 664.2kb 192.168.0.93 node2
.reporting-2021.10.31 0     r      STARTED       4 664.2kb 192.168.0.57 node3
.reporting-2021.11.07 0     p      STARTED      33    21mb 192.168.0.93 node2
.reporting-2021.11.07 0     r      STARTED      33    21mb 192.168.0.92 node1
.security-6           0     r      STARTED      35  42.7kb 192.168.0.92 node1
.security-6           0     p      STARTED      35  42.7kb 192.168.0.57 node3
.tasks                0     r      STARTED       1   6.2kb 192.168.0.93 node2
.tasks                0     p      STARTED       1   6.2kb 192.168.0.57 node3
index-name             9     r      STARTED 9131426  14.2gb 192.168.0.93 node2
index-name             9     p      STARTED 9131426  13.9gb 192.168.0.92 node1
index-name             1     r      STARTED 9130734    16gb 192.168.0.93 node2
index-name             1     p      STARTED 9130733  15.7gb 192.168.0.57 node3
index-name             2     r      STARTED 9142796  14.1gb 192.168.0.93 node2
index-name             2     p      STARTED 9142796  13.6gb 192.168.0.57 node3
index-name             5     p      STARTED 9139233  15.5gb 192.168.0.92 node1
index-name             5     r      STARTED 9139230  14.6gb 192.168.0.57 node3
index-name             11    r      STARTED 9140537  16.5gb 192.168.0.93 node2
index-name             11    p      STARTED 9140536  15.6gb 192.168.0.92 node1
index-name             7     p      STARTED 9135450  15.9gb 192.168.0.92 node1
index-name             7     r      STARTED 9135450  15.4gb 192.168.0.57 node3
index-name             3     p      STARTED 9133941  14.5gb 192.168.0.93 node2
index-name             3     r      STARTED 9133943  14.5gb 192.168.0.57 node3
index-name             10    r      STARTED 9137125  16.1gb 192.168.0.93 node2
index-name             10    p      STARTED 9137125  16.5gb 192.168.0.92 node1
index-name             4     r      STARTED 9135834  13.5gb 192.168.0.92 node1
index-name             4     p      STARTED 9135830  12.4gb 192.168.0.57 node3
index-name             8     r      STARTED 9138516  15.1gb 192.168.0.93 node2
index-name             8     p      STARTED 9138516  14.4gb 192.168.0.92 node1
index-name             6     r      STARTED 9142530  15.1gb 192.168.0.92 node1
index-name             6     p      STARTED 9142530  15.4gb 192.168.0.57 node3
index-name             0     p      STARTED 9139134  15.2gb 192.168.0.93 node2
index-name             0     r      STARTED 9139132  14.9gb 192.168.0.57 node3

I attach several screenshots showing the difference in fielddata cache size on different nodes.

Node 1: (screenshot: node1_fieldcache)

Node 3: (screenshot: node3_fieldcache)

Question 1 - Why are the fielddata cache values not identical across nodes, and how does this affect the operation of the cluster?
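
To see where the difference comes from, fielddata usage can be broken down per node and per field with the cat and node-stats APIs (standard 6.8 endpoints; fields=* is used here only to list every field rather than a specific one):

GET _cat/fielddata?v&fields=*
GET _nodes/stats/indices/fielddata?fields=*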

Also, on the 3rd node the disk load is around 80%, and sometimes reaches 99%.

I/O node1: (screenshot)

I/O node2: (screenshot)

I/O node3: (screenshot)

Could it be related to indices.fielddata.cache.size? Or to something else?

So, question 2 - Could the high I/O be caused by the low value of indices.fielddata.cache.size on node3?
Or is it because of the weaker CPU on node 3?
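
To help tell disk wait apart from CPU pressure on that node, the per-node stats and hot threads output can be inspected (node3 below refers to the node.name from the config above; type=wait and type=cpu are the standard hot_threads options in 6.8):

GET _nodes/node3/stats/fs,os,process
GET _nodes/node3/hot_threads?type=wait
GET _nodes/node3/hot_threads?type=cpu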
