Hello,
I was under the assumption that the disk space used by a shard is the sum of the disk space of its segments.
I have an index with 1 million documents and 3 shards, and it seems to me that this is not true. Can someone tell me whether this is expected or whether it should be investigated further? I could not find any pointer in the documentation.
This is the output of a call to
_cat/shards/xxx?v&h=shard,state,index,id,prirep,docs,store,merges.*
shard state index id prirep docs store merges.current merges.current_docs merges.current_size merges.total merges.total_docs merges.total_size merges.total_time
0 STARTED xxx.2023-12-19.11.53 haMvMi8fToesbgjwHYFc3A r 339904 3.7gb 0 0 0b 905 2762242 29.1gb 1.1h
0 STARTED xxx.2023-12-19.11.53 Yqoddn-tTfK5ggHi5_eYAw p 339904 8.2gb 0 0 0b 921 2811980 30.1gb 1.1h
1 STARTED xxx.2023-12-19.11.53 w7DSKeH9SZa_2C72ulKGmw p 341076 15.9gb 0 0 0b 878 2722777 27.7gb 1.1h
1 STARTED xxx.2023-12-19.11.53 Yqoddn-tTfK5ggHi5_eYAw r 341076 5.1gb 0 0 0b 872 2358927 24.4gb 48m
2 STARTED xxx.2023-12-19.11.53 w7DSKeH9SZa_2C72ulKGmw r 342105 14.7gb 0 0 0b 910 2492439 25.9gb 59.7m
2 STARTED xxx.2023-12-19.11.53 haMvMi8fToesbgjwHYFc3A p 342105 3.7gb 0 0 0b 909 2753674 27.9gb 1h
As you can see, the store size of the primary of shard 1 is 15.9 GB.
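(For reference, I believe the same call with bytes=b would return exact byte counts instead of the rounded human-readable values, e.g.
_cat/shards/xxx?v&bytes=b&h=shard,prirep,store
but the gap here is far too large to be a rounding issue.)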
This is an extract of the output of a GET to
_cat/segments/superevadb?v&s=p,shard,size:desc
It seems to me that the cumulative size of all its segments is less than 4 GB (see the quick check sketched after the table).
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
xxx.2023-12-19.11.53 1 p 10.2.13.47 _5z5 7745 312399 29049 3gb 0 true true 9.8.0 false
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6ry 8782 9225 2512 173.3mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6fa 8326 8876 2853 163.6mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6is 8452 5430 3629 140.2mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6st 8813 1073 309 20.3mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6tz 8855 1277 1 19.3mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6to 8844 683 4 14.7mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6t3 8823 701 104 13.2mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6t7 8827 223 121 9.1mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6sj 8803 759 9 8.3mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6tf 8835 186 0 5mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6tk 8840 226 0 4.7mb 0 true true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6ua 8866 9 1 252.6kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6u4 8860 2 0 97.3kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6u3 8859 1 0 78.9kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6ud 8869 1 0 73kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6uc 8868 1 0 72.4kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6u0 8856 1 0 70.2kb 0 true false 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6u2 8858 1 0 68kb 0 true false 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6u1 8857 1 0 65.2kb 0 true false 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6ub 8867 1 0 62.4kb 0 false true 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6tr 8847 1 0 59.1kb 0 true false 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6ts 8848 1 0 58.9kb 0 true false 9.8.0 true
xxx.2023-12-19.11.53 1 p 10.2.13.47 _6tv 8851 1 0 48.4kb 0 true false 9.8.0 true
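For completeness, this is roughly the check I am relying on. It is only a sketch, assuming the _cat APIs are queried with format=json and bytes=b; ES_URL and INDEX are placeholders for my endpoint and index name.

import json
from collections import defaultdict
from urllib.request import urlopen

ES_URL = "http://localhost:9200"   # assumption: local cluster endpoint
INDEX = "xxx.2023-12-19.11.53"     # placeholder index name

def cat(path):
    """Call a _cat endpoint and parse its JSON output."""
    with urlopen(f"{ES_URL}{path}") as resp:
        return json.load(resp)

# Cumulative segment size per primary shard, in bytes.
seg_totals = defaultdict(int)
for seg in cat(f"/_cat/segments/{INDEX}?format=json&bytes=b&h=shard,prirep,size"):
    if seg["prirep"] == "p":
        seg_totals[seg["shard"]] += int(seg["size"])

# Store size reported by _cat/shards for each primary shard, in bytes.
for shard in cat(f"/_cat/shards/{INDEX}?format=json&bytes=b&h=shard,prirep,store"):
    if shard["prirep"] == "p":
        s = shard["shard"]
        print(f"shard {s}: store={shard['store']} segments_sum={seg_totals[s]}")

For shard 1 this is how I get to the discrepancy above: roughly 3.6 GB of segments against the 15.9 GB reported as the store size.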
For the other shards, instead, the store size and the cumulative segment size are similar. I am looking into this because, after upgrading from Elasticsearch 7 to 8.11.1, I am noticing an increase in disk usage. After a restart of the cluster nodes, disk usage suddenly drops and then increases gradually over the following hours, and I can't figure out what I am doing wrong.
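To quantify this over time, the kind of thing I have in mind is sampling the per-node disk figures periodically, roughly like this (again just a sketch; the endpoint and the one-hour interval are placeholders):

import time
from urllib.request import urlopen

ES_URL = "http://localhost:9200"   # assumption: cluster endpoint

# Columns from _cat/allocation that are relevant for the disk growth.
COLUMNS = "node,disk.indices,disk.used,disk.avail"

while True:
    with urlopen(f"{ES_URL}/_cat/allocation?v&bytes=b&h={COLUMNS}") as resp:
        print(time.strftime("%Y-%m-%d %H:%M:%S"))
        print(resp.read().decode("utf-8"))
    time.sleep(3600)   # one sample per hour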
Thank you,
Tommaso