Store.size is not equal to (primary count + replica count) * pri.store.size

Hi all,

I have an index configured with just 1 shard: 1 primary and 1 replica.

When I invoke _cat/indices I expect store.size to be equal to 2 * pri.store.size, but it is not.

health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   some_product_index_1 H7a6LOOlTrecnMdHXhI9Yg   1   1      36986        17368        1gb        180.6mb

As you can see, store.size is 1gb while pri.store.size is 180.6mb. That is almost 5 times the primary size, and from time to time it can be even 10-20 times more. Why does this happen?
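To narrow down which copy is taking the space, a per-shard breakdown is useful. A minimal sketch, assuming the index name above (the column list is just one reasonable choice):

GET _cat/shards/some_product_index_1?v&h=index,shard,prirep,state,docs,store

The prirep column marks each row as p (primary) or r (replica), so the two store values can be compared directly.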

In the segments info I see several suspicious segments on the replica:

"_6o" : {
  "generation" : 240,
  "num_docs" : 66276,
  "deleted_docs" : 0,
  "size_in_bytes" : 136893880, ///big 136Mb, 0 deleted producs and search false
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_95" : {
  "generation" : 329,
  "num_docs" : 60057,
  "deleted_docs" : 0,
  "size_in_bytes" : 110486484, /// Big 110Mb, 0 deleted producs and search false
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_gl" : {
  "generation" : 597,
  "num_docs" : 30823,
  "deleted_docs" : 48409,
  "size_in_bytes" : 178553089, //Big 178 MB, amount of deleted producs much higher than in num_docs and significantly higher that total amount of products in my index. 
  "memory_in_bytes" : 118598,
  "committed" : false,
  "search" : true,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
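For reference, these entries are excerpts from the segments API output; assuming the same index name, a request like this returns the per-segment details for both the primary and the replica:

GET some_product_index_1/_segments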

I tried executing POST some_product_index_1/_forcemerge?only_expunge_deletes=true followed by GET some_product_index_1/_refresh, but it didn't help. The index became smaller, but store.size is still 4+ times pri.store.size.

health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   some_product_index_1 H7a6LOOlTrecnMdHXhI9Yg   1   1      37069         3034    593.3mb        129.6mb

In the segments output after the expunge-deletes and refresh, I see that one of those segments (_gl) became search=false but was not deleted, while the two big search=false segments were removed. However, one new big segment was created.

"_gl" : {
  "generation" : 597,
  "num_docs" : 79232,
  "deleted_docs" : 0,
  "size_in_bytes" : 178553089, // It was search=true before with alot of deleted docs (48409). Why it is not deleted?
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_kc" : {
  "generation" : 732,
  "num_docs" : 32245,
  "deleted_docs" : 96,
  "size_in_bytes" : 103069080,
  "memory_in_bytes" : 103660,
  "committed" : false,
  "search" : true,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},

Below are the index settings (with the analysis part excluded for brevity):

{
  "some_product_index_1" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "180s",
        "number_of_shards" : "1",
        "provided_name" : "some_product_index_1",
        "similarity" : {
          "default" : {
            "type" : "boolean"
          }
        },
        "merge" : {
          "policy" : {
            "segments_per_tier" : "4",
            "deletes_pct_allowed" : "20"
          }
        },
        "gc_deletes" : "5m",
        "creation_date" : "1628678391218",
        "analysis" : {
        },
        "number_of_replicas" : "1",
        "uuid" : "H7a6LOOlTrecnMdHXhI9Yg",
        "version" : {
          "created" : "7010199"
        }
      }
    }
  }
}
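For what it's worth, the merge policy values above are dynamic index settings, so (as a sketch, reusing the same values purely for illustration) they could be adjusted at runtime like this:

PUT some_product_index_1/_settings
{
  "index.merge.policy.segments_per_tier": 4,
  "index.merge.policy.deletes_pct_allowed": 20
}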

So I'm trying to understand: is this correct behaviour or not? Why isn't the merge policy working?

Elasticsearch will retain segments if they're either visible to searches ("search": true) or they're part of the most recent commit ("committed": true). So to force it to discard older segments you need to POST _flush and POST _refresh.
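Concretely, assuming the index above, that sequence would be something like:

POST some_product_index_1/_flush
POST some_product_index_1/_refresh

The flush writes a new commit point that no longer references the old segments, and the refresh makes the newest segments searchable, so the older copies are no longer retained on either count.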

The question is: why is this not done automatically on the replica according to the merge policy rules? There are no issues with the primary.

Merging, flushing and refreshing are all (approximately) mutually independent, so the merge policy doesn't really say anything about which segments get retained.
