Store.size is not equal to (primary count + replica count) * pri.store.size

Hi all,

I have an index configured with just 1 shard: 1 primary and 1 replica.

When I invoke _cat/indices I expect store.size to be equal to 2 * pri.store.size, but it is not.

health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   some_product_index_1 H7a6LOOlTrecnMdHXhI9Yg   1   1      36986        17368        1gb        180.6mb

As you can see, store.size is 1gb while pri.store.size is 180.6mb. That is almost 5 times the primary size, and from time to time it can be even 10-20 times more. Why does this happen?
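To narrow down which copy is taking the space, a per-shard breakdown is useful. A minimal sketch, assuming the index name above (the column list is just one reasonable choice):

GET _cat/shards/some_product_index_1?v&h=index,shard,prirep,state,docs,store

The prirep column marks each row as p (primary) or r (replica), so the two store values can be compared directly.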

In the segments info I see several suspicious segments on the replica:

"_6o" : {
  "generation" : 240,
  "num_docs" : 66276,
  "deleted_docs" : 0,
  "size_in_bytes" : 136893880, ///big 136Mb, 0 deleted producs and search false
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_95" : {
  "generation" : 329,
  "num_docs" : 60057,
  "deleted_docs" : 0,
  "size_in_bytes" : 110486484, /// Big 110Mb, 0 deleted producs and search false
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_gl" : {
  "generation" : 597,
  "num_docs" : 30823,
  "deleted_docs" : 48409,
  "size_in_bytes" : 178553089, //Big 178 MB, amount of deleted producs much higher than in num_docs and significantly higher that total amount of products in my index. 
  "memory_in_bytes" : 118598,
  "committed" : false,
  "search" : true,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
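For reference, these entries are excerpts from the segments API output; assuming the same index name, a request like this returns the per-segment details for both the primary and the replica:

GET some_product_index_1/_segments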

I tried executing POST some_product_index_1/_forcemerge?only_expunge_deletes=true followed by GET some_product_index_1/_refresh, but it didn't help. The index became smaller, but store.size is still 4+ times pri.store.size.

health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   some_product_index_1 H7a6LOOlTrecnMdHXhI9Yg   1   1      37069         3034    593.3mb        129.6mb

In the segments output after the expunge-deletes and refresh, I see that one of those segments (_gl) became search=false but was not deleted, while the two big search=false segments were removed. However, one new big segment was created.

"_gl" : {
  "generation" : 597,
  "num_docs" : 79232,
  "deleted_docs" : 0,
  "size_in_bytes" : 178553089, // It was search=true before with alot of deleted docs (48409). Why it is not deleted?
  "memory_in_bytes" : 0,
  "committed" : true,
  "search" : false,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},
"_kc" : {
  "generation" : 732,
  "num_docs" : 32245,
  "deleted_docs" : 96,
  "size_in_bytes" : 103069080,
  "memory_in_bytes" : 103660,
  "committed" : false,
  "search" : true,
  "version" : "8.0.0",
  "compound" : false,
  "attributes" : {
    "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
  }
},

Below are the index settings (with the analysis part excluded for brevity):

{
  "some_product_index_1" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "180s",
        "number_of_shards" : "1",
        "provided_name" : "some_product_index_1",
        "similarity" : {
          "default" : {
            "type" : "boolean"
          }
        },
        "merge" : {
          "policy" : {
            "segments_per_tier" : "4",
            "deletes_pct_allowed" : "20"
          }
        },
        "gc_deletes" : "5m",
        "creation_date" : "1628678391218",
        "analysis" : {
        },
        "number_of_replicas" : "1",
        "uuid" : "H7a6LOOlTrecnMdHXhI9Yg",
        "version" : {
          "created" : "7010199"
        }
      }
    }
  }
}
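For what it's worth, the merge policy values above are dynamic index settings, so (as a sketch, reusing the same values purely for illustration) they could be adjusted at runtime like this:

PUT some_product_index_1/_settings
{
  "index.merge.policy.segments_per_tier": 4,
  "index.merge.policy.deletes_pct_allowed": 20
}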

So I'm trying to understand: is this correct behaviour or not? Why isn't the merge policy working?

Elasticsearch will retain segments if they're either visible to searches ("search": true) or they're part of the most recent commit ("committed": true). So to force it to discard older segments you need to POST _flush and POST _refresh.
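Concretely, assuming the index above, that sequence would be something like:

POST some_product_index_1/_flush
POST some_product_index_1/_refresh

The flush writes a new commit point that no longer references the old segments, and the refresh makes the newest segments searchable, so the older copies are no longer retained on either count.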

The question is: why is this not done automatically on the replica according to the merge policy rules? There are no issues with the primary.

Merging, flushing and refreshing are all (approximately) mutually independent, so the merge policy doesn't really say anything about which segments get retained.
