Warm indices are not compressed

I read in the Elastic documentation that the warm phase can reduce index size,
but it doesn't state clearly how much an index will be compressed in the warm phase.
It also says the warm phase has shrink and force merge features. Should I use these to get my indices "compressed"? My indices already have 1 primary shard in the hot phase, so I don't think shrinking them after rollover would be effective.

This is my old index:


And this is my new index (already in the warm phase):

Can anyone explain this? I can't see any clear difference in size between my old index and my "warm" index.

Thanks in advance

Because that's very much an It Depends answer.

Can you share your policy, please?

Did you force merge your index? That is what reduces its size. Otherwise hot and warm are going to be basically the same size-ish.

How much that reduces the size is something you will need to test.

You can do it manually

or in ILM under advanced settings
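
For reference, a minimal sketch of the manual route in Kibana Console syntax. The index name is taken from the segments output later in this thread; the `_cat/indices` column list is just one reasonable choice for comparing size before and after.

```console
# Size and document count before the merge
GET _cat/indices/confluent-app-connect-distributed-uat-2021.09?v&h=index,pri,rep,docs.count,store.size

# Merge each shard of the index down to a single segment.
# Only run this on an index that is no longer being written to.
POST /confluent-app-connect-distributed-uat-2021.09/_forcemerge?max_num_segments=1

# Same check afterwards, to see how much (if anything) was saved
GET _cat/indices/confluent-app-connect-distributed-uat-2021.09?v&h=index,pri,rep,docs.count,store.size
```

In ILM, the equivalent is a `forcemerge` action with `max_num_segments` inside a phase's `actions`.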

Hi @warkolm, thanks for the answer.
Here's my policy:

PUT _ilm/policy/confluent-threemonths
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "200gb",
            "max_age": "30d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "migrate": {
            "enabled": false
          }
        }
      },
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

So I can conclude that my index won't be compressed unless I force merge it in the warm phase, right?
As for shrink, I don't think I can shrink my indices because each one already has only 1 shard.

@alfianaf

You are confusing shards and segments.

A shard is made up of one to many segments, which are the units of Lucene data on disk.

Force merge merges many segments into 1 segment, which is the most efficient layout not only with respect to size but also with respect to querying the data.

That is why you want to force merge. A shard made up of one segment is the most efficient.

I guess I can solve my problem then. I will have to use force merge on each of my indices from now on.

Thanks for the detailed explanation.


You can run it on your current indices right now. Depending on how big they are, it may take a while.

Does it put a heavy workload on the cluster? I'm afraid cluster performance would go down during the force merge.

It depends on how many indices you have. Your indices are only 3 GB, so you're not going to get a ton of compression, but it could add up.

Force merge runs in the background and really only runs on one index at a time.

But this is your production system, so you would need to make that decision.

You can run it on a single index if you want to test it.

Sorry for the further question; I don't know whether I should start a new thread or ask here.
I just tried the force merge on the 3GB index, and the size increased to around 8GB during the process.
And from the documentation I've read:


Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced

Does this mean that if I force merge an index of about 500GB, the resulting segment could be 100GB or more?

Did you run it on an index that was no longer being written to?

How do you know the force merge is done?

Did you run _cat/segments on the index?

Do you have 500GB indices? If so, how many shards?

Segments are related to shards

So, back to how this should really work: put force merge in your ILM policy so that it runs after the index has rolled over and is no longer being written to.

Force merge can run on the hot node before the index moves to the warm tier, or after it reaches the warm node.

So it comes back to: what are you actually trying to accomplish?

Are all these questions just theory, or are you trying to accomplish a normal ILM cycle?
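
A minimal sketch of the checks asked about above, in Kibana Console syntax. The index name is the one from this thread; the `actions` filter pattern in the second request is an assumption about how the force merge task is named, so adjust it if nothing matches.

```console
# One row per segment; a fully merged shard shows a single row
GET _cat/segments/confluent-app-connect-distributed-uat-2021.09?v

# Force merge tasks that are still running, if any
GET _tasks?detailed=true&actions=*forcemerge*
```

A `_forcemerge` request also blocks until the merge completes, so a returned response is itself a sign that the merge is done. If the client connection drops first, the merge continues in the background, which is where the two checks above help.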

  1. Yes, the index was no longer being written to.
  2. I checked the index stats API frequently. The size, which had increased during the force merge, is back to normal (about 100MB smaller than the original size), and the segment count decreased from 37 to 1.
  3. I ran GET {index}/_segments:
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "confluent-app-connect-distributed-uat-2021.09" : {
      "shards" : {
        "0" : [
          {
            "routing" : {
              "state" : "STARTED",
              "primary" : true,
              "node" : "_vUQT3dWTkG0v-_AnWx-LQ"
            },
            "num_committed_segments" : 1,
            "num_search_segments" : 1,
            "segments" : {
              "_6mdl" : {
                "generation" : 308937,
                "num_docs" : 42559293,
                "deleted_docs" : 0,
                "size_in_bytes" : 3441576928,
                "memory_in_bytes" : 8708,
                "committed" : true,
                "search" : true,
                "version" : "8.7.0",
                "compound" : false,
                "attributes" : {
                  "Lucene87StoredFieldsFormat.mode" : "BEST_SPEED"
                }
              }
            }
          }
        ]
      }
    }
  }
}
  4. Yes, I have. It has 3 primary shards with no replicas.

I am trying to accomplish a normal ILM cycle. That's why I asked for details on force merge: I want to build a full ILM cycle that includes a force merge step.

I'm afraid that once ILM is implemented, force merging a 500GB index could badly degrade cluster performance.

If you are looking to reduce the size of the index you should enable best_compression when you forcemerge in your policy.
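
A minimal sketch of what that could look like, reusing the policy posted earlier in this thread with a `forcemerge` action added to the warm phase. `index_codec` is the ILM force merge option that switches the index to `best_compression` before merging. Everything else is copied from the original policy, and the actual savings would still need to be measured.

```console
# Note: PUT replaces the whole policy, so every phase is restated here.
PUT _ilm/policy/confluent-threemonths
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "200gb",
            "max_age": "30d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "actions": {
          "forcemerge": {
            "max_num_segments": 1,
            "index_codec": "best_compression"
          },
          "set_priority": {
            "priority": 50
          },
          "migrate": {
            "enabled": false
          }
        }
      },
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
```

Indices that have already gone through the warm phase would most likely not be merged again by this change and would need a manual force merge. After a merge with `best_compression`, the `Lucene87StoredFieldsFormat.mode` attribute in the `_segments` output would be expected to read `BEST_COMPRESSION` instead of `BEST_SPEED`.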


First of all, if you had a 500 GB index you would hopefully have at least 10 shards. And only one force merge happens at a time unless you purposely tell it otherwise.

Of course, my other question would be why you want 500 GB indices. You can have them, but why not make 150 GB indices?

3 shards at 50 GB apiece.
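
As a rough sketch, that sizing corresponds to the rollover fragment here. It is only the `rollover` action from the hot phase, not a complete request, and it assumes the index template keeps `number_of_shards` at 3 as in the index described above. `max_size` applies to the total size of all primary shards, so 150gb across 3 primaries works out to roughly 50 GB per shard.

```console
# Fragment of the hot phase only; the rest of the policy stays unchanged
"rollover": {
  "max_size": "150gb",
  "max_age": "30d"
}
```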

We have hundreds of customers that implement force merge on very large clusters and very large data sets under very large volumes. It's all about properly configuring your cluster and your data, all of it together.

Of course you would test all this...

I will keep this in mind and will try to tune the shard size next time.

So 2 force merge tasks can't happen at the same time, right? I ask because the force merge would be triggered automatically for every index that reaches the warm phase.

By default there is only one force merge thread on any one node at any one time.

If you are very advanced you can change that setting.

If you want to know the exact behavior of that code, you would need to look at our code. I don't know it at that level, but I do know there is a single thread. Whether it distributes that thread across more than one shard at a time I'm not certain, but I think it works its way through one shard at a time.
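
One way to check this on a running cluster, as a sketch in Kibana Console syntax. The column list is just a convenient selection.

```console
# One row per node; "size" is the number of force merge threads on that node,
# "active" and "queue" show merges currently running or waiting
GET _cat/thread_pool/force_merge?v&h=node_name,name,size,active,queue
```

The setting behind that number is the node-level `thread_pool.force_merge.size`, which is the one to change if you really need more than one thread.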


It's actually a relief that the code sets everything up like that, so I can go ahead and add force merge to my ILM policy now.
The only thing left for me to sort out is the index sizing.

Again, thanks for explaining everything clearly. Hope you have a nice day!

Noted on this one. I will enable best_compression the next time I set up an ILM policy.
Thanks for the pointer!
