How to force a forcemerge?

How do I force a forcemerge to run for shards with only one segment?

I changed index.codec on my index to BEST_COMPRESSION and I want my segments to use it.
But when I run forcemerge, it returns immediately. I am guessing that is because the index is already force-merged.

POST /myindex/_forcemerge?max_num_segments=1&pretty
{
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  }
}

How do I force a forcemerge to run when shards have only one segment, but that segment uses the undesired codec?

It must be something very obvious, but I don't see it.
Thank you very much

What was your request when you changed the codec?
What is the output from GET myindex/_settings?


I don't know. I enabled forcemerge in ILM policy and asked ILM to do it.

At first I tried it using the example from the "Move to lifecycle step API" page of the Elasticsearch Guide [8.2], but then I realized how bad that example is, because it skips all the preparation steps.

Anyway, this is the request:

POST /_ilm/move/myindex?pretty
{
  "current_step": {
    "phase": "cold",
    "action": "complete",
    "name": "complete"
  },
  "next_step": {
    "phase": "warm",
    "action": "forcemerge"
  }
}

I did exactly the same on two indices: the first already had one segment per shard, the second had more segments. The result was that only the second index switched to BEST_COMPRESSION. Segments of the first index were not touched by forcemerge at all. They even stayed at Lucene 8.9.0.


But I think all that is irrelevant.
I can confirm that both indices have "codec" : "best_compression" in their settings.
I can confirm that launching forcemerge manually (API, Kibana) returns without doing anything.
The index with one segment per shard was not touched and its segments are still BEST_SPEED.

If they both have the right codec then there's nothing to do.

Is that one of the above ones, or a separate one?

It is one of the above ones. index.codec is set on the index, but segments are staying BEST_SPEED and forcemerge returns immediately without doing anything.


Let's document that properly:

curl "$SERVER/?pretty" | jq '.version.number'
"7.17.3"

Verify that index.codec is set on index:

curl "$SERVER/myindexA/_settings?pretty" | jq 'map_values(.settings.index.codec)'
{
  "myindexA": "best_compression"
}

Get number of segments per shard

curl "$SERVER/_cat/shards/myindexA?v=true&h=index,sh,pr,sc"
index    sh pr sc
myindexA 2  p   1
myindexA 1  p   1
myindexA 0  p   1

Try to forcemerge and measure time - it returns immediately

time curl -X POST "$SERVER/myindexA/_forcemerge?max_num_segments=1&pretty"
{
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  }
}

real    0m0.343s
user    0m0.137s
sys     0m0.149s

End result:

curl "$SERVER/myindexA/_segments?pretty" | jq '.indices | map_values(.shards | map_values([.[].segments | to_entries[].value.attributes."Lucene87StoredFieldsFormat.mode" ]))'
{
  "myindexA": {
    "0": [
      "BEST_SPEED"
    ],
    "1": [
      "BEST_SPEED"
    ],
    "2": [
      "BEST_SPEED"
    ]
  }
}
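
The same check can be scripted when there are many indices to inspect. This is a minimal Python sketch, assuming the `_segments` response has already been fetched and parsed into a dict. The function name `stale_shards` and the sample data are hypothetical; the attribute name is the one used in the jq filter above, and it may differ on other Lucene versions.

```python
# Attribute name taken from the jq filter above; an assumption for other versions.
MODE_ATTR = "Lucene87StoredFieldsFormat.mode"


def stale_shards(segments_response, wanted_mode="BEST_COMPRESSION"):
    """Map (index, shard_id) to its segment modes when any segment is not in wanted_mode.

    Segments without the attribute are reported with mode None and count as stale.
    """
    stale = {}
    for index_name, index_data in segments_response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            # Each shard id maps to a list of shard copies (primary and replicas).
            modes = [
                segment.get("attributes", {}).get(MODE_ATTR)
                for copy in copies
                for segment in copy.get("segments", {}).values()
            ]
            if any(mode != wanted_mode for mode in modes):
                stale[(index_name, shard_id)] = modes
    return stale


if __name__ == "__main__":
    # Hypothetical sample shaped like a GET /myindexA/_segments response.
    sample = {
        "indices": {
            "myindexA": {
                "shards": {
                    "0": [{"segments": {"_a": {"attributes": {MODE_ATTR: "BEST_SPEED"}}}}],
                    "1": [{"segments": {"_b": {"attributes": {MODE_ATTR: "BEST_COMPRESSION"}}}}],
                }
            }
        }
    }
    for (index_name, shard_id), modes in sorted(stale_shards(sample).items()):
        print(index_name, shard_id, modes)
```

A shard that appears in the result with exactly one mode listed is the case discussed here: a single segment that is still on the old codec.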

On indices that "need forcemerge" this works with no problem. But on shards with only one segment, forcemerge decides there is nothing to do and returns immediately without rewriting the segment with the new codec.

I strongly disagree. I still need to compress the data. I repeat: index settings are set correctly to "codec" : "best_compression", but all segments are still "Lucene87StoredFieldsFormat.mode": "BEST_SPEED". Forcemerge returns immediately without doing anything. How do I force forcemerge to run for a shard with only one segment, i.e. the case that forcemerge skips?

So I assume you did this by closing the index and setting the codec and then re-opening the index... Correct?

And you did this when you already have a single segment?

And that segment was already BEST_SPEED?

If the answer to all of those is yes, then I do not think forcemerge will work, as you experienced. I think you will have to reindex.

Once you have one segment, forcemerge is not going to do anything, because it is about merging segments, not about the codec. Using the new codec is just a side benefit when an actual merge is done.

If you had set best_compression BEFORE the index was merged down to one segment, then perhaps... I would need to test.

BTW, I did test this: if you use forcemerge and it actually executes and merges because you have more than one segment, it will "honor/use" the new codec.


Yes, you got it right.

  • Technically I asked ILM to do it, but AFAIK ILM does exactly what you describe.
  • Yes, I am having trouble switching the codec on shards that already have only one segment.
  • Yes. But I would say that segment was still BEST_SPEED.

Yes, that is the reason why I opened this topic. I do not want to reindex again. Reindexing is what caused my problems, because I ended up with _source 50% bigger after the reindex. Now I need to switch codecs to get that disk space back.

Yes, any shards that had more segments switched to the new codec.
I was hoping one of you would suggest a trick to force the forcemerge to run.

E.g. I thought about creating new segments by adding a couple of dummy documents and then deleting them. But I am afraid that forcemerge will be too clever and just delete the new segments.
