How do I force a forcemerge to run for shards with only one segment?
I changed index.codec on my index to BEST_COMPRESSION and I want my segments to use it.
But when I run forcemerge, it returns immediately. I am guessing that is because "index is already forcemerged".
I did exactly the same on two indices: the first was already at one segment per shard, the second had more segments. The result was that only the second index switched to BEST_COMPRESSION. The segments of the first index were not touched by forcemerge at all. They even stayed at Lucene 8.9.0.
But I think all that is irrelevant.
I can confirm that both indices have "codec" : "best_compression" in their settings.
I can confirm that launching forcemerge manually (API, Kibana) returns without doing anything.
The index with one segment per shard was not touched and its segments are still BEST_SPEED.
It is one of the above ones. index.codec is set on the index, but the segments are staying at BEST_SPEED and forcemerge returns immediately without doing anything.
On indices that "need forcemerge" this works with no problem. But on shards with only one segment, forcemerge decides there is nothing to do and returns immediately without rewriting the segment with the new codec.
I strongly disagree. I still need to compress the data. I repeat: the index settings are set correctly to "codec" : "best_compression", but all segments are still "Lucene87StoredFieldsFormat.mode": "BEST_SPEED". Forcemerge returns immediately without doing anything. How do I force forcemerge to run for a shard with only one segment, i.e. the case that forcemerge skips?
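For anyone who wants to check this on their own cluster, here is a minimal sketch of how the segment check described above could be scripted. It assumes the `GET /<index>/_segments` response nests segments under `indices -> shards -> [shard copies] -> segments` and exposes the stored-fields mode in each segment's `attributes`, as in the `Lucene87StoredFieldsFormat.mode` value quoted above. The function names and the base URL are hypothetical.

```python
import json
import urllib.request


def segments_not_using(segments_response, wanted_mode="BEST_COMPRESSION"):
    """List (index, shard, segment, mode) for segments whose stored-fields mode differs."""
    found = []
    for index_name, index_data in segments_response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            for copy in copies:  # one entry per shard copy (primary and replicas)
                for seg_name, seg in copy.get("segments", {}).items():
                    attrs = seg.get("attributes", {})
                    # The key carries the Lucene format version, e.g.
                    # "Lucene87StoredFieldsFormat.mode", so match on the suffix.
                    modes = [v for k, v in attrs.items()
                             if k.endswith("StoredFieldsFormat.mode")]
                    mode = modes[0] if modes else None
                    if mode != wanted_mode:
                        found.append((index_name, shard_id, seg_name, mode))
    return found


def fetch_segments(base_url, index):
    """Fetch the segments response; base_url is an assumption, e.g. http://localhost:9200."""
    with urllib.request.urlopen(f"{base_url}/{index}/_segments") as resp:
        return json.load(resp)
```

Calling `segments_not_using(fetch_segments(base_url, "my-index"))` would then list every segment that has not picked up the new codec.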
So I assume you did this by closing the index and setting the codec and then re-opening the index... Correct?
And you did this when you already have a single segment?
And that segment was already BEST_SPEED?
If the answer to all of those is yes, then I do not think forcemerge will work, as you experienced. I think you will have to reindex.
Once you have one segment, forcemerge is not going to do anything, because it is about merging segments, not about the codec. Using the new codec is just a side benefit when an actual merge is done.
If you set the best_compression BEFORE you were merged to one segment then perhaps... I would need to test.
BTW, I did test this: if you use forcemerge and it actually executes and merges because you have more than one segment, it will "honor/use" the new codec.
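To make the sequence discussed above concrete, here is a rough sketch of the REST calls involved: close the index, set the codec, re-open, then force-merge. It only builds the requests as (method, path, body) tuples and leaves sending them to whatever client you use. The helper name and the index name are hypothetical, and as noted above the final call only rewrites segments when an actual merge takes place.

```python
import json


def codec_switch_requests(index, codec="best_compression", max_num_segments=1):
    """Build the requests for changing index.codec and then force-merging."""
    return [
        # index.codec is a static setting, so the index is closed first.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", json.dumps({"index": {"codec": codec}})),
        ("POST", f"/{index}/_open", None),
        # Segments are only rewritten with the new codec if a merge really runs.
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None),
    ]


if __name__ == "__main__":
    for method, path, body in codec_switch_requests("my-index"):
        print(method, path, body or "")
```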
Technically I asked ILM to do it, but AFAIK ILM does exactly what you describe.
Yes, I am having trouble switching the codec on shards that already have only one segment.
Yes. But I would say that segment was still BEST_SPEED.
Yes, that is the reason why I opened this topic. I do not want to reindex again. Reindexing is what caused my problems, because _source ended up 50% bigger after the reindex. Now I need to switch codecs to get that disk space back.
Yes, any shards that had more segments switched to the new codec.
I was hoping one of you would suggest a trick to force the forcemerge to run.
E.g. I thought about creating new segments by adding a couple of dummy documents and then deleting them. But I am afraid that forcemerge will be too clever and just delete the new segments.
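Here is an untested sketch of a variant of that idea, written only as a list of REST requests. It is an assumption, not something confirmed in this thread, that the extra segment makes forcemerge rewrite the shard. Unlike the version above, it deletes the dummy document only after the merge, so the new segment is not empty when forcemerge runs. The helper name and the document ID are hypothetical, and a single document is routed to only one shard, so the other shards would stay untouched.

```python
import json


def dummy_doc_trick_requests(index, doc_id="forcemerge-dummy"):
    """Build requests that create a second segment before force-merging (untested idea)."""
    return [
        # An empty document, to avoid adding new fields to the mapping.
        ("PUT", f"/{index}/_doc/{doc_id}", json.dumps({})),
        # Refresh so the new document is written into its own segment.
        ("POST", f"/{index}/_refresh", None),
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # Remove the dummy document once the merge has been requested.
        ("DELETE", f"/{index}/_doc/{doc_id}", None),
    ]


if __name__ == "__main__":
    for method, path, body in dummy_doc_trick_requests("my-index"):
        print(method, path, body or "")
```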