Massive index compression

revelc33 · February 9, 2024, 11:59am

How can I easily compress 140 indices (some of which are up to 100 GB)?
I have tried:

closing the indices,
changing the codec to "best_compression",
and then executing a forcemerge on the index.
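For reference, the steps above roughly correspond to the REST calls below. This is a minimal Python sketch that only assembles (method, path, body) tuples and does not send them. The index name is hypothetical. The reopen step is an assumption that is not in the list above: `index.codec` can only be changed on a closed index, and a closed index cannot be force-merged.

```python
from __future__ import annotations

import json


def recompress_requests(index: str) -> list[tuple[str, str, dict | None]]:
    """Build the REST calls for switching an existing index to best_compression."""
    return [
        # index.codec is a static setting, so the index must be closed first
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index": {"codec": "best_compression"}}),
        # Assumed step: reopen, because a closed index cannot be force-merged
        ("POST", f"/{index}/_open", None),
        # Existing segments are only rewritten (and recompressed) if a merge actually runs
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
    ]


# Hypothetical index name, for illustration only
for method, path, body in recompress_requests("logs-2023.01"):
    print(method, path, json.dumps(body) if body is not None else "")
```

Changing the codec does not touch existing segments by itself. If every shard already consists of a single segment, the last call has nothing to merge.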

The forcemerge task finishes very quickly, and the index size does not change.

However, if I reindex manually, the index size shrinks by up to 60%.

Is there a way to do this in bulk? I don't mind having to do it with a Python script. Thank you!

Note: I wrote a script that reindexes the indices one by one, but some reindex jobs take more than 10 hours, and with that much delay between jobs the sequencing gets complicated. I have also tried running several reindex jobs in parallel. What I would really like is for the cluster itself to manage this parallelization and sequencing.

DavidTurner · February 9, 2024, 12:15pm

Please don't ping folks to draw them into a conversation like that, especially not after just a few minutes. It's very rude and violates the community code of conduct. We're all just volunteers here.

revelc33 · February 9, 2024, 12:20pm

OK, sorry David! I didn't know that.

Christian_Dahlqvist · February 9, 2024, 4:05pm

Forcemerging should be quicker than reindexing, so I would recommend doing what you described. The only reason the forcemerge task would finish very quickly is if the index has already been forcemerged down to a single segment. If you are doing this as part of an index lifecycle policy, I would recommend changing the codec there as well.
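A sketch of what changing the codec in the policy could look like. The ILM force merge action accepts an `index_codec` option alongside `max_num_segments`. The phase, the `min_age` value and the rest of the layout are assumptions for illustration. The dict would be sent as the body of `PUT _ilm/policy/<policy-name>`. Presumably this only helps indices that have not yet reached that step.

```python
import json

# Hypothetical policy; only the forcemerge action is relevant here
policy = {
    "policy": {
        "phases": {
            "warm": {
                "min_age": "7d",  # assumed value
                "actions": {
                    "forcemerge": {
                        "max_num_segments": 1,
                        # Merged segments are written with the best_compression codec
                        "index_codec": "best_compression",
                    }
                },
            }
        }
    }
}

# Body for: PUT _ilm/policy/<policy-name>
print(json.dumps(policy, indent=2))
```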

To force a forcemerge, I think you need to increase the number of segments in the shards. You can do this by indexing a dummy document with a known ID and then immediately deleting it. If your indices have more than 1 primary shard, you may need to index and delete multiple documents to make sure every shard has more than 1 segment.
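A minimal sketch of this trick for a single index, again built as plain request tuples. It follows the variant reported to work later in the thread (add a document, then forcemerge to one segment) and leaves the dummy document in place. The thread leaves open whether deleting it before the merge still triggers one. The ID prefix and document body are hypothetical. With several primary shards, indexing a few documents makes it likely, but not certain, that each shard gets a new segment, because placement follows the routing hash of the ID. If the index carries a write block (for example one left by an earlier ILM step), the writes will be rejected. That case is not handled here.

```python
from __future__ import annotations

DUMMY_ID_PREFIX = "recompress-dummy"  # hypothetical document ID prefix


def dummy_doc_forcemerge_requests(index: str, n_docs: int = 1) -> list[tuple[str, str, dict | None]]:
    """Build calls that give each shard an extra segment, then merge down to one.

    n_docs > 1 makes it more likely (not guaranteed) that every primary shard
    receives a document, since shard placement depends on the ID's routing hash.
    """
    reqs: list[tuple[str, str, dict | None]] = []
    for i in range(n_docs):
        reqs.append(("PUT", f"/{index}/_doc/{DUMMY_ID_PREFIX}-{i}", {"dummy": True}))
    # Refresh so the new documents end up in new segments
    # (possibly redundant before a forcemerge; kept to be explicit)
    reqs.append(("POST", f"/{index}/_refresh", None))
    # With more than one segment per shard the merge actually runs and rewrites
    # the data using the index's current codec
    reqs.append(("POST", f"/{index}/_forcemerge?max_num_segments=1", None))
    return reqs


# Hypothetical index with several primary shards
for method, path, body in dummy_doc_forcemerge_requests("logs-2023.01", n_docs=3):
    print(method, path, body if body is not None else "")
```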

revelc33 · February 9, 2024, 5:10pm

Thanks @Christian_Dahlqvist! The thing is, these indices are in the last phase of ILM, so the forcemerge has already been done. Because of that, even when I try to forcemerge, nothing happens.

How can I apply this at scale? Can I run something like a reindex in batch mode?

Christian_Dahlqvist · February 9, 2024, 5:14pm

Any reindexing you will need to manage yourself, e.g. using a script. Why not add and delete a document to the index and then perform another forcemerge down to a single segment as I suggested instead of reindexing?
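If reindexing is still preferred, the scheduling can be kept client-side with a bounded worker pool. This is an outline under several assumptions. `run_bounded` and `reindex_body` are hypothetical helper names. `worker` stands for a function, not shown here, that submits one reindex and waits for it. The destination index is assumed to have been created with `best_compression` beforehand.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def reindex_body(source: str, dest: str) -> dict:
    """Body for POST _reindex (dest assumed to exist with best_compression)."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def run_bounded(indices: Iterable[str], worker: Callable[[str], object],
                max_parallel: int = 3) -> dict[str, object]:
    """Call worker(index) for every index, with at most max_parallel in flight.

    Results are collected per index; an exception raised by one worker is
    stored as that index's result instead of aborting the remaining jobs.
    """
    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(worker, name): name for name in indices}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                results[name] = exc
    return results
```

Usage could look like `run_bounded(names, my_reindex_worker, max_parallel=3)`. Here `my_reindex_worker` is hypothetical. It would send `reindex_body(name, name + "-compressed")` to `POST _reindex?wait_for_completion=false`, which returns a task ID, and then poll `GET _tasks/<task_id>` until the task reports it has completed. The queue stays on the client, so a 10-hour job only ever occupies one slot.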

revelc33 · February 12, 2024, 9:31am

OK, so this can work? Just to clarify:

Add a document.
Forcemerge to 1 segment.

Christian_Dahlqvist · February 12, 2024, 9:36am

Yes, that should work. If you do not want the additional document to pollute the index, you can also remove it before running the forcemerge. I am not sure whether you also need to run a refresh.

revelc33 · February 12, 2024, 12:58pm

It worked perfectly! Thanks!

revelc33 · February 12, 2024, 5:24pm

Now I have my 140 indices in the force merge queue! The thing is, I have 3 servers with enough resources to run more than one force merge at the same time.

I tried to increase the size of the force_merge thread pool:

PUT _cluster/settings
{
  "persistent": {
    "thread_pool.force_merge.size": 5
  }
}

But I get the error "persistent setting [thread_pool.force_merge.size], not dynamically updateable".

Is there any way to do this without restarting the servers? (It is a production environment!)

Christian_Dahlqvist · February 12, 2024, 5:47pm

No, not that I know of. I would, however, recommend not changing this: forcemerging is primarily disk I/O intensive and is not very taxing on CPU or RAM/heap.

system · March 11, 2024, 5:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics
ILM forcemerge of empty index takes ~5 minutes (June 15, 2022)
How to force a forcemerge? (June 29, 2022)
Index size is increasing during the forcemerge (October 9, 2019)
Compressing and forcemerging past time-based indices (January 7, 2021)
Es forcemerge question (October 2, 2020)