Force Merge: To Do or Not to Do

I have a scenario in which we keep one index per month. Each document has a field, say an expiry date, and documents are deleted when they reach that date. Once an index is one month old, it is moved from a hot node to a warm node. (All my questions below pertain to the indices on warm nodes.)

Now I understand Elasticsearch will merge the segments as needed.

Here is my first question: how does Elasticsearch determine that segments need to be merged?
I have come across the setting index.merge.policy.expunge_deletes_allowed, which has a default value of 10%. Does this setting dictate when merging happens? It refers to 10% deleted documents; what exactly does that mean? Suppose a segment has 100 documents and I delete 11 of them (all in that same segment). Does that mean the default limit of 10% has been met?
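To make the arithmetic in this question concrete, here is a minimal sketch. It assumes the percentage is computed as deleted documents divided by the segment's total document count (live plus deleted), and that a segment qualifies only when it is strictly above the threshold. According to the force merge documentation, this setting is consulted when a force merge is run with only_expunge_deletes=true, so the sketch says nothing about when automatic background merges trigger. The function names are hypothetical.

```python
def deleted_percentage(deleted_docs: int, total_docs: int) -> float:
    """Percentage of a segment's documents that are marked as deleted.

    total_docs is assumed to count live and deleted documents together.
    """
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return 100.0 * deleted_docs / total_docs


def exceeds_expunge_threshold(deleted_docs: int, total_docs: int,
                              allowed_pct: float = 10.0) -> bool:
    """True if the segment is above the expunge_deletes_allowed percentage.

    Assumption: the comparison is strict (more than allowed_pct).
    """
    return deleted_percentage(deleted_docs, total_docs) > allowed_pct


if __name__ == "__main__":
    # The example from the question: 11 deleted out of 100 documents.
    print(deleted_percentage(11, 100))
    print(exceeds_expunge_threshold(11, 100))
```

Under these assumptions, 11 deleted documents out of 100 is 11%, which is above the default of 10, while exactly 10 out of 100 would not be.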

Coming back to the scenario: as my documents get deleted, there will come a time when all the documents in an index have been deleted. What will the segments of that index look like then? Will it have 0 segments, or just 1 to hold index metadata?
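One way to check this empirically is to look at the output of the _cat/segments API (for example, GET /_cat/segments/<index>?format=json) and total the live and deleted document counts per index. The sketch below only parses such a response; the sample rows and the index name docs-2022-01 are made up for illustration, and the helper name is hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical sample of a _cat/segments?format=json response.
# Values are strings, as the cat APIs return them.
SAMPLE_RESPONSE = """[
  {"index": "docs-2022-01", "shard": "0", "prirep": "p", "segment": "_a",
   "docs.count": "0", "docs.deleted": "120"},
  {"index": "docs-2022-01", "shard": "0", "prirep": "r", "segment": "_a",
   "docs.count": "0", "docs.deleted": "120"},
  {"index": "docs-2022-01", "shard": "1", "prirep": "p", "segment": "_c",
   "docs.count": "35", "docs.deleted": "65"}
]"""


def summarize_segments(rows):
    """Count segments, live docs, and deleted docs per index (primaries only)."""
    summary = defaultdict(lambda: {"segments": 0, "live": 0, "deleted": 0})
    for row in rows:
        if row.get("prirep") != "p":
            continue  # skip replica copies so documents are not double counted
        entry = summary[row["index"]]
        entry["segments"] += 1
        entry["live"] += int(row["docs.count"])
        entry["deleted"] += int(row["docs.deleted"])
    return dict(summary)


if __name__ == "__main__":
    for index, stats in summarize_segments(json.loads(SAMPLE_RESPONSE)).items():
        print(index, stats)
```

Running the same summary against a real index before and after the last document expires would show directly how many segments remain.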

Another question regarding force merge: suppose I use force merge to remove all the deleted documents from disk, and the force merge produces a segment larger than 5 GB, as described here: Force merge API | Elasticsearch Guide [8.0] | Elastic

Snippet:
Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.

When, if at all, will my segment larger than 5 GB get merged automatically? The documentation says it will once it consists mostly of deleted documents, but that is vague. What does "mostly" mean here? What is the threshold?
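For reference, a force merge that only targets deleted documents is requested with the only_expunge_deletes parameter of the _forcemerge endpoint, while max_num_segments asks for a merge down to a given segment count. The documentation states that the two cannot be combined in one request. The sketch below only builds the request path and does not contact a cluster; the helper name and the index name are hypothetical.

```python
from urllib.parse import urlencode


def forcemerge_path(index, only_expunge_deletes=False, max_num_segments=None):
    """Build the path for a POST <index>/_forcemerge request.

    Rejects the combination of only_expunge_deletes and max_num_segments,
    which the force merge API does not accept together.
    """
    if only_expunge_deletes and max_num_segments is not None:
        raise ValueError(
            "only_expunge_deletes and max_num_segments cannot be combined")
    params = {}
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    query = urlencode(params)
    return f"/{index}/_forcemerge" + (f"?{query}" if query else "")


if __name__ == "__main__":
    # Hypothetical monthly index name.
    print("POST", forcemerge_path("docs-2022-01", only_expunge_deletes=True))
    print("POST", forcemerge_path("docs-2022-01", max_num_segments=1))
```

The only_expunge_deletes form rewrites just the segments whose deleted percentage is above index.merge.policy.expunge_deletes_allowed, rather than merging the whole shard down to a fixed number of segments.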

Another question: it is suggested that force merge should only be done on indices that are read-only. Why is that? How does it degrade performance? Coming back to my scenario, I will still have some updates and new documents arriving on my indices on warm nodes even after I force merge them, but they will be infrequent (say, less than 5% of the documents will be updated, and maybe a couple of hundred new documents will be added to those indices).

Also, what if I force merge four 450 GB indices (each with 16 shards) in parallel? How will that affect my search speed? I read somewhere that by default each force merge request is executed in a single thread, and that it is throttled if need be. Does that mean that if search requests increase, the merging will be paused?
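To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes that "16 shards" means 16 primary shards per index, that 450 GB is the primary store size, and that data is spread evenly across shards; none of this is confirmed above, and the function name is hypothetical.

```python
def shard_stats(num_indices: int, index_size_gb: float, primaries_per_index: int):
    """Return (total primary shards, average shard size in GB).

    Assumes data is distributed evenly across the primaries of each index.
    """
    total_shards = num_indices * primaries_per_index
    avg_shard_gb = index_size_gb / primaries_per_index
    return total_shards, avg_shard_gb


if __name__ == "__main__":
    total, avg = shard_stats(num_indices=4, index_size_gb=450, primaries_per_index=16)
    print(f"{total} primary shards, about {avg:.1f} GB each")
```

Under these assumptions there are 64 primary shards of roughly 28 GB each, so merging any one of them down to a single segment would produce a segment well above the 5 GB mentioned in the snippet.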

Thank you for your patience and time.

@mikemccand Hey, sorry to tag you, but it seems you have knowledge about this stuff.

Welcome to our community! :smiley:
Please don't ping people that aren't already part of your topic.

The first question is: why are you deleting documents from an index if you are using tiered nodes?

Sorry once again for that tag. To answer your question: we are deleting documents based on their expiry, and the expiry is not predictable; one document may expire just 3 months after creation while another has an expiry of 2 years.

We are using tiered nodes because writes to our indices happen in the first month after creation; after that we mostly carry out searches and deletions. So we use SSDs for hot nodes and HDDs for warm nodes. (We can think of deletions as writes too, because in the end the content on disk is changing, but we can't do much about that.) We are not using any deletion policy; our documents are only deleted when they expire.

Ok that makes some sense. It's not really a great approach though, as deletes are expensive.

But yes, you should be force merging things.

Yes, I know it is not a good approach, but we are caught between frequent deletions and writing on our warm nodes.

Can you please clarify some of the other things mentioned in the original question, such as the threshold for automatic merging, what effect force merging will have on search speed if both run in parallel, and the threshold for merging segments larger than 5 GB?

I don't know it at that level of detail, sorry.

Not a problem. Thank you for your help.
