Huge segments (should I worry?)

Hi,

Recently I was tasked with maintaining an existing ES cluster consisting of 3 nodes (all 3 are data nodes), with a 12 GB heap per node.

After looking at the segments of the index "beneficiary2" I saw the following: 2 very big segments of ~13 GB each. It is strange to me that both have ~30 million docs.count, since the whole index is supposed to have ~36 million docs, so the math does not check out for me. If someone could shed some light on that I would be grateful.
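For reference, this is roughly the _cat call I used to list the segments (column names as per the _cat/segments docs; adjust the index name for your own setup):

GET _cat/segments/beneficiary2?v&h=index,shard,prirep,segment,docs.count,docs.deleted,size&s=size:desc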

Should I be worried about the size?

Here is the merge policy:

"policy": {
                        "max_merge_at_once_explicit": "30",
                        "max_merge_at_once": "10",
                        "max_merged_segment": "5gb",
                        "expunge_deletes_allowed": "10.0",
                        "floor_segment": "2mb",
                        "deletes_pct_allowed": "33.0"
                    }
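For completeness, I pulled that policy from the index settings with something along these lines (include_defaults matters because most of these values are defaults; the filter_path just trims the output):

GET beneficiary2/_settings?include_defaults=true&filter_path=**.merge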

How did the segments get to this size if max_merged_segment is set to 5 GB? Were they never merged, hence the huge and uneven sizes?
I assume these segments will be considered for merging again once docs.deleted reaches 33% of the ~30 million documents per segment, or, as per the documentation, when they consist of mostly deleted docs. Should I take any action here?

Thanks in advance for further educating me!

I would guess they are the result of a force merge with ?max_num_segments=1, which overrides the max segment size and merges everything into a single segment. It's the oldest segment on each shard copy by the looks of it; segment stats are reported per shard copy, so the primary and its replica each show their own ~30M-doc segment containing the same data, which is why the two counts don't add up against the ~36M index total. But then you've carried on writing to the index, which is not recommended. Quoting the API docs:

We recommend only force merging a read-only index (meaning the index is no longer receiving writes). [...] But force merge can cause very large (> 5GB) segments to be produced, which are not eligible for regular merges.
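For the avoidance of doubt, this is the shape of the call I mean; if something is scheduling this against your index, this is what to look for (index name taken from your post):

POST /beneficiary2/_forcemerge?max_num_segments=1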

It doesn't necessarily need fixing; you can just leave it as is. The only way to get back to a better distribution of segment sizes would be to reindex. Or else just carry on writing/updating/deleting docs, and eventually this huge segment will have so few live docs left that it'll be merged away. It'll just take a while.
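If you do go the reindex route, the usual shape is something like this; the destination name here is just an example, and you'd typically point clients at it (or swap an alias over) once it completes:

POST _reindex
{
  "source": { "index": "beneficiary2" },
  "dest": { "index": "beneficiary2-v2" }
}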


Hi David,

thank you for the quick response. It is exactly what you mentioned. After having a bit more time to play with the environment, I found a cron job on the ES machine itself that executes a force merge with max_num_segments=1 every day at midnight. The use case for this index is to write to it for about 7 hours during the day, and then it seems they force merge at midnight.

I do not see a good reason for this, though I might be missing some context. My only assumption is that they are scared by the large number of segments created during this writing and want to reduce the number of segments in the index. The index is also used for searching at some points during the day, and maybe they worried that a larger number of segments would decrease search performance. Maybe that was their thought process. However, I still cannot justify this force merge cron job: if my understanding is correct, there is an automatic merge process in place that should be more optimal. Do you see any other reason/justification for this decision they took? Any other advice, having a bit more context now?
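The crontab entry is essentially this (paraphrased from memory rather than copied verbatim, with the host details simplified):

# runs on the ES machine itself, every day at midnight
0 0 * * * curl -s -XPOST "http://localhost:9200/beneficiary2/_forcemerge?max_num_segments=1"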

One thing to check is timezones.

If an old index is really never going to be written to again, then it's fine to force merge it down to one segment. But if you run the force_merge at "midnight" in one time zone while the index's idea of a day is in a different time zone, then you will end up writing to the index after the force_merge.

Also, if there is any ingest delay, and there's always some delay, some documents take 1, 5, 10, or 20 minutes to reach the appropriate index, so in most cases they will still be going into "yesterday's" index. So even if such a cron were helpful, it's better not to run it too close to "midnight".

You can see this sort of thing in the Stack Monitoring tabs in Kibana.

What they do is totally intentional. They write to it for about 7 hours, querying it during that time as well; after the writing finishes they continue to query the index, and at ~midnight they force merge to 1 segment. This process is repeated every day. I am trying to understand the logic behind this decision. In my opinion they were scared of the increased number of segments created during writing and decided to force merge every day to keep the number of segments at bay. However, this creates the huge segments shown in the screenshot, which will not be considered for merging anymore (until they consist of mostly deleted documents), and that slows down querying in my opinion. Isn't it better to query 50 smaller segments instead of 2 very huge ones and a couple of small ones?

Yeah, now I realize I misread: you are doing this every day to the exact same index, rather than to a daily-created one. So the pattern is: index documents for hours into that index, writing across many segments, then at midnight try to crush it all into one (presumably monotonically growing) segment?

To an extent, it's a sort of poor man's rollover index, without the actual rollover, instead trying to roll over to effectively new segments on a daily basis.

This goes directly against the best practice recommendation quoted above. The official docs also label 5gb as "a very large segment" and explain why those are not a great idea.

Where it happens to specifically say:

" But force merge can cause very large (> 5GB) segments to be produced, which are not eligible for regular merges "

One might speculate that when they enabled the cron, the index (and therefore the big segment) was significantly smaller, so it was still eligible for regular merges. Now it's the size it is, with 5M+ deleted docs in it.

My only assumption is that they are scared by the large number of segments created during this writing and want to reduce the number of segments in the index

Left to its own devices, ES would manage any "large number of segments" issue in the background. Since your big segment is now what it is, getting back to a more standard setup requires not just removing the cron, but a reindex. Or (probably) a very long wait.
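If you do opt for the long wait, you can at least watch docs.deleted climb towards the threshold with something like this (docs.deleted here is summed across the segments of the primaries):

GET beneficiary2/_stats/docs?filter_path=indices.*.primaries.docs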

If it were my system, I'd likely arrange to "fix" it. Knowingly going directly against the documented best practice guidance, for dubious reasons, on a permanently ongoing basis would worry me more. YMMV.


It's not totally daft to force-merge an index every day, especially when it's a fairly small index with relatively low write load, as seems to be the case here. In terms of search performance, searching lots of segments can indeed be costly, but then so can searching an index whose one huge segment carries 5M+ deleted documents, so neither is an obvious winner. It'd be best to run some benchmarks of your current workload on both the current setup and one without the force-merging before you make a decision either way.
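Short of a full benchmark, the search profiler gives a rough per-query picture of where time goes; something like this (with a query representative of your workload in place of this placeholder), run against both setups, would be informative:

GET beneficiary2/_search
{
  "profile": true,
  "query": { "match": { "some_field": "some value" } }
}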


Thanks to both of you for the productive discussion and the opinions shared. This closes the topic for me.
