Segment size / merge policy for large indices

I am new to Elastic. I started learning about it in depth a month ago, when I moved into a position to take over the more or less unmanaged cloud install we had.

We have quite a few indices with segment counts ranging from 40 to 220 segments per index. For our use case these indices do not roll over, and will not be set up to roll over, because the data is not time series. All our indices currently use the default max segment size of 5 GB. The problem with this is that deletes are never being cleaned up and the segment count will simply continue to increase over time.
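To put numbers on the segment counts and deletes described above, something like the following Python sketch could help. It assumes the input is the parsed JSON list returned by `GET /_cat/segments?format=json`, where `docs.count` (live docs) and `docs.deleted` arrive as strings. The function name `summarize_segments` and the sample rows are hypothetical, made up for illustration only.

```python
from collections import defaultdict


def summarize_segments(rows):
    """Aggregate per-index segment count and deleted-doc ratio.

    `rows` is assumed to be the parsed JSON list from
    GET /_cat/segments?format=json (numeric columns arrive as strings).
    """
    totals = defaultdict(lambda: {"segments": 0, "live": 0, "deleted": 0})
    for row in rows:
        # Count primary shard copies only, so replicas do not double the numbers.
        if row.get("prirep") != "p":
            continue
        t = totals[row["index"]]
        t["segments"] += 1
        t["live"] += int(row["docs.count"])
        t["deleted"] += int(row["docs.deleted"])

    summary = {}
    for index, t in totals.items():
        all_docs = t["live"] + t["deleted"]
        summary[index] = {
            "segments": t["segments"],
            # Share of documents in the index that are marked deleted.
            "deleted_ratio": t["deleted"] / all_docs if all_docs else 0.0,
        }
    return summary


# Hypothetical sample rows, not real cluster output.
sample_rows = [
    {"index": "catalog", "prirep": "p", "docs.count": "900", "docs.deleted": "100"},
    {"index": "catalog", "prirep": "p", "docs.count": "500", "docs.deleted": "500"},
    {"index": "catalog", "prirep": "r", "docs.count": "900", "docs.deleted": "100"},
]
summary = summarize_segments(sample_rows)
print(summary)
```

Running this over time, rather than once, is what would show whether the segment count and deleted ratio are really growing or just fluctuating between merges.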

We are currently on 7.17 but will be working to upgrade to 8.x in a few months.
Our max heap size is 28 GB.

What I am wondering, especially for our worst offenders, is:

How large should we, or can we, make the max segment size?

Is there potentially a better way to handle the problem than increasing the segment size?

Hi @GregoryJC and welcome!

Can you explain a bit more clearly how you've come to these conclusions? Deletes are dealt with automatically, and the segment count won't grow without bound.

From what I understand after reading about segments and merges, deletes are only taken care of when merging is done. Once segments hit the max size allowed, they can no longer be merged automatically and deletes are no longer cleaned up automatically. As the data in an index grows, more and more segments will be unable to merge, and more segments will be created.

I could be wrong on this. But this is my understanding of it from everything I've read so far on how it works.

Yes, this isn't correct. Merges will clean up segments of all sizes once they accumulate enough deletes.

That behavior does not seem to match what I see, which is segments sitting at or near the 5 GB limit with deletes nearing 50% of the number of docs.


That sounds about right. Once they accumulate enough deletes they will be subject to merges. These ones just don't have enough deletes for it to be worth merging them yet.
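To see how close individual segments are to that point, one option is to compute the deleted fraction per segment. The Python sketch below assumes its input is the parsed JSON body of `GET /<index>/_segments`, where each segment reports `num_docs` (live docs), `deleted_docs`, and `size_in_bytes`. The function name `segment_delete_stats` and the sample response are hypothetical. What counts as "enough" deletes is decided by the merge policy, so the sketch only reports the numbers and does not try to predict merges.

```python
def segment_delete_stats(segments_response):
    """Return (index, shard, segment, deleted_fraction, size_bytes) tuples,
    sorted with the most-deleted segments first.

    `segments_response` is assumed to be the parsed JSON body of
    GET /<index>/_segments.
    """
    rows = []
    for index, index_data in segments_response["indices"].items():
        for shard_id, copies in index_data["shards"].items():
            for copy in copies:
                # Look at primary copies only to avoid listing replicas twice.
                if not copy["routing"]["primary"]:
                    continue
                for name, seg in copy["segments"].items():
                    total = seg["num_docs"] + seg["deleted_docs"]
                    frac = seg["deleted_docs"] / total if total else 0.0
                    rows.append((index, shard_id, name, frac, seg["size_in_bytes"]))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Hypothetical sample response, trimmed to the fields the sketch reads.
sample_response = {
    "indices": {
        "catalog": {
            "shards": {
                "0": [
                    {
                        "routing": {"primary": True},
                        "segments": {
                            "_b": {"num_docs": 990, "deleted_docs": 10,
                                   "size_in_bytes": 1_200_000_000},
                            "_a": {"num_docs": 600, "deleted_docs": 400,
                                   "size_in_bytes": 5_000_000_000},
                        },
                    }
                ]
            }
        }
    }
}

stats = segment_delete_stats(sample_response)
for index, shard, name, frac, size in stats:
    print(f"{index}[{shard}] {name}: {frac:.1%} deleted, {size} bytes")
```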

There is always the Force merge API (Elasticsearch Guide [8.6]).

I don't think that's a good recommendation here. From those docs:

There's no need to take any action here, and doing something like a force-merge will just cause different (and harder-to-fix) problems in the long run.
