Running _forcemerge on an index that is being written to

Hi,
we have an index with heavy updates on its documents, which leaves us with a lot of uncleaned deleted documents: for example, we can have around 400M searchable documents and around 150M deleted documents that haven't been cleaned up yet. It's even worse on the larger ES clusters we run. Since this ratio is pretty high, we want to look into running _forcemerge on the index without stopping writes to it, or maybe during a low-traffic window such as weekends.
What would be the impact on the index if we do this while keeping the writes on? I assume that having fewer uncleaned deleted documents will help ES manage segments faster and more reliably, and that we can also reclaim a bit of disk space.

You can see below another big cluster we run and its number of uncleaned deleted documents.

I would not advise running a standard _forcemerge against an actively written index, as it can cause issues; this is mentioned in the docs here.

One option, however, if you really need to clean up the deleted docs for some reason, is to use _forcemerge with only_expunge_deletes=true. This only merges segments containing more than a certain percentage of deleted docs, expunging the deleted documents from them (rough sketch at the end of this post). More about this query parameter can be found here.

Generally, I recommend just letting Elastic handle the cleanup process, but if you really need to do it manually for some reason, I'd recommend testing the command on a non-production index first to see if you have any negative performance impacts from it.
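
For reference, a minimal sketch of what that call could look like with the Python client (the index name and connection details are placeholders, not your actual setup):

```python
from elasticsearch import Elasticsearch

# Placeholder connection details -- adjust for your cluster.
es = Elasticsearch("https://localhost:9200", api_key="...")

# Merge only segments that exceed the deleted-docs threshold and expunge
# their deleted documents, instead of force merging down to one segment.
es.indices.forcemerge(index="my-index", only_expunge_deletes=True)
```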

Yes, I am aware that it is not recommended to run _forcemerge on an actively written index.

Thanks for the only_expunge_deletes=true tip though, we might try that instead.

The main reason we want to try this is the high number of uncleaned deleted documents. This is due to the nature of how we index the data, as a lot of updates are made to the documents. We could let ES handle it, but when there is a spike of updates to documents on a single shard, merging kicks in and ES can't really handle merging segments and indexing documents on the same shard at the same time, so we get degraded performance.
So we are looking at maybe running _forcemerge from time to time, perhaps during weekends, to ease things out. Not sure what else we can try...

ES can't really handle merging segments and indexing documents on the same shard at the same time, so we get degraded performance.

This is interesting. I'd say that an only_expunge_deletes=true run off hours might help here, but another solution could be to split the index into more shards so the merge load is better distributed (rough sketch below).

At this point I'd say the solution really depends more on your needs though, as I think either would work.
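
For illustration, a split would look roughly like this with the Python client (names and shard counts are placeholders; the target shard count must be a multiple of the source's, and the source has to be made read-only first):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection

# The source index must block writes before it can be split.
es.indices.put_settings(index="my-index", settings={"index.blocks.write": True})

# Split into a new index with more primary shards (a multiple of the
# source's shard count).
es.indices.split(
    index="my-index",
    target="my-index-split",
    settings={"index.number_of_shards": 6},  # placeholder value
)

# The target inherits the write block, so clear it once the split completes.
es.indices.put_settings(index="my-index-split", settings={"index.blocks.write": False})
```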

Yeah, it's a bit problematic for us to reindex at this point, but we are aware that we need to increase the number of shards, it's just not a quick solution for us at the moment :sweat_smile: Hence us trying to find other ways to fight this until we reindex everything.

I guess the main problem here is that Elasticsearch (well, Lucene) doesn't think this is a particularly high number of deleted docs. Merging is an expensive process, so it's usually better to delay it until there are enough deleted docs to make it worthwhile.


This suggests there might be something else wrong in your config. I'd recommend focussing on the reasons for this degraded performance directly. A period of high indexing load is going to lead to merges whether you force merge beforehand or not.

Hi David,

yeah, I actually ran _forcemerge on the index while indexing was stopped and it seems to have worked out; disk space was reclaimed, which is nice.
But we still seem to have a problem with indexing through the _bulk API. I've tried everything from increasing/decreasing the bulk size, even down to 10 update-by-script requests per bulk, but it's still very slow. From my observations, most of the updates go to a single shard since we use routing. That shard constantly gets deallocated and allocated again. Not sure how to deal with this one, to be honest.

Resource-wise, we are using quite large machines, 16 vCPUs and 64GB of memory on Elastic Cloud, hoping that more resources would help us with this, but that does not seem to be the case.

Do you mean the shard is failing? If so, that definitely shouldn't happen, I recommend addressing this.

Yes, this is what I am trying to figure out, and I am just working through the options one by one.
The main problem is that most of the requests go to a single shard, which I assume ES is not able to handle properly? Not sure what other options we have to speed up the process. Would disabling shard replicas help, i.e. keeping only primary shards, finishing the indexing, and then enabling replicas again (rough sketch of what I mean below)?
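
Just to show the mechanics I have in mind, not as a recommendation (a sketch with the Python client; the index name and replica count are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection

# Drop replicas before the heavy indexing run so only primaries do the work.
es.indices.put_settings(index="my-index", settings={"index.number_of_replicas": 0})

# ... run the bulk indexing ...

# Restore replicas afterwards; ES rebuilds them by copying the primaries.
es.indices.put_settings(index="my-index", settings={"index.number_of_replicas": 1})
```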

It should not be possible to cause a shard to fail just by sending it too much indexing traffic. Instead, ES should just push back (typically by returning a 429 Too Many Requests). So if it's not doing that for some reason then we need to understand why and address the fundamental problem. For instance, what is the exact nature of the shard failure? What appears in the ES logs, for instance, and in the responses to indexing requests? Perhaps this is a bug, but we cannot fix it without understanding it in more detail.
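
As an aside, the usual way to cope with that pushback on the client side is to retry 429s with backoff; a sketch with the Python bulk helpers (the action generator and parameter values are just examples):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection

def actions():
    # Stand-in for however the bulk actions are built.
    yield {"_op_type": "update", "_index": "my-index", "_id": "1",
           "doc": {"field": "value"}}

# streaming_bulk retries documents rejected with 429, backing off between attempts.
for ok, item in helpers.streaming_bulk(
    es,
    actions(),
    max_retries=5,        # retry 429-rejected docs up to 5 times
    initial_backoff=2,    # seconds; doubles on each subsequent retry
    raise_on_error=False,
):
    if not ok:
        print("failed:", item)
```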


In the end, we figured it out.

It was because of the data. The documents we were trying to index were taking a long time to index due to the nature of the data and because we run some regex expressions at index time. Once we sorted that out, everything went back to normal.

But on the topic of _forcemerge, I think we do have to run it. The reason we have that many uncleaned deleted documents is that we have a lot of old, big (>5GB) segments with a lot of deleted documents.

index            shard prirep ip           segment generation docs.count docs.deleted     size size.memory committed searchable version compound
index-name-v0    0     p      ip-address-here _cwg         16720    2929550      2730209    5.2gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _d1y         16918    2866985      2800087    5.2gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _d6s         17092    3199872      2423136    5.4gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _dex         17385    3037376      2075589    5.4gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _dix         17529    3429513      2044662    5.2gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _dnc         17688    3464548      1821027    5.5gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _drz         17855    3223482      2061821    5.5gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _dx1         18037    3154251      2095340    5.4gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _e1u         18210    3478554      2015227    5.3gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _e88         18440    3621043      1688986    5.2gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _edw         18644    3670615      1842516    5.3gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _ely         18934    3883761      1323761    5.5gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _euw         19256    4342507      1078338    5.3gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _fbs         19864    4215320       885812    5.2gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _fwt         20621    4401008       865692    5.1gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _hbk4       808132    4390760       277667    4.8gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _v2d4      1449400    4845760       112970    5.1gb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _ved8      1464956     432762       240300  523.7mb           0 true      true       9.3.0   true
index-name-v0    0     p      ip-address-here _vfja      1466470     310035        43605  370.1mb           0 true      true       9.3.0   true

And based on the docs, normal merging does not run on segments larger than 5GB.

But force merge can cause very large (> 5GB) segments to be produced, which are not eligible for regular merges.

I find the wording a bit confusing, but it seems to make sense in our case: based on the generation numbers of the segments, we can see that we have very old segments that have been there all along...
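
For anyone trying to reproduce this: the listing above looks like the output of the cat segments API, and a rough sketch of pulling it (and the deleted-doc ratios) with the Python client would be something like this (index name and connection are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection

# Fetch per-segment stats as JSON rather than the plain-text table.
segments = es.cat.segments(index="index-name-v0", format="json", bytes="b")

for seg in segments:
    total = int(seg["docs.count"]) + int(seg["docs.deleted"])
    ratio = int(seg["docs.deleted"]) / total if total else 0.0
    print(f'{seg["segment"]}: {ratio:.0%} deleted, {int(seg["size"]) / 2**30:.1f} GiB')
```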

Great, but ...

... I still think this should not have led to shard failures, and would like to understand whether you were encountering a bug.

Again, Elasticsearch doesn't think this is too many deleted documents. These deleted docs will be merged away when it becomes worth doing so.

Well, I am ready to help dig into the issue, I just don't know how :). I can provide a deployment ID if you are interested, as we run on Elastic Cloud, or anything else that's needed. Just DM me :slight_smile:

Again, Elasticsearch doesn't think this is too many deleted documents. These deleted docs will be merged away when it becomes worth doing so.

But then how come those segments I showed above were not merged? The ones larger than 5GB, that is. We created the index around 4 months ago and have been constantly writing to it via the _bulk API since then.

They don't have enough deleted docs to be a merge candidate yet: the live docs are still more than 2.5GiB in all cases so merging any of them together would yield a segment that's even larger than 5GiB, which is something Lucene deliberately avoids. Once there's a way to merge them together without (significantly) exceeding the 5GiB target size, Elasticsearch will do so.
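
To make that concrete with a few rows from the listing above, here is a back-of-the-envelope estimate of the live data per segment (assuming live and deleted docs take roughly the same space on disk, which is only an approximation):

```python
# (segment, docs.count, docs.deleted, size_gib) taken from the _cat/segments output above
segments = [
    ("_cwg", 2_929_550, 2_730_209, 5.2),
    ("_dnc", 3_464_548, 1_821_027, 5.5),
    ("_fwt", 4_401_008,   865_692, 5.1),
]

for name, live, deleted, size_gib in segments:
    live_gib = size_gib * live / (live + deleted)
    print(f"{name}: ~{live_gib:.1f} GiB of live data")

# _cwg: ~2.7 GiB, _dnc: ~3.6 GiB, _fwt: ~4.3 GiB of live data.
# Merging any two of these would produce a segment well over the 5 GiB
# target, so the regular merge policy leaves them alone.
```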

I see. That is interesting, but then there is a cost-efficiency problem, no? In our case, on the deployment where I ran _forcemerge and whose ID I shared with you, we reclaimed almost 1TB of disk space just by cleaning up the deleted documents. This matters for us because we mostly scale up the clusters due to running out of disk space.

The shard failures do indeed look to be a bug: somehow your traffic pattern is generating messages between nodes which are so large that the receiving node briefly leaves the cluster. I don't really expect Elasticsearch to handle such large messages, but it should fail more gracefully than it does currently (and ideally should not fail the shard in this situation). I opened Transport messages exceeding 2GiB are not handled gracefully · Issue #94137 · elastic/elasticsearch · GitHub.

I do see your point, maybe we could find some way to reclaim this disk space more enthusiastically in cases where disk space is the limiting factor and savings are available. Please open an issue on Github to start a discussion about this.

Oh, that is very interesting and useful. Right now we sometimes send quite large bulk requests, both in terms of item count and overall size. Should we lower them then? Also, how can we monitor this?

I do see your point, maybe we could find some way to reclaim this disk space more enthusiastically in cases where disk space is the limiting factor and savings are available. Please open an issue on Github to start a discussion about this.

I'll do that. Thanks

Bulk requests are limited in size to 100MiB at the HTTP layer, so for the request to reach 2GiB you must be adding the vast majority of the data (≥95%) as the request passes through your ingest pipelines. Smaller bulks would help keep the final size under 2GiB, but also maybe your ingest pipelines could be simplified?
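
If it helps, the Python bulk helpers can cap both the item count and the byte size of each bulk request before it leaves the client; a sketch (the values are just examples and do not account for data added later by ingest pipelines):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("https://localhost:9200", api_key="...")  # placeholder connection

def actions():
    # Stand-in for however the bulk actions are built.
    yield {"_op_type": "update", "_index": "my-index", "_id": "1",
           "doc": {"field": "value"}}

# Cap each bulk request at 500 items or ~10 MiB, whichever comes first,
# leaving headroom for growth in the ingest pipeline.
helpers.bulk(
    es,
    actions(),
    chunk_size=500,                    # max items per bulk request
    max_chunk_bytes=10 * 1024 * 1024,  # max bytes per bulk request (pre-pipeline)
)
```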

I don't know of a good way to see the post-pipeline bulk size unfortunately. I'm not sure anyone has hit this problem before. I couldn't find any other clusters in all of Elastic Cloud that had experienced anything similar.
