I have a legacy Elasticsearch 2.3 index. It is no longer updated with new entries; the only modifications are deletes. All our new data targets an Elasticsearch 7 cluster, though we do still read from the ES2 index.
The goal for the legacy 2.3 index is to scale it down. One year ago it held ~9.5 billion active documents and ~1 billion deleted documents. Those numbers are now ~8 billion active documents and ~2.5 billion deleted documents.
My concern is that the total number of documents has barely reduced at all, i.e. it has stayed at ~10.5 billion. This is a problem, as it is delaying our efforts to scale back the 2.3 cluster.
Over the last year only ~47 million documents were purged, ~37 million of which were purged over a ~9-day period.
Reading up on this topic I became aware of the force merge API. Is this my best option? Running it against a local (unfortunately ES7) Docker cluster, I found that it does remove deleted documents when only_expunge_deletes is set to true.
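For concreteness, the call I ran locally was along these lines (host and index name are placeholders; as far as I can tell the `_forcemerge` endpoint and the `only_expunge_deletes` parameter exist on both 2.3 and 7.x, but treat this as a sketch rather than a verified production procedure):

```python
from urllib import parse, request

# Placeholder host and index name -- substitute your own.
host = "http://localhost:9200"
index = "this-legacy-index"

# only_expunge_deletes rewrites only segments whose deleted-document
# ratio exceeds the merge policy's expunge threshold (10% by default),
# rather than merging everything down toward max_num_segments.
url = f"{host}/{index}/_forcemerge?{parse.urlencode({'only_expunge_deletes': 'true'})}"
print(url)

# To actually fire it (the request blocks until the merge completes):
# request.urlopen(request.Request(url, method="POST"))
```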
But there's obviously a big difference between a local Docker cluster (on the wrong ES version) and a production index that is still being read.
In short, what's my safest option or strategy for tackling this problem and reducing the ES2 disk size?
I'll also include a subset of the _cat/segments API output in case it helps. I'm including 200 lines, but the total output is 20,612 lines. The pattern throughout is fairly consistent, i.e. most segments are 4.x GB, with more than 10% of each segment made up of deleted documents.
I should mention there are indexes other than this-legacy-index on the ES2 cluster. I'm focusing on this-legacy-index as it is by far the largest and oldest. The other indexes have similar segmentation, though there may be slightly fewer 4.x GB segments.
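To make the pattern concrete, here's roughly how I eyeballed the deleted-document ratio per segment. The two sample lines are fabricated, but they mimic the default `_cat/segments` column order I'm assuming here (index, shard, prirep, ip, segment, generation, docs.count, docs.deleted, size, ...):

```python
# Fabricated sample lines shaped like default `_cat/segments` output.
sample = """\
this-legacy-index 0 p 10.0.0.1 _a1 349 9500000 1200000 4.3gb 12mb true true 5.5.0 false
this-legacy-index 0 p 10.0.0.1 _b2 350 9800000  600000 4.4gb 12mb true true 5.5.0 false
"""

def deleted_ratio(line):
    """Fraction of a segment's documents that are deleted."""
    cols = line.split()
    docs, deleted = int(cols[6]), int(cols[7])
    return deleted / (docs + deleted)

for line in sample.strip().splitlines():
    cols = line.split()
    ratio = deleted_ratio(line)
    # 10% matches the default expunge_deletes_allowed merge-policy threshold.
    flag = "expunge candidate" if ratio > 0.10 else ""
    print(f"{cols[4]:>4} {ratio:6.1%} {flag}")
```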
A follow-up question. We ran a force merge over the weekend. This removed most of our deleted data, namely ~2.5 billion documents. That's great.
I can see it also dropped our segment count from ~20k down to ~2k. The downside is that we now have 110 segments larger than 100 GB. For performance reasons we've tried to keep our segments below 5 GB.
Any thoughts or recommendations on how we can re-balance the segments?
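My current thinking is that, since the reindex API only arrived in ES 5.0, rebuilding the 2.3 index with a saner segment layout would mean a scroll-and-bulk copy into a new index. A minimal sketch of the bulk-body half, assuming the standard 2.x scroll-hit shape (everything here is illustrative, not production code):

```python
import json

def hits_to_bulk(hits, dest_index):
    """Turn one page of scroll hits into an NDJSON bulk body.

    `hits` is the `hits.hits` list from a 2.x scroll response;
    each entry carries `_type`, `_id` and `_source`.
    """
    lines = []
    for h in hits:
        # ES 2.x still requires a mapping type on the index action.
        lines.append(json.dumps({"index": {
            "_index": dest_index, "_type": h["_type"], "_id": h["_id"]}}))
        lines.append(json.dumps(h["_source"]))
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

# Illustrative hit, shaped like a scroll-response entry:
page = [{"_type": "doc", "_id": "1", "_source": {"field": "value"}}]
print(hits_to_bulk(page, "this-legacy-index-v2"))
```

Driving it would be a loop that opens a scroll search on the old index, then repeatedly fetches pages from the scroll endpoint and bulk-posts each page's body to the new index.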
I focused on segments as the advice was to keep them below 5 GB, but I'm not sure how strictly that needs to be followed. Monitoring the legacy index after the force merge, I see peak search duration has increased from ~700 ms to ~1.2 s. There's also an increase in heap usage from ~2 GB to ~3.3 GB, but that seems acceptable.
For shards we focus on keeping them below ~50 GB.
The goal for the legacy index is to slowly reduce it. As such we've also kicked off a node reduction, going from ~180 to ~150 nodes. This could also be contributing to the above.
Honestly, upgrading will get you much further.
As a high-level example, not taking a tonne of things into account, 7.x should handle those 2,700 shards on 4 nodes. 2.x does not manage shards efficiently.