Total storage of Elasticsearch index grows abnormally

We are facing the following problem:

  • We seeded an index ~6 months ago. Back then the total space occupied was roughly 7TB.
  • Over the last 6 months we have been updating the index to keep it in sync with the source data.
  • Over this period of 6 months the total index size has grown to 11TB.
  • Now we needed to reseed the data into a new index (because we wanted to support some more fields). The expectation was that it would total ~11TB again.
  • It turns out that the total disk space of this new index is only 8.5TB. The total number of documents seeded matches.
  • I can't explain where this large difference between 8.5TB and the expected 11TB comes from.

Could anyone help?

We have the following setup:

10 nodes, each having:

  • 64 vCPU
  • 256GB RAM (32GB Heap)

The data consists of:

  • ~100M documents (1.6B total)
  • Multiple nested mappings
  • Large text fields
  • HNSW index (for dimension 768)

How many shards does your index have?

Since you update the index, I would expect that the extra size is occupied by deleted documents.

Deleted documents are only removed when segments are merged, and there are some requirements before Elasticsearch will merge a segment, such as the segment size and the percentage of deleted documents. If you have a lot of shards, it is possible that those thresholds were not reached yet and the segments didn't merge, so the deleted documents would still occupy space.

When you indexed the data into a new index, the deleted documents were not indexed, so it requires less space.
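
If you want to verify this, comparing the docs.deleted counter of the old and the new index should show it directly; those are the soft-deleted documents still sitting in the segments. A minimal sketch against the index stats API (the host and the index names old-index / new-index are placeholders for your setup); the same numbers are also visible via GET _cat/indices?v&h=index,docs.count,docs.deleted,store.size:

    import requests

    ES = "http://localhost:9200"  # placeholder host

    # Compare live docs, deleted docs, and on-disk size of the two indices.
    for index in ["old-index", "new-index"]:  # placeholder index names
        stats = requests.get(f"{ES}/{index}/_stats/docs,store").json()
        primaries = stats["indices"][index]["primaries"]
        docs = primaries["docs"]["count"]
        deleted = primaries["docs"]["deleted"]
        size_gib = primaries["store"]["size_in_bytes"] / 1024**3
        print(f"{index}: docs={docs:,} deleted={deleted:,} size={size_gib:,.1f} GiB")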

The number of shards is 280 for both indices. The total number of segments is also roughly equal: a little over 9000.

I do indeed see the number of segments grow during writes, but after some time it drops back to about 9000 segments.

Not sure if this helps?
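
For what it's worth, the per-segment view makes this concrete: if you sum docs.deleted over the segments of the old index and compare it with the new one, that should roughly account for the size gap. A sketch using the _cat/segments API (host and index name are placeholders):

    import requests

    ES = "http://localhost:9200"   # placeholder host
    INDEX = "old-index"            # placeholder index name

    # List every segment of the index, with sizes reported in bytes.
    segments = requests.get(
        f"{ES}/_cat/segments/{INDEX}",
        params={"format": "json", "bytes": "b"},
    ).json()

    live = sum(int(s["docs.count"]) for s in segments)
    deleted = sum(int(s["docs.deleted"]) for s in segments)
    size_tib = sum(int(s["size"]) for s in segments) / 1024**4

    print(f"segments={len(segments):,} live={live:,} "
          f"deleted={deleted:,} size={size_tib:.2f} TiB")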

Yeah, I think this explains the difference in size: the deleted documents just haven't been purged yet.

Elasticsearch automatically merges segments and purges the deleted documents. I'm not sure exactly how this is done, but as far as I know it takes the size of the shards and segments and the percentage of deleted documents into account as triggers.
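
If the goal is just to shrink the old index without reindexing, a force merge that only expunges deletes should reclaim most of that space. A minimal sketch (host and index name are placeholders); note that this is I/O heavy on an index of this size:

    import requests

    ES = "http://localhost:9200"   # placeholder host
    INDEX = "old-index"            # placeholder index name

    # Rewrite segments just enough to drop the deleted documents.
    # The request blocks until the merge finishes, so run it during a
    # quiet period.
    resp = requests.post(
        f"{ES}/{INDEX}/_forcemerge",
        params={"only_expunge_deletes": "true"},
    )
    print(resp.status_code, resp.json())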