Issue with ingesting data and disk size

I am facing a very weird issue.
I tried uploading 30 GB of CSV data into Elasticsearch using the Python client.

This was the disk usage when I stopped ingestion:

shards disk.indices disk.used disk.avail disk.total disk.percent host     ip       node
     1         28gb    30.3gb        1gb     31.3gb           96 10.0.0.2 10.0.0.2 113b51464987

A couple of minutes later, it looked like this:

shards disk.indices disk.used disk.avail disk.total disk.percent host     ip       node
     1       22.9gb    24.7gb      6.5gb     31.3gb           78 10.0.0.2 10.0.0.2 113b51464987

Can anyone please elaborate on this?

It's likely because segments were being merged in the background, which brought the disk usage down.
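
If you want to check this yourself, one option is to poll the cat segments API during and after indexing and watch how the segment count and total size change. A rough sketch, assuming the official `elasticsearch` Python client and a cluster at `http://localhost:9200`; the index name `my-index` and both helper functions are made up for illustration:

```python
def summarize_segments(rows):
    """Return (segment_count, total_bytes, deleted_docs) for _cat/segments rows.

    Each row is a dict as returned with format="json" and bytes="b",
    so "size" and "docs.deleted" arrive as numeric strings.
    """
    total_bytes = sum(int(r["size"]) for r in rows)
    deleted_docs = sum(int(r["docs.deleted"]) for r in rows)
    return len(rows), total_bytes, deleted_docs


def poll(index="my-index", url="http://localhost:9200", interval_s=30):
    """Print a segment summary every interval_s seconds (needs a live cluster)."""
    import time
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch(url)  # assumed local, unauthenticated cluster
    while True:
        rows = es.cat.segments(index=index, format="json", bytes="b")
        count, size, deleted = summarize_segments(rows)
        print(f"segments={count} size={size / 1024**3:.1f}gb deleted_docs={deleted}")
        time.sleep(interval_s)
```

If merging is the cause, you would expect the segment count to fall shortly after indexing stops, around the same time the disk usage drops.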

Thanks for the reply!
I am facing this issue every time I ingest data.
When I am about to finish ingesting, say, 30 GB of data, a minute later the total storage drops back to 25 GB or even less.

Is this the normal behavior?

Yes. Elasticsearch creates immutable segments during indexing. These are periodically merged into larger segments. During a merge, the new segment is written before the old ones are removed, so disk usage can grow and shrink a bit over time. This is likely to continue after your final segment has been created and indexing has completed, which is what you are seeing.
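
To make the grow-then-shrink pattern concrete, here is a toy model of a single merge. It is only a sketch: the segment sizes and the `merged_ratio` (how large the merged segment is relative to its inputs) are assumed numbers chosen for illustration, not values Elasticsearch reports, and `simulate_merge` is a hypothetical helper.

```python
def simulate_merge(segments_gb, merge_idx, merged_ratio):
    """Return (before, peak, after) disk usage in GB for one merge.

    segments_gb  : sizes of the existing segments
    merge_idx    : positions of the segments being merged together
    merged_ratio : assumed size of the merged segment relative to its inputs
    """
    inputs = sum(segments_gb[i] for i in merge_idx)
    before = sum(segments_gb)
    merged = inputs * merged_ratio
    # While the merge runs, the old segments and the new one coexist on disk.
    peak = before + merged
    # Once the merge finishes, the input segments are deleted.
    after = before - inputs + merged
    return before, peak, after


# Illustrative numbers only: one large segment plus three small ones being merged.
before, peak, after = simulate_merge([16.0, 4.0, 4.0, 4.0], [1, 2, 3], 0.75)
print(f"before={before}gb peak={peak}gb after={after}gb")
```

The point of the model is just the ordering: usage rises to `peak` while the merge is in flight and then falls to `after`, which can be lower than `before`.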

