Data fs grows forever after full snapshot is taken

TinaMartiello · February 13, 2024, 9:08am

Hi,
We have 3 Elasticsearch nodes running elasticsearch-8.11.1-1.x86_64 on CentOS Linux release 7.3.1611 (Core), installed on premises.
The systems are very busy with both reads and writes.

We take a full snapshot once a day, replacing the previous one, with the following cron job:

08 13 * * * curl -XDELETE localhost:9200/_snapshot/backup/backupfull; curl -XPUT localhost:9200/_snapshot/backup/backupfull

Creating the snapshot takes about 10 minutes and completes without errors.

The snapshot repository is on a separate filesystem, not the one that holds the data.

Occasionally (not every time), the data filesystem starts growing after the snapshot is taken.
The total size of the index segments stays the same, but usage on the data filesystem keeps growing indefinitely.
If we restart Elasticsearch, disk usage drops back to its usual level. Closing and reopening the indices has the same effect.

We have commented out the snapshot cron job, and since then we no longer see this strange behaviour.

Can you help us find out what is going on?
Thank you,
Tina

DavidTurner · February 13, 2024, 9:16am

Hi Tina & welcome!

Sounds strange indeed. Can you capture the full contents of the data fs (e.g. run ls -lR path/to/my/data) just after taking the snapshot, and then again, say, 20 minutes later when it's been growing for a while? Please also capture GET _segments at the same times.
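A minimal sketch of that capture as a shell function, assuming the cluster answers on localhost:9200 as in the cron job above. The function name capture_state and the output file names are made up for illustration, and the data path is whatever path.data points to in your elasticsearch.yml (the RPM default is /var/lib/elasticsearch).

```shell
#!/bin/sh
# Hypothetical helper: records the data-path listing, its total size and the
# GET _segments output under one timestamp so captures can be compared later.
# Assumption: the cluster listens on localhost:9200 (override with ES_URL).
ES_URL="${ES_URL:-localhost:9200}"

capture_state() {
  data_path="$1"   # e.g. /var/lib/elasticsearch (check path.data)
  out_dir="$2"     # where to write the captured files
  ts="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out_dir"

  # Every file under the data path, with sizes.
  ls -lR "$data_path" > "$out_dir/ls-$ts.txt"
  # Total disk usage of the data path in KiB.
  du -sk "$data_path" > "$out_dir/du-$ts.txt"
  # Segments as Elasticsearch reports them; a failed request leaves an empty file.
  curl -s --connect-timeout 10 "$ES_URL/_segments?pretty" \
    > "$out_dir/segments-$ts.json" || echo "GET _segments failed" >&2

  # Print the timestamp so the caller knows which files belong together.
  printf '%s\n' "$ts"
}
```

For example, run capture_state /var/lib/elasticsearch /tmp/es-diag right after the snapshot finishes and again about 20 minutes later, then diff the two ls-*.txt files and compare them with the matching segments-*.json files.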

This seems like a bad plan, btw: don't delete your backup before creating the next one. For starters, it means there's a period of time where you have no backup at all. Also, if you take today's snapshot before deleting yesterday's, Elasticsearch will notice that most of the data hasn't changed, which should make the process much quicker.
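As a sketch of that ordering: create today's snapshot, wait for it to finish, and only then delete yesterday's. This assumes date-stamped names such as backupfull-2024.02.13 (a hypothetical scheme, since the cron job uses one fixed name), GNU date as shipped with CentOS, and the repository backup and localhost:9200 from the cron job.

```shell
#!/bin/sh
# Sketch only: create today's snapshot first, delete yesterday's afterwards.
# Assumptions: hypothetical names backupfull-YYYY.MM.DD, GNU date,
# repository "backup", cluster on localhost:9200 (override with ES_URL).
ES_URL="${ES_URL:-localhost:9200}"
REPO="backup"

# Snapshot name for a date string understood by GNU date -d
# ("today", "yesterday", "2024-02-13", ...), using the UTC calendar date.
snapshot_name() {
  printf 'backupfull-%s' "$(date -u -d "$1" +%Y.%m.%d)"
}

rotate_snapshots() {
  today="$(snapshot_name today)"
  yesterday="$(snapshot_name yesterday)"

  # wait_for_completion=true makes the response report the final snapshot state.
  response="$(curl -s -XPUT "$ES_URL/_snapshot/$REPO/$today?wait_for_completion=true")"

  # Delete the old snapshot only if the new one succeeded. Files still
  # referenced by today's snapshot stay in the repository; only files no
  # other snapshot uses are removed.
  case "$response" in
    *'"state":"SUCCESS"'*)
      curl -s -XDELETE "$ES_URL/_snapshot/$REPO/$yesterday" ;;
    *)
      echo "snapshot $today did not succeed, keeping $yesterday" >&2
      return 1 ;;
  esac
}
```

If the new snapshot fails or the cluster is unreachable, the function returns non-zero and yesterday's snapshot is left in place.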

Also, I'd recommend just using SLM (snapshot lifecycle management) rather than your own cron job. SLM is much more robust: for example, it handles failures properly and won't delete your last good snapshot.
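A minimal sketch of an SLM policy that could replace the cron job, created with the SLM put-policy API (PUT _slm/policy/<policy-id>). The policy id nightly-backupfull and the retention values are hypothetical. The repository backup and the 13:08 run time come from the cron job above. Note that SLM cron expressions include a leading seconds field and are evaluated in UTC, unlike crontab.

```shell
#!/bin/sh
# Sketch: define an SLM policy instead of a cron-driven delete/create.
# Assumptions: hypothetical policy id "nightly-backupfull" and retention
# values; repository "backup" already registered; cluster on localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

slm_policy_body() {
  # "0 8 13 * * ?" = second 0, minute 8, hour 13 (UTC), every day.
  cat <<'EOF'
{
  "schedule": "0 8 13 * * ?",
  "name": "<backupfull-{now/d}>",
  "repository": "backup",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
EOF
}

create_slm_policy() {
  slm_policy_body | curl -s -XPUT "$ES_URL/_slm/policy/nightly-backupfull" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

After creating it, POST _slm/policy/nightly-backupfull/_execute runs the policy once straight away, which is a quick way to check it, and GET _slm/policy/nightly-backupfull shows its last success and last failure.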

TinaMartiello · February 13, 2024, 9:32am

The backup is copied to a separate device (Rubrik backup) before the deletion, so we do keep it. If we take the snapshot before the deletion, it fails because a snapshot with the same name already exists.

I will collect some stats and share them with you.

DavidTurner · February 13, 2024, 9:51am

Rubrik claims to have some very clever deduplication functionality, but I wonder if it really works as well as the deduplication built into ES itself. In any case, it's still a lot more work (including IO, blowing your page cache, and network traffic) for ES to take a full backup of everything every day.

If you call the snapshot <backupfull_{now/d}> then ES will include today's date in its name.
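The name contains characters that are special in URLs (<, >, {, } and /), so it has to be percent-encoded in the request path when sent with curl. A minimal sketch, assuming the repository backup and localhost:9200 from the cron job. With the default date format, ES should resolve the name to something like backupfull_2024.02.13 (date math is evaluated in UTC).

```shell
#!/bin/sh
# Sketch: create a snapshot whose name embeds today's date via date math.
# Assumptions: repository "backup", cluster on localhost:9200 (override with ES_URL).
ES_URL="${ES_URL:-localhost:9200}"
REPO="backup"

# "<backupfull_{now/d}>" percent-encoded for the URL path:
#   < -> %3C   { -> %7B   / -> %2F   } -> %7D   > -> %3E
DATED_NAME='%3Cbackupfull_%7Bnow%2Fd%7D%3E'

dated_snapshot_url() {
  printf '%s/_snapshot/%s/%s' "$ES_URL" "$REPO" "$DATED_NAME"
}

take_dated_snapshot() {
  curl -s -XPUT "$(dated_snapshot_url)?wait_for_completion=true"
}
```

Running take_dated_snapshot twice on the same (UTC) day would still hit the "already exists" error, since the resolved name only changes once per day.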

TinaMartiello · February 13, 2024, 1:10pm

That is a really good point; we have decided to follow your suggestions and use SLM.
We scheduled a backup policy through Kibana.
We will keep an eye on it and hopefully won't see any side effects.

system · March 12, 2024, 1:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
