Snapshot Bloat

Hi!

I'm attempting to snapshot my logstash indices on EC2 and back them up to an S3 repository. I only have a single dev EC2 instance with dummy data (my prod has multiple instances), and only a single node behind this repo (as confirmed by the `curl -XPOST 'http://localhost:9200/_snapshot/snapshot_name/_verify'` command). When I tell curator to make the snapshot, it appears to run correctly, but the disk usage on S3 is HUGE relative to the original indices.
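For context, the repository registration looks roughly like this (the repository name here is inferred from my S3 path below; treat the exact values as placeholders for my setup):

```
# Register the S3 repository (requires the AWS cloud plugin on ES 1.x);
# bucket and base_path mirror the S3 path mentioned below.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch_backup' -d '{
  "type": "s3",
  "settings": {
    "bucket": "mys3bucket",
    "base_path": "snapshots/elasticsearch_backup",
    "compress": true
  }
}'

# Verify the repository; the response lists the node(s) that can reach it.
curl -XPOST 'http://localhost:9200/_snapshot/elasticsearch_backup/_verify'
```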

For instance, a given EC2 logstash index directory (/logstash-2015.09.28/) has a `du` of 164KB, but the corresponding S3 prefix (s3://mys3bucket/snapshots/elasticsearch_backup/indices/logstash-2015.09.28/) comes to 305MB!
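Here's roughly how I'm measuring both sides (local path abbreviated, as above; nothing non-standard in the AWS CLI flags):

```
# Local size of the index directory (path abbreviated as above):
du -sh .../logstash-2015.09.28/

# Total size of the corresponding S3 prefix:
aws s3 ls --recursive --summarize --human-readable \
  s3://mys3bucket/snapshots/elasticsearch_backup/indices/logstash-2015.09.28/
```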

I assume this is not expected behavior? What could be causing this and how do I fix it?

Thanks!

Do you only have a single Elasticsearch node? Also, keep in mind that each snapshot stores metadata that is needed in the event of a restore.
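If you want to double-check the node count, something like this will do it:

```
# Lists every node in the cluster; a single line means a single node.
curl -XGET 'http://localhost:9200/_cat/nodes?v'
```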

Hi Aaron, thanks for the response.

That is correct: this cluster is just a single node.

I understand about the metadata, but I do have `compress: true` set in the yml, and I wouldn't expect the metadata to be roughly 2,000x the size of the actual data. I've also only done about four or five manual snapshots, so it's not like thousands and thousands of metadata files are stacking up (see the commands at the end of this post for how I checked).

For what it's worth, I'm using an older version (1.4.4). Is this simply a known issue with older releases?

Is there anything else that I should investigate?
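For reference, here's how I confirmed the snapshot count and the `compress` setting (repository name inferred from my S3 path above):

```
# List all snapshots in the repository to see how many are stacking up.
curl -XGET 'http://localhost:9200/_snapshot/elasticsearch_backup/_all?pretty'

# Show the repository's registered settings, including compress.
curl -XGET 'http://localhost:9200/_snapshot/elasticsearch_backup?pretty'
```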

See: Snapshot module | Elasticsearch Guide [8.11] | Elastic

Specifically:

> By setting `include_global_state` to false it's possible to prevent the cluster global state to be stored as part of the snapshot.

It may be that the index metadata (which includes mappings and other information) and the cluster state account for part of the size you're seeing. Without digging into what's actually been stored, those are the things I know are stored by default with snapshots.
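As a sketch, a snapshot request that skips the global cluster state would look something like this (repository and snapshot names are placeholders):

```
# Snapshot a single index without the cluster global state.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch_backup/snapshot_1?wait_for_completion=true' -d '{
  "indices": "logstash-2015.09.28",
  "include_global_state": false
}'
```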