Snapshot Bloat

Hi!

I'm attempting to snapshot my logstash indices on EC2 and back them up to an S3 repository. I only have a single dev EC2 instance with dummy data (my prod has multiple instances), and only a single node behind this repo (as confirmed by the `curl -XPOST 'http://localhost:9200/_snapshot/snapshot_name/_verify'` command). When I tell curator to make the snapshot, it appears to run correctly, but the disk usage on S3 is HUGE relative to the original indices.
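For context, the repository registration looks roughly like this (the repository name here is inferred from my S3 path below; treat the exact values as placeholders for my setup):

```
# Register the S3 repository (requires the AWS cloud plugin on ES 1.x);
# bucket and base_path mirror the S3 path mentioned below.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch_backup' -d '{
  "type": "s3",
  "settings": {
    "bucket": "mys3bucket",
    "base_path": "snapshots/elasticsearch_backup",
    "compress": true
  }
}'

# Verify the repository; the response lists the node(s) that can reach it.
curl -XPOST 'http://localhost:9200/_snapshot/elasticsearch_backup/_verify'
```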

For instance, a given EC2 logstash index directory (/logstash-2015.09.28/) has a `du` of 164KB, but the corresponding S3 prefix (s3://mys3bucket/snapshots/elasticsearch_backup/indices/logstash-2015.09.28/) comes to 305MB!
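Here's roughly how I'm measuring both sides (local path abbreviated, as above; nothing non-standard in the AWS CLI flags):

```
# Local size of the index directory (path abbreviated as above):
du -sh .../logstash-2015.09.28/

# Total size of the corresponding S3 prefix:
aws s3 ls --recursive --summarize --human-readable \
  s3://mys3bucket/snapshots/elasticsearch_backup/indices/logstash-2015.09.28/
```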

I assume this is not expected behavior? What could be causing this and how do I fix it?

Thanks!

Do you only have a single Elasticsearch node? Also, keep in mind that each snapshot stores metadata that is needed in the event of a restore.
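If you want to double-check the node count, something like this will do it:

```
# Lists every node in the cluster; a single line means a single node.
curl -XGET 'http://localhost:9200/_cat/nodes?v'
```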

Hi Aaron, thanks for the response.

That is correct: this cluster is just a single node.

I understand about the metadata, but I do have `compress: true` set in the yml, and I wouldn't expect the metadata to be roughly 2,000x the size of the actual data. I've also only done about four or five manual snapshots, so it's not like thousands and thousands of metadata files are stacking up (see the commands at the end of this post for how I checked).

For what it's worth, I'm using an older version (1.4.4). Is this simply a known issue with older releases?

Is there anything else that I should investigate?
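For reference, here's how I confirmed the snapshot count and the `compress` setting (repository name inferred from my S3 path above):

```
# List all snapshots in the repository to see how many are stacking up.
curl -XGET 'http://localhost:9200/_snapshot/elasticsearch_backup/_all?pretty'

# Show the repository's registered settings, including compress.
curl -XGET 'http://localhost:9200/_snapshot/elasticsearch_backup?pretty'
```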

See: Snapshot module | Elasticsearch Guide [8.11] | Elastic

Specifically:

> By setting `include_global_state` to false it's possible to prevent the cluster global state to be stored as part of the snapshot.

It may be that the index metadata (which includes mappings and other information) and the cluster state account for part of the size you're seeing. Without digging into what's actually been stored, those are the things I know are stored by default with snapshots.
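As a sketch, a snapshot request that skips the global cluster state would look something like this (repository and snapshot names are placeholders):

```
# Snapshot a single index without the cluster global state.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch_backup/snapshot_1?wait_for_completion=true' -d '{
  "indices": "logstash-2015.09.28",
  "include_global_state": false
}'
```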