I've been implementing an ELK stack for the past year or so. I had thought
that we would have plenty of space, but I recently added a log source that
increased the number of log entries per day by around 30x. That prompted me
to start looking into ways of managing ES's data storage to keep from
running out of space, which led me to Curator and snapshots.
If I am reading the documentation[1] for both systems correctly, I think I
can do the following:
1. Create a repository for old data.
2. Use a cron job and Curator to automatically take snapshots of data
older than a certain time period (say, 6 months).
3. Have Curator delete the data older than that time period.
The result would be that all data older than the time period would
be stored in the repository. The data would be compressed (what kind of
compression?)
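To check my understanding, here is a rough sketch of the API calls I believe Curator would be automating, written with Python's standard library against the snapshot REST API. The repository name, backup path, and index pattern are placeholders I made up; the comments on compression reflect my reading of the snapshot docs.

import json
import urllib.request

ES = "http://localhost:9200"  # assumed cluster address

def es_request(method, path, body=None):
    """Send a JSON request to the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# 1. Register a shared-filesystem repository for old data. Per the
#    snapshot docs, "compress" applies only to the snapshot's metadata
#    files; the segment data files are copied as-is (snapshots are
#    incremental, so unchanged segments are stored only once).
es_request("PUT", "/_snapshot/old_logs", {
    "type": "fs",
    "settings": {"location": "/mnt/es_backups/old_logs", "compress": True},
})

# 2. Snapshot the indices older than the cutoff (Logstash-style daily
#    index names are assumed here).
es_request("PUT",
           "/_snapshot/old_logs/logstash-2014.01?wait_for_completion=true",
           {"indices": "logstash-2014.01.*"})

# 3. Once the snapshot succeeds, delete the live indices to free space.
es_request("DELETE", "/logstash-2014.01.*")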
When I need data older than the time period, I could use Curator to
restore it to the ES cluster, or even to a different ES cluster. After
that, I could do whatever I needed before deleting it again.
I'd test all this myself, but I don't have the resources for a decent test
environment yet. Still working on that.
Am I missing anything? Are there better ways to keep from running out of
storage space? Any general advice related to this kind of thing?
However, it should be noted that Curator is only for taking snapshots; it
cannot restore them. This functionality was omitted because restoring is
not typically a daily occurrence, unlike the other operations Curator
performs. Fortunately, restoring indices is relatively simple to do with
the API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_restore
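For example, a minimal restore call could look something like the sketch below (repository and snapshot names are placeholders matching the earlier example; note that an index being restored must not already exist as an open index in the target cluster).

import json
import urllib.request

# Restore the snapshotted indices. Point the URL at a different
# cluster to restore there instead, provided that cluster has the
# same repository registered.
body = json.dumps({"indices": "logstash-2014.01.*"}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:9200/_snapshot/old_logs/logstash-2014.01/_restore"
    "?wait_for_completion=true",
    data=body, method="POST",
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8")))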
--Aaron