I am using Elasticsearch as a backend to store logs collected by the Fluentd logging agent. Specifically, I've set up an EFK logging architecture in my Kubernetes cluster (an AWS EKS cluster, to be specific).
I've mounted an EBS volume at /usr/share/elasticsearch/data in the Elasticsearch container.
The question is: using this volume, is there a way to restore the logs? I've cd-ed into nodes/0/indices and saw a bunch of folders and files in it, but I couldn't figure out what they are. I've attached a capture of it.
Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them.
I think it might be better to step back and ask why you are doing this?
Okay I will not post pictures. Thanks for letting me know.
As mentioned above, I've set up an EFK logging architecture in my AWS EKS cluster for production usage.
As a log retention strategy, I want old log data (say, older than 60 days) to be automatically removed from the EBS volume (the one mounted into the Elasticsearch container).
But at the same time, because our client may request log data older than that 60-day cutoff, we are planning to take snapshots of the EBS volume periodically, so that log data older than 60 days can still be restored from earlier snapshots.
Given a snapshot of the EBS volume, then, I need to be able to restore logs directly from it.
If there are other ways or better practices to restore logs, I am also willing to follow them.
As Mark said, that will not work. Elasticsearch performs consistency checks on the data on disk, so altering the data directory in any way can invalidate all of the data, because those consistency checks will fail.
The only way to snapshot data is through the snapshot and restore APIs, which allow you to back up data to S3 or a shared file system repository.
Taking a snapshot is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data.
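To make the snapshot and restore route more concrete, here is a minimal Python sketch of the three REST calls involved: registering an S3 repository, taking a snapshot, and restoring indices from it. Several things here are assumptions rather than anything from this thread: the cluster URL, the repository name `log_archive`, the bucket name `my-log-archive`, the snapshot name, and the `logstash-*` index pattern (the Fluentd Elasticsearch output commonly writes daily `logstash-YYYY.MM.DD` indices, but check what your setup actually produces). Depending on your Elasticsearch version, the `s3` repository type may require the repository-s3 plugin and AWS credentials to be configured first.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable here without authentication.
ES_URL = "http://localhost:9200"


def register_s3_repository(repo, bucket):
    """Build the request that registers an S3 snapshot repository."""
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices):
    """Build the request that snapshots the given indices (pattern allowed)."""
    body = {"indices": indices, "include_global_state": False}
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, body)


def restore_snapshot(repo, snapshot, indices):
    """Build the request that restores the given indices from a snapshot."""
    body = {"indices": indices, "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


def send(request, base_url=ES_URL):
    """Send a (method, path, body) tuple to Elasticsearch, return parsed JSON."""
    method, path, body = request
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Hypothetical names; replace with your own repository, bucket and indices.
    send(register_s3_repository("log_archive", "my-log-archive"))
    send(create_snapshot("log_archive", "logs-2020-01", "logstash-2020.01.*"))
    # Later, when a client asks for logs that were already deleted:
    send(restore_snapshot("log_archive", "logs-2020-01", "logstash-2020.01.*"))
```

With snapshots taken this way, the old indices can be deleted from the cluster after 60 days and brought back on request. Note that a restore fails if an open index with the same name already exists in the cluster, so this works most naturally for indices that have already been removed.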