Offload cold data to S3 in the cloud

In log-analysis scenarios, users could offload cold data to S3 to reduce cost. For example, Vertica already supports this feature. Does Elasticsearch have any plans to do the same?

We are working on something called frozen indices that will let you sequentially search through previously closed indices. I am not sure of the timeline for that, though.

Could you provide more details about "sequentially search through previously closed indices"?

Closed indices cannot be read; you need to open them, recover them, and then read the data (and re-close them if you want). This needs to be done manually at the moment, and we are working on a process to automate it.
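The manual cycle above can be done today with the open and close index APIs. A minimal sketch, assuming a cluster reachable at `localhost:9200` and an index named `logs-2015` (both are placeholder names, not from this thread):

```shell
# Open (and recover) a previously closed index so it becomes searchable
curl -X POST "localhost:9200/logs-2015/_open"

# ...run your searches against logs-2015 here...

# Re-close the index when done, so it stops consuming heap and file handles
curl -X POST "localhost:9200/logs-2015/_close"
```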

So if you have indices from 2015, 2016, and 2017 that are all closed and you query for data that lives in the 2015 and 2016 indices, it will automatically open+recover+read+close the 2015 indices, then the 2016 indices.
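The automation described above could be sketched roughly as follows. This is a hedged illustration in Python, not the actual Elasticsearch implementation: the index naming scheme (`logs-YYYY`) and the `open_index`/`search_index`/`close_index` callbacks are hypothetical placeholders standing in for the real cluster operations.

```python
from typing import Callable, Iterable, List


def search_closed_indices(
    indices: Iterable[str],                    # e.g. ["logs-2015", "logs-2016", "logs-2017"]
    wanted_years: Iterable[int],               # years the query actually touches
    open_index: Callable[[str], None],         # placeholder: open + recover a closed index
    search_index: Callable[[str], List[str]],  # placeholder: run the query against one index
    close_index: Callable[[str], None],        # placeholder: re-close the index
) -> List[str]:
    """Sequentially open, search, and re-close only the indices the query needs."""
    hits: List[str] = []
    wanted = {str(y) for y in wanted_years}
    for name in indices:
        year = name.rsplit("-", 1)[-1]
        if year not in wanted:
            continue                  # indices outside the query range stay closed
        open_index(name)              # open + recover the closed index
        try:
            hits.extend(search_index(name))
        finally:
            close_index(name)         # re-close, so heap is only used one index at a time
    return hits


# Usage with stub callbacks, mirroring the 2015/2016 example from the thread:
opened = []
result = search_closed_indices(
    ["logs-2015", "logs-2016", "logs-2017"],
    wanted_years=[2015, 2016],
    open_index=opened.append,
    search_index=lambda name: [f"hit-from-{name}"],
    close_index=lambda name: None,
)
# opened  -> ["logs-2015", "logs-2016"]   (logs-2017 was never touched)
```

The key point the sketch captures is that only one year's indices are open at any moment, which is why the approach avoids holding heap for lots of open shards.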

This means you don't end up using heap for lots of open shards.

Are closed indices still on local disk, or are they moved to shared storage like S3?
I think it would be better to move closed indices to S3, because then there is no need to maintain replicas in the ES cluster, and S3 is cheaper than local storage. When a user wants to read a closed index, they could just open and read it; no recovery would be needed.

What do you think about this?

Closed indices still need to reside on disk. Elasticsearch does not support querying live indices from S3. Given Elasticsearch's access patterns when querying, trying to serve queries from S3 (if it were possible) would most likely be unacceptably slow.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.