S3 optimization in ELK

Hi,
We have an 8-node on-premises cluster with 3 years of data (3 master + 5 data nodes); one of these data nodes acts as S3-backed cold storage.
Our S3 is running into space issues, and we would like to optimize by moving logs older than 2 years to the Glacier tier in the same S3 bucket.

1) Is there any impact on application performance if the same S3 bucket holds both a cold tier and a Glacier tier?
2) We have agreed with clients to store and search data for 7 years. Will they be able to search and retrieve data if it is moved to the Glacier tier?
3) Are there any other optimization options for this setup?

Thanks in advance.

Hi Shalini,

Elasticsearch snapshots do not support Glacier, and moving data there will have implications.


Yes, snapshot and restore doesn't specifically support Glacier. You could manage this manually, moving the data to Glacier and then back when needed, but that is a DIY solution (see the sketch below).

But don't send time-series indices to Glacier; it will cause your snapshots to partially fail.
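For illustration, the restore leg of such a DIY approach might look like this boto3 sketch (the bucket and key names are hypothetical); a Glacier object must be restored to a temporary readable copy before anything can read it:

```python
import boto3

# Hypothetical bucket and key; adjust to your environment.
BUCKET = "my-elk-cold-bucket"
KEY = "logs/2022/app-0001.json"

s3 = boto3.client("s3")

# Ask S3 to restore a temporary readable copy of the Glacier object.
# The restore is asynchronous; the Standard tier typically takes hours.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={
        "Days": 7,  # keep the restored copy readable for 7 days
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# Check the restore status before trying to read the object.
head = s3.head_object(Bucket=BUCKET, Key=KEY)
print(head.get("Restore"))  # e.g. 'ongoing-request="true"' while in progress
```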

Thanks!
Can we still search data from S3 Glacier the way we search from the S3 node?

We are not planning to move snapshots. We have older logs to be moved from the S3 frozen node to the S3 Glacier tier.
But we need those logs to remain searchable as well.

No, that is not supported.

Could you explain further what you mean by:

Our S3 is running into space issues

To my knowledge AWS S3 doesn't have a space limitation. Are you using an S3 provider other than AWS? Or are you saying that the cold node which stores the cold data is running out of space?

Hi Ben,

We are looking to move our old indices (documents older than 2 years) from S3 to S3 Glacier. Based on the ILM policies applied, we move data from the hot to the cold node. Our cold node uses S3 Standard.
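For context, a policy along the lines we use might look like the sketch below (written with the elasticsearch Python client; the policy name and ages are illustrative, not our exact production values):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder endpoint/auth

# Illustrative hot -> cold -> delete lifecycle; name and min_age values
# are examples only, not our exact production settings.
es.ilm.put_lifecycle(
    name="logs-policy-example",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "30d", "max_primary_shard_size": "50gb"}
                }
            },
            "cold": {
                "min_age": "90d",
                # Steer shards to the cold hardware; attribute-based
                # allocation is one option, data-tier roles are another.
                "actions": {"allocate": {"require": {"data": "cold"}}},
            },
            "delete": {
                "min_age": "2555d",  # roughly 7 years
                "actions": {"delete": {}},
            },
        }
    },
)
```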

As part of cost optimization, our AWS team has asked whether we can move the very old data from S3 Standard to the S3 Glacier tier within the same bucket, as the size runs into terabytes.

We would like to know:
1) If we move our old indices from S3 to S3 Glacier, will the data still be searchable in Kibana for users?
2) Will it impact performance or the application in any way?
3) Are there any other cost optimization ideas for cold storage that would not interrupt user access to the data via Kibana?

Thanks,
Shalini.

You might want to look at searchable snapshots (an Enterprise license is needed), but S3 snapshot repositories don't support Glacier.
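If you did go the searchable snapshots route, mounting an index out of an S3 snapshot repository looks roughly like this sketch (all names are placeholders; written with the elasticsearch Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder endpoint/auth

# Mount an index from an S3 snapshot repository as a searchable snapshot.
# Repository, snapshot, and index names are placeholders; this requires
# an Enterprise license.
es.searchable_snapshots.mount(
    repository="my-s3-repo",
    snapshot="daily-snap-2024.01.01",
    index="logs-2022.01",
    renamed_index="logs-2022.01-mounted",
    storage="shared_cache",  # frozen-tier partial mount
)
```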

We are using the licensed version.
As mentioned, we are using S3 as our frozen node, with users able to search the data in Kibana.
Once we move to S3 Glacier, will it impact search?
We are not looking for searchable snapshots.

Would this not be using searchable snapshots? If not, how is it set up to use S3?

Based on the ILM policies, the indices are moved to the S3 bucket after xx days.

For snapshots, we have a policy that takes a daily snapshot of the instance and keeps it for 3 months.
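For reference, a daily snapshot schedule with roughly 3 months of retention can be expressed as an SLM policy along these lines (policy and repository names are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder endpoint/auth

# Daily snapshot at 01:30, kept for 90 days; names are placeholders.
es.slm.put_lifecycle(
    policy_id="daily-snapshots",
    schedule="0 30 1 * * ?",      # Elasticsearch cron: every day at 01:30
    name="<daily-snap-{now/d}>",  # date-math snapshot name
    repository="my-s3-repo",
    config={"indices": ["*"], "include_global_state": False},
    retention={"expire_after": "90d", "min_count": 5, "max_count": 100},
)
```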

Hi @Shalinicts, as has been mentioned above, you can move the indices to Glacier, but they won't be searchable, if that's the answer you're looking for. The other implications and constraints stated above still apply.

Yes Ayush, that was the answer I was looking for: whether the indices would still be searchable from Kibana. In short, users shouldn't feel any difference whether the data is on the frozen node (S3) or in S3 Glacier.
What are the other implications?

I don't think the statement:

you can move the indices to Glacier

is correct; this is explicitly called out in the S3 repository docs as something you really shouldn't do.

S3 repository

Sets the S3 storage class for objects stored in the snapshot repository. Values may be standard, reduced_redundancy, standard_ia, onezone_ia and intelligent_tiering. Defaults to standard. Changing this setting on an existing repository only affects the storage class for newly created objects, resulting in a mixed usage of storage classes. You may use an S3 Lifecycle Policy to adjust the storage class of existing objects in your repository, but you must not transition objects to Glacier classes and you must not expire objects. If you use Glacier storage classes or object expiry then you may permanently lose access to your repository contents. For more information about S3 storage classes, see the AWS Storage Classes Guide.
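To make the setting concrete, registering an S3 repository with an explicit storage class might look like the sketch below (bucket, repository name, and endpoint are placeholders; note that Glacier classes are not among the permitted values):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder endpoint/auth

# Register an S3 snapshot repository with a non-default storage class.
# Permitted values per the docs: standard, reduced_redundancy, standard_ia,
# onezone_ia, intelligent_tiering -- Glacier classes are NOT permitted.
es.snapshot.create_repository(
    name="my-s3-repo",
    repository={
        "type": "s3",
        "settings": {
            "bucket": "my-elk-snapshots",  # placeholder bucket
            "storage_class": "standard_ia",
        },
    },
)
```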


Are you moving the data to a cold tier node using searchable snapshots as outlined here? If so, is this where you want to use Glacier S3?

If you are not sure, please share your node configuration and ILM policy.

We have 3 master nodes set with node.master=true.
5 data nodes:
data nodes 1, 2, 3, 4 are set with roles: data_hot, data_content
data node 5 is set with roles: data_frozen

We are not moving the data to the frozen node as searchable snapshots.
The ILM policy is different for each type of log; the maximum retention is 7 years on the frozen node.
Data node 5 holds 2.5 years of data, and we are considering splitting it into 2 tiers, moving data older than 2 years to Glacier, provided that data remains searchable.
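For completeness, a quick way to confirm that role layout from the cluster itself (the endpoint is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder endpoint/auth

# List each node's name and roles to verify the hot/content/frozen layout.
for node in es.nodes.info()["nodes"].values():
    print(node["name"], node["roles"])
```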

Is the data5 node currently using S3? If you are not using searchable snapshots here, how have you set it up to use S3? Have you mounted an S3 bucket as a volume and pointed path.data to it?

Yes, data node 5 (the frozen node) is pointing to an S3 bucket.