Selective retention for an index

We have a requirement in Elasticsearch where, for a single index, we need to store certain fields for one month and others for three months.

Since there is no direct selective field retention with ILM, we aim to achieve this by using the clone option.

The plan is to send specific fields to index_one, which has a longer ILM policy, and the remaining fields to index_two, which has a shorter ILM policy. Both indices can then be viewed collectively under index* in Kibana.
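Roughly what we have in mind is two ILM policies that differ only in the delete age, one per index (the policy names, ages, and rollover settings below are illustrative, not our real config):

```
PUT _ilm/policy/retain_three_months
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "1d" } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}

PUT _ilm/policy/retain_one_month
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "1d" } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
```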

Is this a good approach? Additionally, how resource-intensive is the cloning process when dealing with large datasets?

In parallel, we also tried custom scripts that run against the index and delete the fields, but this puts additional load on our Elastic cluster, since every matching document has to be parsed and rewritten for the field deletion, so we had to rule this out.
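For reference, this is roughly the kind of operation we tried (the field name and cutoff are placeholders); _update_by_query has to rewrite every matching document, which is where the extra load comes from:

```
POST index_one/_update_by_query
{
  "query": {
    "range": { "@timestamp": { "lt": "now-30d" } }
  },
  "script": {
    "source": "ctx._source.remove('large_payload_field')"
  }
}
```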

We are also exploring transform indices.

Is there any other way to have selective retention for fields in a single index to save storage?

I'm wondering if you can use field level security to change the visibility of a field for some users after a month.

Not sure, but it may be a trick that would avoid expensive reindex operations.
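A minimal sketch of such a role, assuming a made-up role name, index pattern, and field list (note this only hides fields at query time, it would not reclaim any storage):

```
POST /_security/role/reader_after_one_month
{
  "indices": [
    {
      "names": [ "index*" ],
      "privileges": [ "read" ],
      "field_security": {
        "grant": [ "@timestamp", "long_lived_*" ]
      }
    }
  ]
}
```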

Although you can send specific fields to different indices, be aware that Elasticsearch does not support joins, so you will not be able to run queries across all fields.

Deleting fields will require reindexing, which can be expensive. An option to save some space and avoid expensive reindexing might be to simply have two series of time-based indices. Into the first you send the documents with all the fields and retain these for just one month. Into a separate series you send the stripped-down documents and keep these around for three months. This means that you duplicate storage for the reduced set of fields for one month out of three, but it saves a lot of potentially expensive processing and is simple and low risk.
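As a sketch, the stripped-down copy could be produced on the Elasticsearch side with an ingest pipeline using a remove processor attached to the long-retention index (pipeline and field names here are just examples):

```
PUT _ingest/pipeline/strip_short_lived_fields
{
  "processors": [
    {
      "remove": {
        "field": [ "large_payload_field", "debug_trace" ],
        "ignore_missing": true
      }
    }
  ]
}
```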


Very good idea @Christian_Dahlqvist! :wink:
BTW joins are coming in 8.18 with ES|QL.
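The syntax should look roughly like this (index and field names made up):

```
FROM index_one
| LOOKUP JOIN index_two_lookup ON request_id
| KEEP @timestamp, request_id, long_lived_field, short_lived_field
```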


How do you replicate the same data to the indices?

As I described above, my idea is to use clone and duplicate the data.

You index the data twice, e.g. using Logstash with a clone filter and dual outputs.
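A minimal sketch of such a pipeline, with made-up index and field names (depending on the clone filter's ECS compatibility setting, the copy is marked via [type] or via [tags]):

```
filter {
  # duplicate every event; the copy is marked "stripped"
  clone {
    clones => ["stripped"]
  }
  # remove the short-lived fields from the copy only
  if [type] == "stripped" or "stripped" in [tags] {
    mutate {
      remove_field => ["large_payload_field", "debug_trace"]
    }
  }
}

output {
  if [type] == "stripped" or "stripped" in [tags] {
    # stripped copy goes to the long-retention index
    elasticsearch { index => "index_one" }
  } else {
    # full document goes to the short-retention index
    elasticsearch { index => "index_two" }
  }
}
```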

Is that general joins, where you can join multiple large indices, or more like a lookup, where you can join a large index against a limited-size data set?

My data is not always an array, so I cannot apply a split filter and index into two different outputs.

I meant to write clone filter but somehow got it wrong. I have fixed my previous post and provided a link to the appropriate section in the docs.
