Split non-ILM large index

Hi

I have a writable index of 2.1 TB with 1 shard and 1 replica.
It has no ILM policy (my mistake).
It is named office-project-version (no 000001 at the end).

How can I split it into smaller pieces?

  1. Adding ILM doesn't work, because an alias doesn't point to the index:

index.lifecycle.rollover_alias does not point to index

When I point it manually, I get an error:

index name must match the regex pattern ^.*-\d+
(as the index name doesn't end with 000001)

  2. Reindexing to a new name fails, as the index is too big.
  3. I cannot split the index, as it is still writable. Closing the index or making it read-only is not an option, as it receives production data, so I need to "move" incoming data somewhere else.
  4. I created a new index template with a data stream and made the current index non-writable (non-prod has the same issue, so I was testing there). I expected incoming data to go to the data stream, but received an error:

POST _reindex?slices=5&refresh
{
  "source": {
    "index": "office-project-version"
  },
  "dest": {
    "index": "office-project-version-new"
  }
}

"request body is required"

Any ideas?

Version 7.17.9
I feel that the answer is somewhere nearby, but I cannot find it.

ILM enforces retention by deleting complete indices, so you cannot bring this huge index under ILM and have it trimmed down.

Does this index contain immutable data or do you also update/delete data?

What type of data is this? Is it appropriate for a time-based indices approach?

ILM was not added in the beginning, so the index doesn't have a proper name.

It is APM (yes, time-based) data, but I don't want to update/delete it, as it is production logs.
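The naming constraint behind the rollover error quoted earlier can be checked offline. This is a minimal sketch, assuming the pattern from the error message (`^.*-\d+`) has to match the whole index name; the helper name `can_roll_over` is hypothetical and not part of any Elasticsearch client.

```python
import re

# Pattern quoted in the rollover error message; fullmatch() applies it to the whole name.
ROLLOVER_NAME = re.compile(r".*-\d+")


def can_roll_over(index_name: str) -> bool:
    """Return True if the name ends in a dash followed by one or more digits."""
    return ROLLOVER_NAME.fullmatch(index_name) is not None


for name in ("office-project-version", "office-project-version-000001"):
    print(name, can_roll_over(name))
```

Under that assumption the existing name is rejected, while the same name with a numeric suffix is accepted.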

How long a time period does this index cover? What retention period are you looking to enforce if you enable ILM?

7 months, but it should be stored for only 3 months (so yes, I can safely delete the first 4 months).

I would recommend you set up a completely new data stream and start writing any new data into this. You should then be able to add an alias on top of the large index and use this directly or as part of an index pattern to query the new data stream and the existing index together. Once all data in the current index has exceeded the retention period you can delete it and rely on just the data stream.
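As a rough sketch of that setup, the snippet below only builds the request bodies involved; it does not talk to a cluster. The data stream name, alias name, and template priority are hypothetical placeholders, and the bodies are meant for `PUT _index_template/<name>`, `POST _aliases`, and `GET <target>/_search` respectively.

```python
import json

OLD_INDEX = "office-project-version"           # index name from the thread
DATA_STREAM = "office-project-version-stream"  # hypothetical data stream name
QUERY_ALIAS = "office-project-version-legacy"  # hypothetical alias name


def data_stream_template(pattern: str) -> dict:
    """Body for PUT _index_template/<name>: matching names are created as data streams."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "priority": 200,  # assumed value; pick one above any overlapping template
    }


def legacy_alias_actions(index: str, alias: str) -> dict:
    """Body for POST _aliases: adds a read alias on top of the existing index."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


def search_target(*names: str) -> str:
    """Comma-separated target for GET <target>/_search covering old and new data."""
    return ",".join(names)


print(json.dumps(data_stream_template(DATA_STREAM + "*"), indent=2))
print(json.dumps(legacy_alias_actions(OLD_INDEX, QUERY_ALIAS), indent=2))
print(search_target(QUERY_ALIAS, DATA_STREAM))
```

The template pattern is deliberately chosen so that it does not match the existing index name.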

If you want to get rid of the current index earlier you may reindex some part of the data in this index into the data stream (sort by timestamp to get documents in the correct order) and then start ingesting new data into the data stream.
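One way to approach the partial reindex is sketched below. It swaps the suggested sort for consecutive time windows submitted oldest first, since `sort` in `_reindex` is deprecated as of 7.6; smaller windows also keep each request short. The destination name, the `@timestamp` field, and the dates are assumptions for illustration. Each body would be sent to `POST _reindex`, optionally with `wait_for_completion=false`.

```python
import json
from datetime import datetime, timedelta

SOURCE = "office-project-version"       # index name from the thread
DEST = "office-project-version-stream"  # hypothetical data stream name
TS_FIELD = "@timestamp"                 # assumed timestamp field


def reindex_windows(start: datetime, end: datetime, step: timedelta):
    """Yield one _reindex body per [lo, hi) time window, oldest window first."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield {
            "source": {
                "index": SOURCE,
                "query": {
                    "range": {TS_FIELD: {"gte": lo.isoformat(), "lt": hi.isoformat()}}
                },
            },
            # Data streams only accept documents written with op_type "create".
            "dest": {"index": DEST, "op_type": "create"},
        }
        lo = hi


# Hypothetical three-week range split into weekly windows.
bodies = list(
    reindex_windows(datetime(2023, 1, 1), datetime(2023, 1, 22), timedelta(days=7))
)
print(len(bodies))
print(json.dumps(bodies[0], indent=2))
```

Restricting the overall range to the last 3 months would leave the data that is already past retention behind in the old index.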

I cannot ask people to send data to a different log stream. It is around 30 teams (and maybe 100 APM indices; luckily, not all of them are 2.1 TB :smile:).

Reindexing by timestamp sounds like a good idea. Thanks!

I was thinking that maybe it is possible to create a second index, make it writable, add it to the same alias (I have created a new index template with a data stream), and make the current index read-only.
So, technically, all the new data would go to the new data stream.
But somehow it doesn't work (data is not arriving in the second index).

I do not think you can resolve this without making changes to how teams index into Elasticsearch.

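You can only index into an alias if it points to a single index or if exactly one of its indices is flagged as the write index (is_write_index), so if you put an alias on multiple indices without designating a write index you will not be able to index into it.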
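One exception to the alias rule is the `is_write_index` flag of the aliases API: when exactly one index behind an alias carries it, writes through the alias go to that index. The sketch below only builds a `POST _aliases` body; the alias and new index names are hypothetical. Note that an alias cannot share its name with an existing index, so clients writing directly to `office-project-version` would still have to switch to the alias name.

```python
import json


def write_alias_actions(alias: str, read_only_index: str, write_index: str) -> dict:
    """Body for POST _aliases: one alias over two indices, one of them the write target."""
    return {
        "actions": [
            {"add": {"index": read_only_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": write_index, "alias": alias, "is_write_index": True}},
        ]
    }


body = write_alias_actions(
    alias="office-project-version-write",         # hypothetical alias name
    read_only_index="office-project-version",     # index name from the thread
    write_index="office-project-version-000001",  # hypothetical new index
)
print(json.dumps(body, indent=2))
```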
