We are using Elasticsearch v7.8.0, and some clusters are on version 8.10.4.
We have indices storing 40 million records each, with 5 primary shards set at index creation time.
As we do not have time-series data, we cannot use the rollover index feature, because we need to be able to update / delete old data in an index.
Hence we would like to understand how to manage such huge indices, in particular how to increase the number of primary shards without downtime, as data is being pushed continuously.
Is there any other feature available in Elasticsearch, like rollover, to prevent a single index from growing too large in size?
The only mechanisms available in Elasticsearch to increase the number of primary shards are reindexing and the split index API, both of which require downtime as new indices are created.
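For reference, a minimal sketch of the split flow using the official `elasticsearch` Python client is shown below; the index names, shard counts, and 7.x-style `body=` arguments are assumptions for illustration, not part of the original answer. The write interruption comes from the fact that the source index must be made read-only before it can be split.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# 1. Block writes on the source index (required before a split).
es.indices.put_settings(index="my-index", body={"index.blocks.write": True})

# 2. Split into a new index with more primary shards.
#    The target shard count must be a multiple of the source's (e.g. 5 -> 10).
es.indices.split(
    index="my-index",
    target="my-index-split",
    body={"settings": {"index.number_of_shards": 10}},
)

# 3. Once the new index is green, point clients/aliases at it and
#    resume writes there; the old index can then be deleted.
```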
Other solutions depend on the type of data you are indexing and how you are indexing it. You may be able to add logic to your indexing tier to keep track of which index each document goes to. This way you can create a new index when needed and start routing new documents to it. This can be seamless and done without downtime, but it requires changes to your application and ingest pipeline, along the lines of the sketch below.
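As an illustration only (the alias name, threshold, and helper are made up, not from the original answer), such indexing-tier logic could look roughly like this with the official Python client: check the document count of the current write index and atomically move a write alias to a freshly created index once a threshold is crossed.

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local cluster

ALIAS = "my-data-write"        # alias the application writes to (hypothetical)
MAX_DOCS = 40_000_000          # per-index threshold (hypothetical)

def maybe_switch_index() -> None:
    """Create a new backing index and repoint the write alias once the
    current one crosses the document-count threshold."""
    current = next(iter(es.indices.get_alias(name=ALIAS)))   # current write index
    if es.count(index=current)["count"] < MAX_DOCS:
        return
    new_index = f"my-data-{int(time.time())}"                # e.g. my-data-1700000000
    es.indices.create(index=new_index, body={"settings": {"number_of_shards": 5}})
    # Atomically move the write alias so new documents land in the new index;
    # the old index stays searchable (and updatable by explicit index name).
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": current, "alias": ALIAS}},
        {"add": {"index": new_index, "alias": ALIAS}},
    ]})
```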
There is nothing in Elasticsearch that supports this out of the box, so it is something you will need to build yourself. I suspect how you do that depends on where the data comes from and how you update it. If, for example, you are storing data related to customers, you may add a parameter to each customer outside of Elasticsearch indicating the index its data is stored in, and then use this to send data related to that customer to the correct index (or indices).
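A hedged sketch of that idea (the routing store, schema, and helper names are all hypothetical): keep the target index per customer in your own database, look it up at ingest time, and send each document to that index.

```python
import sqlite3
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")       # assumed local cluster
db = sqlite3.connect("customer_routing.db")       # hypothetical routing store

def index_for_customer(customer_id: str) -> str:
    """Look up which Elasticsearch index holds this customer's data."""
    row = db.execute(
        "SELECT es_index FROM customer_index WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else "customers-default"

def index_documents(docs: list[dict]) -> None:
    """Bulk-index documents, routing each one to its customer's index."""
    actions = (
        {"_index": index_for_customer(doc["customer_id"]), "_source": doc}
        for doc in docs
    )
    helpers.bulk(es, actions)
```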