We are using Elasticsearch v7.8.0, and some clusters are on version 8.10.4.
We have indices storing 40 million records each, with 5 primary shards set at index creation time.
As we do not have time-series data, we cannot use the rollover index feature, because we need to be able to update / delete old data in an index.
Hence we would like to understand how to manage such huge indices, in particular how to increase the number of primary shards without downtime, as data is being pushed continuously.
Is there any other feature available in Elasticsearch, like rollover, to prevent a single index from growing too large in size?
The only mechanisms available in Elasticsearch to increase the number of primary shards are reindexing and the split index API, both of which require downtime as new indices are created.
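For reference, a minimal sketch of the split flow using the official `elasticsearch` Python client is shown below; the index names, shard counts, and 7.x-style `body=` arguments are assumptions for illustration, not part of the original answer. The write interruption comes from the fact that the source index must be made read-only before it can be split.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# 1. Block writes on the source index (required before a split).
es.indices.put_settings(index="my-index", body={"index.blocks.write": True})

# 2. Split into a new index with more primary shards.
#    The target shard count must be a multiple of the source's (e.g. 5 -> 10).
es.indices.split(
    index="my-index",
    target="my-index-split",
    body={"settings": {"index.number_of_shards": 10}},
)

# 3. Once the new index is green, point clients/aliases at it and
#    resume writes there; the old index can then be deleted.
```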
Other solutions depend on the type of data you are indexing and how you are indexing it. You may be able to add logic to your indexing tier to keep track of which index each document goes to. This way you can create a new index when needed and start routing new documents to it. This can be seamless and done without downtime, but it requires changes to your application and ingest pipeline, along the lines of the sketch below.
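As an illustration only (the alias name, threshold, and helper are made up, not from the original answer), such indexing-tier logic could look roughly like this with the official Python client: check the document count of the current write index and atomically move a write alias to a freshly created index once a threshold is crossed.

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local cluster

ALIAS = "my-data-write"        # alias the application writes to (hypothetical)
MAX_DOCS = 40_000_000          # per-index threshold (hypothetical)

def maybe_switch_index() -> None:
    """Create a new backing index and repoint the write alias once the
    current one crosses the document-count threshold."""
    current = next(iter(es.indices.get_alias(name=ALIAS)))   # current write index
    if es.count(index=current)["count"] < MAX_DOCS:
        return
    new_index = f"my-data-{int(time.time())}"                # e.g. my-data-1700000000
    es.indices.create(index=new_index, body={"settings": {"number_of_shards": 5}})
    # Atomically move the write alias so new documents land in the new index;
    # the old index stays searchable (and updatable by explicit index name).
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": current, "alias": ALIAS}},
        {"add": {"index": new_index, "alias": ALIAS}},
    ]})
```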
There is nothing in Elasticsearch that supports this out of the box, so it is something you will need to build yourself. I suspect how you do that depends on where the data comes from and how you update it. If, for example, you are storing data related to customers, you may add a parameter to each customer outside of Elasticsearch indicating the index its data is stored in, and then use this to send data related to that customer to the correct index (or indices).
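A hedged sketch of that idea (the routing store, schema, and helper names are all hypothetical): keep the target index per customer in your own database, look it up at ingest time, and send each document to that index.

```python
import sqlite3
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")       # assumed local cluster
db = sqlite3.connect("customer_routing.db")       # hypothetical routing store

def index_for_customer(customer_id: str) -> str:
    """Look up which Elasticsearch index holds this customer's data."""
    row = db.execute(
        "SELECT es_index FROM customer_index WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else "customers-default"

def index_documents(docs: list[dict]) -> None:
    """Bulk-index documents, routing each one to its customer's index."""
    actions = (
        {"_index": index_for_customer(doc["customer_id"]), "_source": doc}
        for doc in docs
    )
    helpers.bulk(es, actions)
```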