Hi! I have a few indices where all the data is equally relevant no matter how old it is. Over time these indices will grow significantly in size (from 5-10 MB to around 1000 GB). I want to optimize the number of shards in these indices and increase it as the data grows.
I've looked into ILM, but it did not seem that any of the options there would suit this purpose. Does anyone have any other suggestions on how to solve this in a smooth way?
The data is information about photos (all the metadata and other information needed for each image). It should all be equally easy to access and update, no matter how old the data is.
(The actual files are stored elsewhere)
You cannot change the number of primary shards of an index once it has been created. You can, however, use the split index API to create a new index with a greater number of primary shards. If you are querying through an alias, you can have it point at the old index until the new one is ready and then flip it. The drawback is that you will need to pause all updates, inserts and deletes while the index is being split. The upside is that this lets you continue using a single index, which is convenient when you update data.
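For reference, here is a rough sketch of that flow. The index and alias names are just placeholders, and note that the target's shard count must be a multiple of the source's:

```
# Block writes on the source index (required before a split)
PUT /photos-v1/_settings
{
  "index.blocks.write": true
}

# Split into a new index with more primary shards
POST /photos-v1/_split/photos-v2
{
  "settings": {
    "index.number_of_shards": 4
  }
}

# Once photos-v2 has fully recovered, atomically flip the alias
POST /_aliases
{
  "actions": [
    { "remove": { "index": "photos-v1", "alias": "photos" } },
    { "add":    { "index": "photos-v2", "alias": "photos" } }
  ]
}
```

Because the alias swap is atomic, searches never see a gap; you only need the write pause between blocking writes and the swap.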
Using time-based indices with ILM and rollover means that you are always indexing into a single index, and that this index changes over time. You can easily query across all the indices, but updates become more expensive, as you first need to find which index the document resides in before you actually perform the update. The rollover feature will let you generate new indices of a specific target size over time, and may be an option if you do not perform a lot of updates or deletes and can accept that extra cost. ILM also supports different lifecycle stages, but that is not really applicable to you since all your data is equally relevant.
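If you did go the rollover route, a policy along these lines would cap each backing index at a target size (the policy name and threshold here are just examples):

```
PUT _ilm/policy/photos-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}
```

The update cost mentioned above comes from the fact that a document update must target the concrete backing index, so you would first have to search across the alias to resolve which index holds the document (its `_index` in the search response) before issuing the update against that index.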