We have an index that represents a feed of NFT-related activities (listing, bid, minted, transfer, sold, etc.). The index is used to return a realtime feed for users in our system, so latency is a priority.
The index is configured with index sorting on a timestamp field. We currently have the index set up with 40 shards (one replica), and the size of the index is reaching 3TB (roughly 38GB per shard). I am looking into ILM to make sure the shard size does not go above the recommended limit, but I have a few questions:
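For context, the kind of ILM policy I have in mind looks roughly like this (expressed as a Python dict; the policy name, phase timings, and size threshold are illustrative, not tuned):

```python
# Sketch of a rollover-based ILM policy (values are illustrative).
# Rolling over on max_primary_shard_size keeps each shard near the
# commonly recommended 10-50GB range; the warm phase here only moves
# older data off the hot tier.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_age": "30d",
                    }
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "allocate": {"number_of_replicas": 1},
                },
            },
        }
    }
}

# With the official Python client this would be applied along the lines of:
#   es.ilm.put_lifecycle(name="nft-activity-policy", policy=ilm_policy["policy"])
```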
- Will ILM affect latency, given that each of the indices in the data stream (hot and warm) will still be queried to find matching results (we always sort by timestamp)?
- For warm indices, I see it is recommended not to index data directly into them, but we have two use cases:
  - NFT metadata can change, and in that case we might want to update any relevant activities that could be in the warm indices.
  - We do historical backfills; those would be indexed into the hot index, but would that affect the sorting performance?
This leads me to believe that ILM is not the right solution for us and that we would need to split the indices by time range (i.e. create an index for every month) and then have logic to query only the relevant indices based on the requested timestamp.
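The routing logic I'm imagining would be something like this (a minimal sketch; the `nft-activity-YYYY.MM` naming scheme and the helper name are hypothetical):

```python
from datetime import date

def indices_for_range(start: date, end: date, prefix: str = "nft-activity-") -> list[str]:
    """Map a requested time range to the monthly indices that cover it,
    assuming one index per month named like nft-activity-2023.11."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return names

# A feed request spanning mid-November to early December would then
# only need to hit two indices instead of the whole data set:
print(indices_for_range(date(2023, 11, 15), date(2023, 12, 3)))
# ['nft-activity-2023.11', 'nft-activity-2023.12']
```

The query would then pass this list (or a comma-separated version of it) as the search target, so older months never participate in the sort.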
Would an alternative be to reindex and increase the shard count whenever the size reaches a certain point? In most articles I see the recommendation is to split the index, so I'm assuming the reindex approach would hit a limit at some point (index size, max number of shards, etc.).