We are running into two problems with our Elasticsearch setup, in which a single alias spans 365 daily indices:
- Aggregation performance: aggregations across the alias fan out to all 365 indices. Each index averages only 250MB, so the cluster carries far more shards than the data volume warrants, and aggregation performance suffers.
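A rough calculation shows the mismatch. The figures below use the numbers from this post (365 indices averaging 250MB) and assume one primary shard per index, with replicas excluded; the 30GB target is only the commonly cited tens-of-GB guidance, not a measured value for our cluster:

```python
# Back-of-the-envelope shard sizing. Assumes one primary shard per daily
# index, replicas excluded; figures are illustrative, not measured.
num_indices = 365
avg_index_mb = 250

total_shards = num_indices * 1                 # one primary shard per index
total_gb = num_indices * avg_index_mb / 1024   # ~89 GB in total

# Common guidance is to keep shards in the tens-of-GB range; at ~30 GB
# per shard this data set would need only a handful of shards.
target_shard_gb = 30
shards_needed = -(-int(total_gb) // target_shard_gb)  # ceiling division

print(f"{total_shards} shards today vs. ~{shards_needed} "
      f"needed for {total_gb:.1f} GB")
```

In other words, the same data could plausibly live in a single-digit number of shards instead of 365, which is what motivates the consolidation idea below.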
- Mapping updates: whenever our mapping file changes (typically to add synonyms), all 365 indices must be updated, which causes significant time overhead.
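To illustrate why that overhead scales with the index count, here is a sketch of the per-index churn a synonym change causes, assuming the synonyms live in analyzer settings (which cannot be changed while an index is open, hence the close/open pair). The index names, filter name, and synonym list are hypothetical, and nothing here talks to a cluster; it only enumerates the REST calls involved:

```python
# Sketch of the per-index work a synonym change implies when synonyms
# are baked into analyzer settings. Index names and the synonym list
# are hypothetical; this builds (method, path, body) tuples only.
import json

def synonym_update_requests(index_names, synonyms):
    """Yield the REST calls needed to update synonyms on each index."""
    settings = {
        "analysis": {
            "filter": {
                "my_synonyms": {"type": "synonym", "synonyms": synonyms}
            }
        }
    }
    for name in index_names:
        # Analyzer settings can only be changed on a closed index.
        yield ("POST", f"/{name}/_close", None)
        yield ("PUT", f"/{name}/_settings", json.dumps(settings))
        yield ("POST", f"/{name}/_open", None)

indices = [f"logs-2023-{d:03d}" for d in range(1, 366)]  # 365 daily indices
calls = list(synonym_update_requests(indices, ["car, automobile"]))
print(len(calls))  # 3 REST calls per index
```

Three calls per index means over a thousand operations for a single synonym change, which matches the overhead described above.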
As a potential solution, we are considering consolidating the data into a single index covering all 365 days. Managing it is the challenge: we would need to remove the oldest data and add the newest data on a rolling basis.
In our current setup this is straightforward: we delete the oldest index, create a new one, and attach it to the alias. The new approach would instead require deleting some documents and adding new ones every day, and we have heard that delete_by_query and update_by_query are resource-intensive, time-consuming operations.
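For concreteness, here is a sketch contrasting the two approaches as REST calls. The alias, index, and field names are hypothetical, and nothing here talks to a cluster; the `wait_for_completion=false` flag is the standard way to run a large delete_by_query as a background task rather than on the request path:

```python
# Sketch: daily maintenance under the single-index design vs. the current
# alias rotation. Alias, index, and field names are hypothetical; this
# only builds (method, path, body) tuples.
import json
from datetime import date, timedelta

ALIAS = "logs"          # hypothetical alias name
INDEX = "logs-rolled"   # hypothetical consolidated index

def purge_request(today: date):
    """Single-index design: delete_by_query for documents older than 365 days."""
    cutoff = (today - timedelta(days=365)).isoformat()
    body = {"query": {"range": {"@timestamp": {"lt": cutoff}}}}
    # Run the purge as a background task to keep it off the request path.
    return ("POST", f"/{INDEX}/_delete_by_query?wait_for_completion=false",
            json.dumps(body))

def rotate_alias_request(old_index: str, new_index: str):
    """Current design: atomically repoint the alias to a fresh daily index."""
    body = {"actions": [
        {"remove": {"index": old_index, "alias": ALIAS}},
        {"add": {"index": new_index, "alias": ALIAS}},
    ]}
    return ("POST", "/_aliases", json.dumps(body))

method, path, body = purge_request(date(2024, 1, 1))
print(method, path)
```

The alias rotation is a cheap metadata operation, while the delete_by_query has to find and soft-delete every expired document, which is the cost difference we are worried about.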
Are there more efficient strategies or Elasticsearch features for addressing these challenges?