Optimizing Performance and Managing Daily Data Updates in Elasticsearch with Multiple Indices

We are currently encountering challenges with our Elasticsearch setup, where we use a single alias spanning 365 daily indices. Two key issues have come to our attention:

  1. Aggregation Performance: Aggregating across these indices is inefficient. Each index averages only 250MB, so every query must fan out over at least 365 shards, which is negatively impacting aggregation performance.
  2. Mapping File Updates: When our mapping file changes (typically involving synonym additions), all 365 indices are affected, causing significant time overhead.

As a potential solution, we are contemplating consolidating the data into a single index covering 365 days. However, managing this presents a challenge as we need to regularly remove the oldest data and add the newest data.

In our current setup, handling this involves deleting the oldest index, creating a new one, and attaching it to the alias. The new approach, however, would require daily operations that delete some documents and add new ones. We've heard that delete_by_query and update_by_query are resource-intensive, time-consuming operations.
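For reference, the daily rotation described above can be expressed as a single atomic `_aliases` request body. This is only a sketch: the `logs-YYYY.MM.DD` naming scheme and the alias name used in the test are assumptions, not taken from the thread.

```python
from datetime import date, timedelta

def rollover_actions(alias: str, day: date, retention_days: int = 365) -> dict:
    """Build an atomic _aliases request body that adds today's index to the
    alias and removes the index that fell out of the retention window.
    The 'logs-YYYY.MM.DD' index naming convention is illustrative."""
    new_index = f"logs-{day:%Y.%m.%d}"
    old_index = f"logs-{day - timedelta(days=retention_days):%Y.%m.%d}"
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

POSTing this body to `_aliases` swaps both indices in one atomic step, after which the old index can be deleted outright (a cheap operation compared to delete_by_query).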

Are there more efficient strategies or Elasticsearch functionalities to address these challenges?

Why not go halfway and just switch to monthly indices and delete data by deleting indices? This means that you may hold on to some of the data a bit longer than you do at the moment but should not result in very large shards.

That sounds like a good idea. I hadn't considered it, since it would mean modifying our current backend code (which is configured to search and aggregate over the entire date range via the alias).

It looks like we can follow the structure below:

  • Create monthly indices
  • Once the current month's index is full, delete the oldest monthly index.

However, it seems we'll need to keep inserting daily data into the most recent monthly index. Wouldn't doing that, perhaps via something like 'update_by_query', be a heavy operation?
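For what it's worth, appending new documents does not require `update_by_query` at all: new documents are simply indexed into the current month's index, e.g. via the `_bulk` API. A sketch assuming the illustrative `logs-YYYY.MM` naming scheme, building actions in the shape accepted by the Python client's `elasticsearch.helpers.bulk`:

```python
from datetime import date

def bulk_actions(docs: list[dict], day: date) -> list[dict]:
    """Build bulk-helper actions that route each document to the monthly
    index for `day` ('logs-YYYY.MM' naming is illustrative). These are
    plain index operations, so no update_by_query is involved."""
    index = f"logs-{day:%Y.%m}"
    return [{"_index": index, "_source": doc} for doc in docs]
```

The resulting list would be passed to `elasticsearch.helpers.bulk(client, actions)`; daily ingestion is then ordinary indexing, which is far cheaper than the query-based update APIs.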

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.