Significant query performance degradation on index-a while bulk updates are occurring on index-b

We have one Elasticsearch cluster with two indices, e.g. index-a and index-b, and two aliases, e.g. read and write.
(Let's assume read -> index-a and write -> index-b.)
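For reference, the alias setup looks something like this (index and alias names match the example above):

```
POST /_aliases
{
  "actions": [
    { "add": { "index": "index-a", "alias": "read" } },
    { "add": { "index": "index-b", "alias": "write" } }
  ]
}
```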

All queries are executed on the 'read' alias

All bulk updates are executed on the 'write' alias

Once per day, a job runs bulk updates on the write alias and touches every doc in the index (6 million docs). The job sets the refresh_interval to -1 and runs bulk updates in batches of 500 until all 6 million are updated. The job takes ~45 mins to complete. Upon completion, the read and write aliases are swapped and the refresh_interval is set back to 5m. While the job is running, the performance of queries on the read alias (not the index being written to) suffers significantly.
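For clarity, a sketch of the calls the job makes (the doc id and field in the bulk body are hypothetical, just to show the shape; the batching loop is omitted):

```
# Disable refreshes on the write index before the job starts
PUT /index-b/_settings
{ "index": { "refresh_interval": "-1" } }

# Bulk updates run against the 'write' alias in batches of 500
POST /write/_bulk
{ "update": { "_id": "1" } }
{ "doc": { "some_field": "new value" } }

# After the last batch, restore the refresh interval...
PUT /index-b/_settings
{ "index": { "refresh_interval": "5m" } }

# ...and swap the two aliases atomically
POST /_aliases
{
  "actions": [
    { "remove": { "index": "index-a", "alias": "read" } },
    { "add": { "index": "index-b", "alias": "read" } },
    { "remove": { "index": "index-b", "alias": "write" } },
    { "add": { "index": "index-a", "alias": "write" } }
  ]
}
```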

Why this process: We chose to write to a separate index to ensure the read index never undergoes an extended period of bulk writes. We thought this would allow query performance to remain fairly consistent during the write period, since the index serving queries is not being written to. Unfortunately, while those bulk writes occur on the write index, we see significant performance degradation on the read index.

Any ideas why query performance suffers so badly, and how we might remove or mitigate the disruption the writes cause on the index that is only taking reads?

As the indices are held in the same cluster, they do share cluster resources like CPU and disk I/O. The bulk indexing can also affect the operating system page cache, which in turn can affect query performance. How much they affect each other depends a lot on the size of your indices, how large a portion of them can be cached, and how heavily your bulk indexing loads the system.

To completely avoid this, you could add a separate node to the cluster for indexing and use shard allocation filtering to allocate the index you are building to this node only. That separates the resources, and once indexing has completed you can change the settings on the index and move it to the search-oriented part of the cluster.
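A minimal sketch of that approach, assuming the dedicated indexing node is started with a custom node attribute (the attribute name box_type and the values ingest/search are hypothetical; they would be set via node.attr.box_type in each node's elasticsearch.yml):

```
# Pin the index being rebuilt to the dedicated indexing node
PUT /index-b/_settings
{ "index.routing.allocation.require.box_type": "ingest" }

# Once the job and the alias swap are done, relocate the index
# to the search-oriented nodes
PUT /index-b/_settings
{ "index.routing.allocation.require.box_type": "search" }
```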
