Reindexing in Production Environment

Hi everyone, our Elasticsearch cluster currently holds a lot of documents that need to be deleted, so we plan to re-index the documents still in use into a new index and then delete the old index. We will be doing this in a production environment. The approach is to create a new index, re-index the in-use documents into it, and once that is done, point the alias at the new index to avoid any downtime. Questions about this approach:

  1. What configuration can we use during re-indexing so that it does not add latency to queries while the task runs (e.g. scroll_size, requests_per_second)?

  2. To speed up the re-indexing, we are thinking about disabling refresh on the new index. Will there be any issues if we ingest a large number of documents and turn refresh back on later?

  3. Will there be an increase in CPU utilization or query latency when we delete a large index?

Any suggestions are welcome.

Thanks.

Hi @vishnu_teja! Good questions.

On your first question: by default, a reindex runs as fast as it can. If you start to see search latency creep up or other performance problems, you can always dial it back by setting requests_per_second.
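
To make that concrete, here is a rough sketch in Python that only builds the request path and JSON body for a throttled `_reindex` call, plus the `_rethrottle` path for adjusting a running task. The index names, the `in_use` field in the query, the task id, and the numbers are all placeholders I made up, and the helper functions are hypothetical, not part of any client library. As far as I know, the batch size for `_reindex` is set via `source.size` in the body rather than a `scroll_size` parameter.

```python
from urllib.parse import urlencode


def build_reindex_request(source_index, dest_index, query,
                          batch_size=1000, requests_per_second=500):
    """Return (path, body) for a throttled, asynchronous POST _reindex.

    batch_size and requests_per_second are illustrative starting
    points, not recommendations; tune them against your own cluster.
    """
    params = urlencode({
        "requests_per_second": requests_per_second,
        # Run as a background task so the HTTP call returns a task id.
        "wait_for_completion": "false",
    })
    body = {
        # Only documents matching `query` are copied to the new index.
        "source": {"index": source_index, "size": batch_size, "query": query},
        "dest": {"index": dest_index},
    }
    return f"/_reindex?{params}", body


def build_rethrottle_path(task_id, requests_per_second):
    """Return the path for POST _reindex/<task_id>/_rethrottle."""
    return f"/_reindex/{task_id}/_rethrottle?requests_per_second={requests_per_second}"


if __name__ == "__main__":
    # Hypothetical names: "docs-old", "docs-new", and the "in_use" field.
    path, body = build_reindex_request(
        "docs-old", "docs-new", {"term": {"in_use": True}}
    )
    print("POST", path)
    print(body)
    # Slow a running task down if search latency starts to climb.
    print("POST", build_rethrottle_path("node-id:12345", 100))
```

Starting with a conservative requests_per_second and raising it through `_rethrottle` while watching search latency is one way to approach it; the right numbers depend on your hardware and query load.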

On your second question: yes, you should disable refresh on the new index while reindexing, and no, there won't be any issues when you turn it back on later. Just set it back to the default and you should be good to go. Also, look at turning off replicas on the new index while reindexing, and then turn them back on when done. Once the index goes green you can delete the original index.
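
As a sketch of the settings involved, the Python below builds the bodies you would send to `PUT /<index>/_settings` before and after the reindex. The helper names and the index name are hypothetical, and the replica count to restore is an assumption (use whatever your index normally runs with). Setting `refresh_interval` to `null` is, to my knowledge, how you reset it to the default.

```python
import json


def settings_path(index):
    """Path for the update index settings API."""
    return f"/{index}/_settings"


def bulk_load_settings():
    """Settings for the new index while the reindex is running."""
    return {"index": {
        "refresh_interval": "-1",   # disable periodic refresh
        "number_of_replicas": 0,    # no replica copies during the load
    }}


def restore_settings(replicas=1):
    """Settings to apply once the reindex has finished.

    replicas=1 is an assumption; pass your usual replica count.
    """
    return {"index": {
        "refresh_interval": None,   # serialized as null: back to the default
        "number_of_replicas": replicas,
    }}


if __name__ == "__main__":
    index = "docs-new"  # hypothetical index name
    print("PUT", settings_path(index), json.dumps(bulk_load_settings()))
    print("PUT", settings_path(index), json.dumps(restore_settings()))
```

After restoring the replicas, wait for the new index to report green before removing the old one.
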
That brings us to your last question: no, deleting a large index shouldn't cause an increase in CPU utilization or query latency.
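
For the alias switch you described, a single `POST /_aliases` request with a remove and an add action swaps the alias atomically, so searches never see a gap. The sketch below only builds that request body; the alias and index names are placeholders and the helper is hypothetical.

```python
def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old to new index."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


if __name__ == "__main__":
    # Hypothetical names: alias "docs", indices "docs-old" and "docs-new".
    print("POST /_aliases", build_alias_swap("docs", "docs-old", "docs-new"))
    # Only after the swap has succeeded:
    print("DELETE /docs-old")
```

Deleting the old index only after the swap has gone through keeps the alias pointing at a live index at every step.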

Good luck, and happy reindexing.
