Reindexing throughput degrades over time

htang · February 24, 2021, 6:28am

My reindex operation has slowed considerably over time. What could explain this decline in throughput? Please see attached.

While the reindex is running, search latency has also gone up. See attached.

CPU utilization is fairly constant

Other information:

Total number of documents to index is about 300 million.
Index is configured with 15 shards
Total number of data nodes is 3
Refresh interval is set to 10s
Replica count is set to 2

Here is the Reindex request:

        ReindexRequest request = new ReindexRequest();
        request.setSourceIndices("sourceIndex");
        request.setDestIndex("destIndex");
        request.setDestVersionType(VersionType.EXTERNAL);
        request.setDestOpType("index");
        request.setConflicts("proceed");
        request.setScript(new Script(ScriptType.INLINE, "painless", "ctx._source.sortId = ctx._id", Collections.emptyMap()));
        request.setRefresh(true);

Christian_Dahlqvist · February 24, 2021, 7:54am

I would recommend looking at disk I/O and iowait, as this could very well be the bottleneck. I believe reindexing retains the original document ID, which means each indexing operation is effectively an update (a read and a write), as Elasticsearch needs to check whether the document already exists. This is a lot more expensive than just indexing a new document and tends to get slower the larger the destination index gets. If disk performance seems likely to be the limiting factor, I would recommend temporarily setting the number of replicas to 0 for the destination index, as that will reduce disk I/O. Also check the logs for any signs of long or frequent GC.
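
For anyone wanting to try the replica suggestion above, here is a minimal sketch of dropping the replica count through the update index settings REST API (`PUT /<index>/_settings`), using only the JDK 11+ HTTP client. The class name `ReplicaToggle`, its helper methods, and the base URL `http://localhost:9200` are assumptions made for illustration; only the index name `destIndex` and the replica count of 2 come from the original post. It assumes an unsecured cluster, so authentication and TLS are left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of any Elasticsearch client library.
class ReplicaToggle {

    // JSON body understood by the update index settings API.
    static String replicaSettingsBody(int replicas) {
        return "{\"index\":{\"number_of_replicas\":" + replicas + "}}";
    }

    // Builds PUT <baseUrl>/<index>/_settings without sending it.
    static HttpRequest buildRequest(String baseUrl, String index, int replicas) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(replicaSettingsBody(replicas)))
                .build();
    }

    // Sends the request and returns the HTTP status code (200 on success).
    static int setReplicas(HttpClient client, String baseUrl, String index, int replicas)
            throws Exception {
        HttpResponse<String> response = client.send(
                buildRequest(baseUrl, index, replicas),
                HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

One way to use it would be to call `ReplicaToggle.setReplicas(HttpClient.newHttpClient(), "http://localhost:9200", "destIndex", 0)` before starting the reindex, then call it again with `2` once the reindex has finished so the replicas are rebuilt. Keep in mind that the destination index has no redundancy while the replica count is 0.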

system · March 24, 2021, 7:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
