Reindex API: parallel reindex requests, reindexing while still indexing to source indices

mpereira · July 7, 2017, 8:45pm

We upgraded ES from 2 to 5 and want to do a full reindex so that our indices are upgradable to ES6.

Given clusters running on AWS D2 nodes, each holding hundreds of indices larger than 100 GB and configured with 1 primary shard and 0 replicas, what would be an optimal reindexing strategy?

Nodes don't have a lot of extra disk space (~80% full).

What we're currently thinking of is reindexing and then deleting one index at a time with wait_for_completion=true, but initial tests show that this takes a long time. We're seeing an average throughput of 4.5 MB/s.
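To put the 4.5 MB/s figure in perspective, here is a back-of-the-envelope estimate. It assumes a 100 GB index (the post only says the indices are larger than that) and decimal units (1 GB = 1000 MB); `reindex_hours` is a name made up for this sketch.

```python
def reindex_hours(index_size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy an index of the given size at a steady rate."""
    seconds = index_size_gb * 1000 / throughput_mb_per_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600


# Assumed 100 GB index at the observed 4.5 MB/s.
hours = reindex_hours(100, 4.5)
print(f"{hours:.1f} hours per 100 GB index")
```

At that rate a single 100 GB index takes a little over six hours, so hundreds of them processed strictly one at a time on a node would run for weeks or longer.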

Would it make sense to drop wait_for_completion=true and let the cluster parallelize reindex tasks? Would the cluster retry reindexes that failed due to a temporary lack of disk space?

Does it parallelize wait_for_completion=false reindex requests without specifying slicing?
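For reference, a minimal sketch of what such a request could look like, with and without the `slices` URL parameter (which, as far as I know, is only available in newer 5.x releases). The host and index names are hypothetical, and `build_reindex_request` is a helper invented for this example; it only builds the request and does not send it.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_reindex_request(host: str, source: str, dest: str,
                          slices: Optional[int] = None) -> Request:
    """Build (but do not send) a POST _reindex request that returns immediately."""
    params = {"wait_for_completion": "false"}
    if slices is not None:
        params["slices"] = str(slices)  # optional: split the reindex into sub-tasks
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return Request(
        url=f"{host}/_reindex?{urlencode(params)}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index names.
req = build_reindex_request("http://localhost:9200", "logs-2016", "logs-2016-v5", slices=4)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With wait_for_completion=false the response contains a task ID rather than the final result, and that task can be polled through the Tasks API (`GET _tasks/<task_id>`). Sending it would be a matter of passing `req` to `urllib.request.urlopen`.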

What happens when reindexing from indices which are still being written to? Would the destination index only get documents that were in the source at the point in time when the reindex request was sent?
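The reindex documentation describes `_reindex` as working from a snapshot of the source index, so one way to handle a source that is still receiving writes is a second, catch-up pass after the first one finishes. The sketch below builds a request body for such a pass, assuming that approach fits the use case. It relies on the documented `version_type: external` and `conflicts: proceed` options; the index names and the `catch_up_body` helper are hypothetical.

```python
import json


def catch_up_body(source: str, dest: str) -> dict:
    """Body for a follow-up _reindex that only fills in missing or newer documents."""
    return {
        # Count version conflicts instead of aborting the whole request.
        "conflicts": "proceed",
        "source": {"index": source},
        # Keep source versions: create missing docs, overwrite only older ones.
        "dest": {"index": dest, "version_type": "external"},
    }


# Hypothetical index names.
body = catch_up_body("logs-2016", "logs-2016-v5")
print(json.dumps(body, indent=2))
```

Documents already copied with the same version are reported as version conflicts and left alone, so the second pass mainly costs read time on the source. Writes that arrive during the catch-up pass would still need another pass or a pause in indexing.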

mpereira · July 10, 2017, 3:53pm

Would you expect a reindex on a single-node cluster to have an average throughput of 4MB/s given these node IO metrics?

$ mount | grep elasticsearch
/dev/mapper/ephemeral-elasticsearch on /mnt/elasticsearch type xfs (rw)

$ df -h | grep elasticsearch
/dev/mapper/ephemeral-elasticsearch                    5.4T  4.8T  678G  88% /mnt/elasticsearch

$ sudo hdparm -Tt /dev/mapper/ephemeral-elasticsearch
/dev/mapper/ephemeral-elasticsearch:
 Timing cached reads:   18780 MB in  2.00 seconds = 9400.56 MB/sec
 Timing buffered disk reads: 920 MB in  3.01 seconds = 306.07 MB/sec

$ sudo dd if=/dev/zero of=/mnt/elasticsearch/output bs=8k count=100k; sudo rm -f /mnt/elasticsearch/output
102400+0 records in
102400+0 records out
838860800 bytes (839 MB) copied, 2.17104 s, 386 MB/s
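As a rough comparison of the numbers above, the following sketch works out how small the observed reindex rate is next to the raw sequential disk figures. It uses only values quoted in this post; raw sequential throughput is an upper bound on what the disk can do, not a prediction of indexing speed.

```python
# Figures taken from the hdparm and dd output above.
disk_read_mb_s = 306.07                       # hdparm buffered disk reads
disk_write_mb_s = 838860800 / 2.17104 / 1e6   # dd: bytes / seconds, in decimal MB/s
reindex_mb_s = 4.0                            # observed reindex throughput

print(f"dd write rate: {disk_write_mb_s:.0f} MB/s")
print(f"reindex vs. sequential read:  {reindex_mb_s / disk_read_mb_s:.1%}")
print(f"reindex vs. sequential write: {reindex_mb_s / disk_write_mb_s:.1%}")
```

The reindex is running at roughly 1% of what the disk can sustain sequentially, which suggests raw disk bandwidth is unlikely to be the limiting factor here.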

mpereira · July 10, 2017, 7:47pm

The answer to this seems to be yes, based on experiments.

mpereira · July 12, 2017, 5:29pm

I hate to bump this, but could someone at least opine on whether this throughput rate is normal? Do you need more information?

dakrone · August 4, 2017, 12:50am

@nik9000 this sounds like something you might be able to help with?

