Reindex from remote very slow

Hi,

I am trying to reindex from remote a ~40 GB index (single shard, single node) into an index with 8 shards on 3 nodes. The operation is very slow (a couple of days so far) and seems to progress only during specific hours of the day: when I query the reindexing task status, the number of documents copied doesn't change, except at certain points of the day. I checked the CloudWatch metrics of the master node and saw that there are indeed periodic peaks of bandwidth/throughput every day, corresponding to when the reindexing task makes some progress.

I guess something else is happening during the rest of the day, but I don't know what or why (merging? syncing between shards/nodes?). I can see that all 8 shards have roughly the same size and are growing slowly.
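For reference, here is a minimal sketch of how the progress can be estimated from the task status. It assumes the response shape of `GET _tasks/<task_id>` for a reindex task, where `created`, `updated` and `deleted` add up to `total` once the task is done. The function name and the sample numbers are made up for illustration.

```python
def reindex_progress(task_response: dict) -> float:
    """Return the fraction of documents processed so far (0.0 to 1.0)."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # Nothing reported yet, so avoid dividing by zero.
        return 0.0
    done = (
        status.get("created", 0)
        + status.get("updated", 0)
        + status.get("deleted", 0)
    )
    return done / total


# Hypothetical response, trimmed to the fields used above.
sample = {
    "completed": False,
    "task": {
        "status": {
            "total": 1_000_000,
            "created": 250_000,
            "updated": 0,
            "deleted": 0,
            "batches": 250,
        }
    },
}

print(f"{reindex_progress(sample):.1%}")  # prints 25.0%
```

Polling this at a fixed interval and logging the value with a timestamp should make it easy to line up the stalls with the CloudWatch graphs.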

I've set the number of replicas to 0 and the refresh interval to -1. Here are the _stats of the reindex node (the destination): { "_shards" : { "total" : 8, "successful" : 8, "failed" : 0 - Pastebin.com
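The settings mentioned above correspond roughly to the request body below, which would be sent with `PUT /<index>/_settings` on the destination index. This is only a sketch: the helper names are hypothetical, and the values used to restore the settings afterwards (1 replica, 1s refresh) are the Elasticsearch defaults, assumed here rather than taken from my cluster.

```python
import json


def bulk_load_settings() -> dict:
    """Settings applied to the destination index while reindexing."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings to put back once the reindex has finished (assumed defaults)."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


# Body for: PUT /<index>/_settings
print(json.dumps(bulk_load_settings(), indent=2))
```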

The 3 nodes are c5a.large instances (4 GB of RAM), and the ES heap size is 2 GB. The disks are gp3 SSDs with 10000 IOPS and 500 MB/s of throughput. The ES version is 7.6.2.

I have the following questions:

  • What could be the reason for this slow reindexing? How can I speed it up?
  • Is it safe to use the new index instead of the old one while the re-index operation is running? (To write new documents and read current ones)

Thanks a lot for your help!
