Elasticsearch reindex is slow compared to reallocation

Hello Team

I am using Elasticsearch version 7.8.0.

I have opened this thread to better understand Elasticsearch's internal operations.

A Cluster Version : 5.6.3.0
B Cluster Version : 7.8.0.0

We are not using complex structures or mappings: no pipelines, no ingest processing, no dynamic mappings.

The indices and their mappings are already created in Cluster B before we reindex.

We are using only text, date, and number data types.

I have observed:

  1. When I reindex from Cluster A to Cluster B, an index with 5 primary shards (12.5 GB) takes 80-90 minutes.

The network and subnet are the same; these are simply two different clusters.

  2. If I reallocate shards within Cluster B,
    OR
    increase replicas to 1, from the initial 0 during reindexing,

it is super fast: it completes in less than 7-8 minutes.
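To put the two observations side by side, the timings can be turned into average data rates. This is only a rough sketch: it uses the midpoints of the quoted ranges and assumes 1 GB = 1024 MB, neither of which is stated in the post.

```python
# Figures come from the post above; treating 1 GB as 1024 MB and using the
# midpoint of each quoted time range are assumptions made for illustration.
INDEX_SIZE_MB = 12.5 * 1024


def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average data rate when size_mb is moved in the given number of minutes."""
    return size_mb / (minutes * 60)


reindex_rate = throughput_mb_per_s(INDEX_SIZE_MB, 85)       # midpoint of 80-90 min
relocation_rate = throughput_mb_per_s(INDEX_SIZE_MB, 7.5)   # midpoint of 7-8 min

print(f"reindex:    {reindex_rate:.1f} MB/s")
print(f"relocation: {relocation_rate:.1f} MB/s")
print(f"ratio:      {relocation_rate / reindex_rate:.1f}x")
```

Under those assumptions the reindex averages about 2.5 MB/s and the shard copy about 28 MB/s, a gap of roughly 11x.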

Storage on both clusters is the same.

Could you please help me understand this?

Reindexing indexes all documents again and therefore rebuilds all data structures on disk and in memory. Reallocation, on the other hand, generally just involves moving already computed segment files between nodes, which requires a lot less computational work and is a lot more efficient, but cannot be done between clusters.

Okay ..

Thanks for the explanation @Christian_Dahlqvist

What can be done to achieve faster reindexing between clusters?

What I did on the target:

  1. Set replicas to 0
  2. Set refresh interval to -1
  3. Translog: flush threshold = 2 GB, durability = async
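The three changes above map onto index-level settings that can be sent to the target index with a single `PUT /<index>/_settings` call. The sketch below only builds the request (method, path, JSON body) and does not contact a cluster; the index name `my-index` and the helper `settings_request` are hypothetical, and the "restore" values are an assumption about what one might want after the load (1 replica, everything else reset to its default by sending `null`).

```python
import json

# Settings for the duration of the bulk load, mirroring the list above.
BULK_LOAD_SETTINGS = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "-1",
        "translog": {
            "flush_threshold_size": "2gb",
            "durability": "async",
        },
    }
}

# Assumed post-load values: one replica, other settings reset to their
# defaults (a JSON null asks Elasticsearch to revert to the default).
RESTORE_SETTINGS = {
    "index": {
        "number_of_replicas": 1,
        "refresh_interval": None,
        "translog": {
            "flush_threshold_size": None,
            "durability": None,
        },
    }
}


def settings_request(index: str, settings: dict) -> tuple:
    """Return (method, path, body) for an update-index-settings call."""
    return "PUT", f"/{index}/_settings", json.dumps(settings)


# "my-index" is a hypothetical index name.
method, path, body = settings_request("my-index", BULK_LOAD_SETTINGS)
print(method, path)
print(body)
```

Sending `RESTORE_SETTINGS` through the same helper once the reindex has finished would bring the index back to normal durability and refresh behaviour.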

Can anything else be done to improve speed?

You can try slicing the reindex operation in order to improve parallelism.

Slicing cannot be used with remote clusters during reindexing operations.

According to the documentation I read, this will give me an error.
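For context, a reindex-from-remote request is an ordinary `POST /_reindex` call whose `source` carries a `remote` block. A minimal sketch of such a body follows; the host and index names are hypothetical, the helper `remote_reindex_body` is made up for illustration, and the remote host would also need to be listed in `reindex.remote.whitelist` on the destination cluster. In line with the point above, no `slices` parameter is set.

```python
import json


def remote_reindex_body(remote_host: str, source_index: str,
                        dest_index: str, batch_size: int = 1000) -> dict:
    """Build the JSON body for a reindex-from-remote request.

    batch_size is sent as source.size, the number of documents fetched per
    scroll batch (1000 is the documented default).
    """
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
            "size": batch_size,
        },
        "dest": {"index": dest_index},
    }


# Hypothetical host and index names, for illustration only.
reindex_body = remote_reindex_body(
    "http://cluster-a.example:9200", "my-index", "my-index"
)
print("POST /_reindex")
print(json.dumps(reindex_body, indent=2))
```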

Ah, good point. If you have large data volumes to move, you could look into using snapshot and restore instead, as that moves segments and does not require reprocessing.

Okay ...

Snapshots: we do not have them, due to our own infrastructure limitations.

So are there no other settings at the index or cluster level that could boost my reindexing?

Not that I can think of. I just noticed that the difference in versions would mean snapshot and restore would not work even if you had shared storage.
