Elasticsearch Reindex is slow compared to REALLOCATION

tusharnemade · October 4, 2021, 6:30am

Hello Team

I am using Elasticsearch version 7.8.0.

I have opened this thread to better understand Elasticsearch's internal operations.

A Cluster Version : 5.6.3.0
B Cluster Version : 7.8.0.0

We are not using complex structures or mappings: no pipelines, no ingest, no dynamic mappings.

The indices and their mappings are already created in B Cluster before we reindex.

We are using TEXT, DATE and NUMBER data types only.

I have observed:

When I reindex from A Cluster to B Cluster, an index with 5 primary shards [12.5 GB] takes 80-90 minutes.

The network and subnet are the same; these are just two different clusters.

If I do a RE-ALLOCATION of shards within B Cluster
OR
increase replicas to 1, from the initial 0 used during reindexing,

it is super fast: it completes in less than 7-8 minutes.

Storage on both clusters is the SAME.

Could you please help me understand this?

Christian_Dahlqvist · October 4, 2021, 6:41am

Reindexing indexes all documents again and therefore rebuilds all data structures on disk and in memory. Reallocation, on the other hand, generally just involves moving already computed segment files between nodes, which requires a lot less computational work and is a lot more efficient, but cannot be done between clusters.
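To put the figures from the original post in perspective, here is a rough back-of-the-envelope calculation. It assumes the midpoints of the reported ranges (85 minutes and 7.5 minutes) and treats 12.5 GB as 12.5 × 1024 MB; these are illustrative assumptions, not measurements.

```python
# Rough throughput comparison using the figures from the original post.
# Assumptions: 12.5 GB is treated as 12.5 * 1024 MB, and the midpoints of
# the reported ranges (85 min and 7.5 min) stand in for the real durations.

def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average data rate in MB/s for moving size_gb in the given time."""
    return (size_gb * 1024) / (minutes * 60)

INDEX_SIZE_GB = 12.5

reindex_rate = throughput_mb_per_s(INDEX_SIZE_GB, 85)    # reindex from remote
recovery_rate = throughput_mb_per_s(INDEX_SIZE_GB, 7.5)  # relocation / replica build

print(f"reindex:  {reindex_rate:.1f} MB/s")   # about 2.5 MB/s
print(f"recovery: {recovery_rate:.1f} MB/s")  # about 28.4 MB/s
print(f"ratio:    {recovery_rate / reindex_rate:.1f}x")  # about 11.3x
```

Copying finished segment files is roughly an order of magnitude faster here, which is consistent with the explanation above: the reindex is most likely limited by re-analyzing and re-indexing every document rather than by the network.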

tusharnemade · October 4, 2021, 6:45am

Okay ..

Thanks for the explanation @Christian_Dahlqvist

What can be done to achieve faster reindexing between clusters?

What I did on the target:

replicas = 0
refresh interval = -1
translog [ flush threshold = 2GB ] [ durability = async ]
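For readers who want to reproduce these target-side settings, the sketch below builds the JSON body for a `PUT /<index>/_settings` request. The setting names are standard index settings; the helper names and the "restore" values are assumptions for illustration and were not part of the original post.

```python
import json

def bulk_load_settings() -> dict:
    """Index settings matching the list above, for use during the reindex."""
    return {
        "index": {
            "number_of_replicas": 0,       # no replica copies while loading
            "refresh_interval": "-1",      # disable periodic refresh
            "translog": {
                "flush_threshold_size": "2gb",
                "durability": "async",     # fsync translog in the background
            },
        }
    }

def restore_settings(replicas: int = 1) -> dict:
    """Assumed follow-up: settings to apply once the reindex has finished."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": None,      # null resets to the default
            "translog": {
                "flush_threshold_size": None,  # null resets to the default
                "durability": "request",       # default: fsync per request
            },
        }
    }

# Body to send with PUT /<index>/_settings on the target cluster.
print(json.dumps(bulk_load_settings(), indent=2))
```

With `async` durability, recently acknowledged writes can be lost if a node crashes, which is usually acceptable only because the reindex can be re-run.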

Is there anything else that can be done to improve speed?

Christian_Dahlqvist · October 4, 2021, 6:53am

You can try slicing the reindex operation in order to improve parallelism.

tusharnemade · October 4, 2021, 6:54am

Slicing cannot be used with REMOTE clusters during reindex operations.

According to the documentation I read, this will give me an error.
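One possible workaround, sketched here under assumptions, is to partition the source manually: issue several independent reindex-from-remote requests, each restricted by a `range` query to a non-overlapping part of the data, and run them concurrently against the target cluster. The host, index names, the `created_at` date field and the boundary dates below are all hypothetical; whether this actually helps depends on where the bottleneck is.

```python
from typing import Dict, List

def remote_reindex_body(remote_host: str, source_index: str, dest_index: str,
                        field: str, start: str, end: str,
                        batch_size: int = 1000) -> Dict:
    """Body for POST /_reindex covering documents with start <= field < end."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
            "size": batch_size,  # documents fetched per scroll batch
            "query": {"range": {field: {"gte": start, "lt": end}}},
        },
        "dest": {"index": dest_index},
    }

def partitioned_bodies(remote_host: str, source_index: str, dest_index: str,
                       field: str, boundaries: List[str],
                       batch_size: int = 1000) -> List[Dict]:
    """One request body per consecutive pair of boundaries."""
    return [
        remote_reindex_body(remote_host, source_index, dest_index,
                            field, start, end, batch_size)
        for start, end in zip(boundaries, boundaries[1:])
    ]

# Hypothetical values, for illustration only.
bodies = partitioned_bodies(
    remote_host="http://cluster-a:9200",
    source_index="my-index",
    dest_index="my-index",
    field="created_at",
    boundaries=["2021-01-01", "2021-04-01", "2021-07-01", "2021-10-01"],
)

for body in bodies:
    print(body["source"]["query"])
```

Each body would be sent to `POST /_reindex?wait_for_completion=false` on the target cluster, which must list the remote host in `reindex.remote.whitelist`. Documents that fall outside all ranges, or that lack the chosen field, are not copied, so the boundaries need to cover the whole data set.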

Christian_Dahlqvist · October 4, 2021, 6:55am

Ah, good point. If you have large data volumes to move, you could look into using snapshot and restore instead, as that moves segments and does not require reprocessing.

tusharnemade · October 4, 2021, 7:01am

Okay ...

Snapshots: we do not have them, due to limitations of our own infrastructure.

So are there no other settings at the index / cluster level that could boost my reindexing?

Christian_Dahlqvist · October 4, 2021, 7:19am

Not that I can think of. Just noticed that the difference in version would mean snapshot and restore would not work even if you had shared storage...

system · November 1, 2021, 7:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
