Boost 5.x -> 7.x migration performance for GBs of data (C# NEST)

I need to migrate existing ES data from version 5.2.2 to 7.6.0.
I do that by reindexing each index to v6, then bulk inserting each index into a newly created index, and then deleting the source index.
I cannot use the reindex API for that last step because I want to modify the data on the fly and I don't want to use an ES-specific update script.
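
Roughly, the per-index flow looks like this (a simplified sketch; ReindexToV6, TransformAndBulkInsert, DeleteSourceIndex and the index name suffixes are placeholders for my actual code):

    // Simplified sketch of the per-index migration flow (helper names and suffixes are placeholders).
    foreach (var sourceIndex in sourceIndices)
    {
        var intermediateIndex = sourceIndex + "-v6";   // created by the reindex step below
        var targetIndex = sourceIndex + "-v7";         // newly created index on the 7.6.0 cluster

        ReindexToV6(sourceIndex, intermediateIndex);            // _reindex call (code below)
        TransformAndBulkInsert(intermediateIndex, targetIndex); // modify on the fly + bulk insert (code below)
        DeleteSourceIndex(sourceIndex);                         // drop the old index once it is migrated
    }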

This is my code for reindexing to v6:

    // Disable refresh on the source index, then call the _reindex API directly over HTTP
    // and return the number of documents that were reindexed.
    elasticClient.DisableRefreshing(reindexModel.SourceIndex);
    var result = HttpClient.PostAsync(
        ReIndexUrl,
        new StringContent(JsonConvert.SerializeObject(reindexModel.Value), Encoding.UTF8, "application/json")).Result;
    var response = result.Content.ReadAsStringAsync().Result;
    var responseObj = JsonConvert.DeserializeObject<ReindexResponse>(response);
    return responseObj.Total;

Note that reindexModel.Value is like this:

    new { source = new { index = "sourceIndex", type = "myDocType", size = 5000 }, dest = new { index = targetIndex, type = "_doc" } }

and ReindexResponse.Total is a long.

This is my code for bulk inserting into the new v7 index:

    var dataPerIndex = data.Select(
        item => new BulkIndexOperation<T>(item)
        {
            Index = indexName
        });

    var allBulksRequest = new BulkRequest
    {
        Operations = new BulkOperationsCollection<IBulkOperation>(dataPerIndex),
        Refresh = Refresh.False
    };

    if (allBulksRequest.Operations.Any())
    {
        var bulkResponse = elasticClient.Bulk(allBulksRequest);
        bulkResponse.AssertResponseIsValidAndSuccessful();
        if (bulkResponse.Errors || bulkResponse.ItemsWithErrors.Any())
        {
            throw new Exception($"BulkInsert for index: {indexName} failed with errors: {bulkResponse.DebugInformation}");
        }
    }

Note that the data is streamed from the source index and chunked into batches of 5000, just like I set size = 5000 when reindexing to v6.
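
For reference, the streaming/chunking roughly looks like this (a simplified sketch using the NEST scroll API; Transform and BulkInsert are placeholders for my on-the-fly modification and the bulk code above):

    // Simplified sketch: scroll through the source index in 5000-document batches,
    // transform each batch and feed it into the bulk insert shown above.
    var searchResponse = elasticClient.Search<T>(s => s
        .Index(sourceIndex)
        .Size(5000)       // same batch size as the reindex step
        .Scroll("2m"));   // keep the scroll context alive between batches

    while (searchResponse.Documents.Any())
    {
        var batch = searchResponse.Documents.Select(Transform).ToList(); // modify on the fly
        BulkInsert(batch, targetIndex);                                  // bulk code shown above
        searchResponse = elasticClient.Scroll<T>("2m", searchResponse.ScrollId);
    }

    elasticClient.ClearScroll(c => c.ScrollId(searchResponse.ScrollId));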

This works fine but takes quite a while!
In my example with 160 indices and roughly 2 GB of data in total, the process took about 45 minutes!
I don't want to imagine what happens with something like 200 GB of data.

Interesting: increasing the value from 5000 to something like 10000 (which is the maximum without tweaking) did not improve performance at all.
Any ideas on that?
How do you solve this issue?
