Reindexing loses documents

With the C# NEST client (version 7.17.5), I am updating an index by copying it to a temporary index, re-creating it, and copying the data back again.

During this process, some documents get lost. For instance, starting with 1 million documents, the re-created index ends up with only 743k documents.

My guess is that one of the reindex operations runs before all the documents have been fully indexed. However, I do call .WaitForCompletion(), so that should not be an issue.

Essentially, I do:

  1. Create the temporary index with updated mappings / settings
  2. Run Reindex original -> temporary
  3. Delete original index
  4. Run Reindex temporary -> original

Why do my documents go missing?

My code:

        var currentIndexName = "test-1";
        var temporaryIndexName = "test-1-reindex";

        // 1. Initialize temporary index with new mapping
        await InitializeIndex(temporaryIndexName);
 
        // 2. Do reindex to temporary mapping
        var reindexResponse = await _elasticClient.ReindexOnServerAsync(r => r
            .Source(s => s
                .Index(currentIndexName)
            )
            .Destination(d => d
                .Index(temporaryIndexName)
            )
            .WaitForCompletion()
        );
        await _elasticClient.Indices.RefreshAsync(temporaryIndexName);
        
        // 3. Delete old index
        await _elasticClient.Indices.DeleteAsync(currentIndexName);
        await _elasticClient.Indices.RefreshAsync(currentIndexName);
        
        // 4. Copy over to new index
        reindexResponse = await _elasticClient.ReindexOnServerAsync(r => r
            .Source(s => s
                .Index(temporaryIndexName)
            )
            .Destination(d => d
                .Index(currentIndexName)
            )
            .WaitForCompletion()
        );

Hi @JanGoAutonomous,

I'm definitely not a .NET expert, but your use of WaitForCompletion does look in line with the examples. Are you re-creating the original index after deleting it, before triggering the second reindex back to the original in step 4? There is a warning in the docs that the index must exist, so I wonder if that's the issue. Could you give that a try?
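To make the suggestion concrete, here is a minimal sketch of steps 3 and 4 with the original index re-created before the copy back. It assumes the `InitializeIndex` helper from your question can be passed in as a delegate (called `createIndex` here, a hypothetical parameter name), and the class and method names are made up for illustration. I have not run this against a cluster, so treat it as a starting point rather than a verified fix.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

public static class ReindexBackSketch
{
    // createIndex stands in for the InitializeIndex helper from the question (assumption).
    public static async Task ReindexBackAsync(
        IElasticClient client,
        string currentIndexName,
        string temporaryIndexName,
        Func<string, Task> createIndex)
    {
        // 3. Delete the old index and stop if the delete did not succeed.
        var deleteResponse = await client.Indices.DeleteAsync(currentIndexName);
        if (!deleteResponse.IsValid)
            throw new InvalidOperationException(deleteResponse.DebugInformation);

        // Re-create the original index with explicit mappings and settings
        // before copying back, instead of refreshing the deleted index.
        await createIndex(currentIndexName);

        // 4. Copy the documents back into the re-created index.
        var reindexResponse = await client.ReindexOnServerAsync(r => r
            .Source(s => s.Index(temporaryIndexName))
            .Destination(d => d.Index(currentIndexName))
            .WaitForCompletion()
        );
        if (!reindexResponse.IsValid)
            throw new InvalidOperationException(reindexResponse.DebugInformation);

        // Make the copied documents visible to searches and counts.
        await client.Indices.RefreshAsync(currentIndexName);
    }
}
```

Checking `IsValid` after each call matters here because NEST does not throw on error responses by default, so a failed step can otherwise go unnoticed.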

If that doesn't help and you think there is a potential bug, I would suggest raising a GitHub issue on the client project so it can be investigated.
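If you do open an issue, it may help to include the numbers from each reindex response and the document counts of both indices. The sketch below is one way to collect them; the class and method names are hypothetical, and it assumes you pass in the `reindexResponse` values from your own code.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

public static class ReindexDiagnosticsSketch
{
    // Refreshes the index first so the count reflects all indexed documents.
    public static async Task<long> CountAsync(IElasticClient client, string indexName)
    {
        await client.Indices.RefreshAsync(indexName);
        var countResponse = await client.CountAsync<object>(c => c.Index(indexName));
        return countResponse.Count;
    }

    // Prints the summary fields of a reindex response for comparison with the counts.
    public static void Report(string label, ReindexOnServerResponse response)
    {
        Console.WriteLine(
            $"{label}: valid={response.IsValid} total={response.Total} " +
            $"created={response.Created} updated={response.Updated} " +
            $"versionConflicts={response.VersionConflicts} " +
            $"failures={response.Failures.Count}");

        // DebugInformation contains the request and any server error details.
        if (!response.IsValid)
            Console.WriteLine(response.DebugInformation);
    }
}
```

Calling `Report` after steps 2 and 4, and `CountAsync` on both indices after each step, should show at which step the document count first drops.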
