How to optimize a reindex operation to run fast on a large source index

Hi everyone,

I need to reindex a 4.1 TB index that was originally created with too few shards (no capacity planning or growth forecast was done).

Nothing is currently being written to the source index, and it has been force-merged into a single segment for better read/search performance.

The target index is created from scratch (empty), with the refresh interval disabled and replicas set to 0.
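
As a minimal sketch, assuming the standard dynamic index settings `refresh_interval` and `number_of_replicas`, disabling both on the target would look something like this (illustrative, not a copy of the exact request used):

```
PUT puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-optimized/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```

Both would need to be set back to their normal values once the reindex finishes.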

I'm using slicing in the reindex task, with one slice per shard in the source index (counting both primaries and replicas). Is this OK, or should the slice count be based on primary shards only?
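
For reference, a sketch of how the shard layout can be checked, assuming the `_cat/shards` API (in the `prirep` column, `p` marks a primary and `r` a replica):

```
GET _cat/shards/puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-latest?v&h=index,shard,prirep,state,docs,store
```

As far as I understand, `slices=auto` is also accepted in place of a fixed number and lets Elasticsearch pick the slice count from the source index's shards, but I have not tried it on this index.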

The batch size is 1000 documents; when I set it to 2000, the cluster falls over and requests start returning HTTP 500 errors.

Is there anything else that can be done to make it run faster? Also, is there a way to specify a longer scroll keep-alive (TTL) for the search context in the reindex request?

POST _reindex?wait_for_completion=false&slices=20
{
  "source": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-latest",
    "size": 1000,
    "query": {
      "range": {
        "ibi_logtime": {
          "gte": "now-9M/M"
        }
      }
    }
  },
  "dest": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-optimized"
  }
}
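
On the scroll question, a sketch of what I believe the request would look like, assuming the `scroll` query parameter of `_reindex` controls the keep-alive of the search context between batches (the `30m` value is an arbitrary placeholder, not a tested recommendation):

```
POST _reindex?wait_for_completion=false&slices=20&scroll=30m
{
  "source": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-latest",
    "size": 1000,
    "query": {
      "range": {
        "ibi_logtime": {
          "gte": "now-9M/M"
        }
      }
    }
  },
  "dest": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-optimized"
  }
}
```

The keep-alive only needs to cover the time taken to process one batch, not the whole reindex, so I am not sure whether raising it would help here.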

I'd appreciate any feedback on this.
