Slow reindex operation on heavy index

Hi all,

How can I speed up a long-running reindex operation?

The source index is around 4.2TB, with 16 shards of roughly 300GB each, in a cluster with 10 data nodes.

The target index has 90 shards. I've set the number of replicas to 0 and the refresh interval to -1 to try to speed things up, but so far it has only indexed 1GB in the last 3 hours, which is very slow.
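For anyone reading along, the two settings mentioned above could be applied to the target index with a request like the one sketched here. This is a minimal sketch that only builds the request for the update index settings API and does not send it; the function name and the index name `my-target-index` are placeholders, not the poster's actual setup.

```python
import json


def bulk_load_settings_request(index_name):
    """Build (method, path, body) that relaxes index settings before a bulk load.

    The function name and the index name passed in are hypothetical.
    """
    body = {
        "index": {
            "number_of_replicas": 0,   # no replica copies are written during the load
            "refresh_interval": "-1",  # turn off periodic refreshes
        }
    }
    return "PUT", f"/{index_name}/_settings", body


method, path, body = bulk_load_settings_request("my-target-index")
print(method, path)
print(json.dumps(body, indent=2))
```

The printed method, path, and JSON body can be pasted into the console or sent with any HTTP client.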

Here are the monitoring statistics:

What else can be done to speed this up?

Regards,

Have you tried slicing the reindex operation?


This is the solution. Thanks, it did help! I also force-merged the source index into a single segment, since I don't expect any further writes to it anytime soon. I also disabled all shard allocation across the cluster, and my reindex now averages ~15,000 docs/sec, which is the best indexing rate I've ever had in this cluster :slight_smile:

POST _reindex?wait_for_completion=false&slices=20&refresh
{
  "source": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-latest",
    "size": 500,
    "query": {
      "range": {
        "ibi_logtime": {
          "gte": "now-9M/M"
        }
      }
    }
  },
  "dest": {
    "index": "puma.compilation.pipeline.96f19f5b-bc84-4d4b-8694-b80a293e78e4-optimized"
  }
}
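The other two steps mentioned in this reply (force-merging the source index to one segment and disabling shard allocation), plus checking on the background task that `wait_for_completion=false` returns, might look roughly like this. It is a sketch that only builds the requests; the helper names, the `source-index` name, and the task ID are placeholders, and using `persistent` rather than `transient` cluster settings is my assumption, since the post does not say which was used.

```python
from urllib.parse import urlencode


def force_merge_request(index_name, max_num_segments=1):
    """POST /<index>/_forcemerge, merging each shard down to the given segment count."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", f"/{index_name}/_forcemerge?{query}", None


def disable_allocation_request():
    """PUT /_cluster/settings that stops all shard allocation cluster-wide.

    "persistent" is an assumption; the thread does not say which scope was used.
    """
    body = {"persistent": {"cluster.routing.allocation.enable": "none"}}
    return "PUT", "/_cluster/settings", body


def task_status_request(task_id):
    """GET /_tasks/<task_id> for the task ID returned by the reindex call."""
    return "GET", f"/_tasks/{task_id}", None


# Placeholder index name and task ID, for illustration only.
for request in (
    force_merge_request("source-index"),
    disable_allocation_request(),
    task_status_request("NODE_ID:12345"),
):
    print(request)
```

Force-merging to a single segment only makes sense here because the source index is no longer being written to, as noted above.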

A disclaimer here: since my reindex operations run for a long time, I would not recommend that anybody disable allocation at the cluster level if new indices are being created in the cluster in the meantime (their shards cannot be allocated, which would put the cluster in a red state).
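Given that caveat, it is probably worth undoing the temporary settings as soon as the reindex finishes. A rough sketch of the two requests involved is below; setting a value to `null` resets it to its default, while the helper names, the index name, and the replica count of 1 are placeholders to adjust for your own cluster.

```python
import json


def enable_allocation_request():
    """PUT /_cluster/settings that resets allocation to its default ("all")."""
    # None is serialized as JSON null, which removes the override.
    body = {"persistent": {"cluster.routing.allocation.enable": None}}
    return "PUT", "/_cluster/settings", body


def restore_index_settings_request(index_name, replicas=1):
    """PUT /<index>/_settings that restores replicas and the default refresh interval.

    replicas=1 is a hypothetical value; use whatever the index normally needs.
    """
    body = {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": None,  # null resets to the default interval
        }
    }
    return "PUT", f"/{index_name}/_settings", body


for method, path, body in (
    enable_allocation_request(),
    restore_index_settings_request("my-target-index"),
):
    print(method, path, json.dumps(body))
```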
