Remote Reindex Performance

Hi,
I need to migrate data between clusters and am looking to do so via reindex from remote, using Curator. However, the reindexing rate is very slow. How can I increase it?

The Curator config looks like this:

actions:
  1:
    description: "Reindex Test"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://elasticsearch-test:9200
          index: test-2018.05.30
        dest:
          index: test-index
    filters:
    - filtertype: none

At the current rate, reindexing will take too long to be practically useful...

Regards,
D

I have also noticed that the reindex stopped after copying 2.5 GB of data. The source index is 77 GB in size!

If your source index is 5.x or higher, you can use slices. This is sliced scroll functionality in Elasticsearch, not something Curator-specific.

OK, thanks, I'll have a play with that :)

Hi @theuntergeek,
I've tried running a sliced reindex and get this error:

Exception: TransportError(400, 'action_request_validation_exception', \"Validation Failed: 1: reindex from remote sources doesn't support workers > 1 but was [2]

Config looks like this:

actions:
  1:
    description: "Remote Reindexing Test"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      slices: 2
      request_body:
        source:
          remote:
            host: http://remote-cluster:9200
          index: logs-source
          size: 100
        dest:
          index: logs-out
      remote_filters:
      - filtertype: pattern
        kind: prefix
        value: logs
    filters:
    - filtertype: none

I realise that the remote_filters section is redundant. That's there for when I get to the next stage...

Regards,
D

Somehow I missed that you were doing a remote reindex (I answered another slow-reindex question at about the same time and may have mistaken this one for it at first glance). Only local reindex can use slices.
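For comparison, here is roughly what a local (same-cluster) reindex with slices could look like in Curator. This is a sketch, assuming both `logs-source` and `logs-out` live on the cluster Curator connects to; the only real change from the config above is that the `remote` block is gone, which is what makes `slices` acceptable.

```yaml
actions:
  1:
    description: "Local sliced reindex (no remote block)"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      # Accepted here only because source has no 'remote' block;
      # with a remote source, Elasticsearch rejects values > 1.
      slices: 2
      request_body:
        source:
          # Assumes logs-source is on the cluster Curator talks to
          index: logs-source
        dest:
          index: logs-out
    filters:
    - filtertype: none
```

This does not speed up the remote copy itself; it only applies once the data is already on the local cluster.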

You also need to understand that Curator is just an index selection wrapper that makes standard Elasticsearch API calls. You could run this entire command inside Console in Kibana and you would get the exact same result you are seeing with Curator.

What this means is that the slowness can be accounted for by:

  1. Network latency/speed
  2. The performance of the remote cluster
  3. The performance of the local cluster
  4. The shard count of the target index (more primary shards can increase indexing throughput, since bulk writes are spread across more shards and, ideally, more nodes)
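For point 4, one option is to create the target index yourself, with more primary shards, before the reindex runs, rather than letting the reindex auto-create it with default settings (or whatever templates match). A sketch using Curator's `create_index` action followed by the remote reindex; the shard count of 4 is an arbitrary example, not a recommendation, and should be matched to the local cluster's data nodes.

```yaml
actions:
  1:
    description: "Pre-create the target index with more primary shards"
    action: create_index
    options:
      name: logs-out
      extra_settings:
        settings:
          # Example value only; size it to the local cluster's data nodes
          number_of_shards: 4
  2:
    description: "Remote reindex into the pre-created index"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      # No 'slices' here: remote sources reject values > 1
      request_body:
        source:
          remote:
            host: http://remote-cluster:9200
          index: logs-source
        dest:
          index: logs-out
    filters:
    - filtertype: none
```

Curator runs the numbered actions in order, so the index exists with the intended settings before any documents arrive.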

If you need reindex from remote to be faster, these are about the only levers you can adjust. Another way to transfer data from a remote cluster to a local one is snapshot/restore, where both clusters have access to the same shared storage (S3, GCS, Azure, etc.).
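With snapshot/restore, the snapshot is taken on the remote cluster (for example with Curator's `snapshot` action) and then restored on the local cluster. Below is a sketch of the restore side only, run with Curator pointed at the local cluster. The repository name `shared-repo` is hypothetical; it assumes a repository backed by the same storage is registered on both clusters.

```yaml
actions:
  1:
    description: "Restore logs-source from the shared repository as logs-out"
    action: restore
    options:
      # Hypothetical repository; must be registered on this cluster
      # against the same storage the remote cluster snapshotted into.
      repository: shared-repo
      # Blank name: Curator picks the most recent snapshot that
      # passes the filters below.
      name:
      indices: ['logs-source']
      # Restore under the new name used elsewhere in this thread
      rename_pattern: 'logs-source'
      rename_replacement: 'logs-out'
      include_global_state: False
      wait_for_completion: True
    filters:
    - filtertype: state
      state: SUCCESS
```

Because the restore copies segment files rather than re-indexing documents one by one, it avoids most of the per-document cost that makes reindex from remote slow.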
