Remote Reindex Performance

dawiro · June 4, 2018, 10:27am

Hi,
I need to migrate data between clusters and am looking to do it via reindex from remote using Curator. However, the reindexing rate is very slow. How can I increase it?

The curator config looks like this:

actions:
  1:
    description: "Reindex Test"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://elasticsearch-test:9200
          index: test-2018.05.30
        dest:
          index: test-index
    filters:
    - filtertype: none

At the current rate, reindexing will take too long to be practically useful...

Regards,
D

dawiro · June 4, 2018, 11:03am

I have also seen that the reindex ended after copying only 2.5 GB of data. The source index is 77 GB!
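To tell whether a reindex is merely slow or has actually stopped, one option is to read the reindex task through the Tasks API (`GET _tasks/<task_id>`). Its `task.status` reports `total`, `created`, `updated` and `deleted` document counts, and the task entry carries `running_time_in_nanos`. The sketch below extrapolates a remaining time from such a response, assuming a constant rate; the helper name and the sample numbers are hypothetical, not output from this thread.

```python
def reindex_eta_seconds(task_response):
    """Extrapolate remaining seconds for a reindex task, assuming a constant rate."""
    if task_response.get("completed"):
        return 0.0  # the task has finished; nothing left to estimate
    task = task_response["task"]
    status = task["status"]
    # Documents processed so far (created + updated + deleted)
    done = status["created"] + status["updated"] + status["deleted"]
    if done == 0:
        return None  # no progress yet, so there is no rate to extrapolate from
    elapsed_s = task["running_time_in_nanos"] / 1e9
    # remaining time = elapsed time * (docs left / docs done)
    return elapsed_s * (status["total"] - done) / done


# Hypothetical sample shaped like a GET _tasks/<task_id> response
sample = {
    "completed": False,
    "task": {
        "status": {"total": 1000000, "created": 250000, "updated": 0, "deleted": 0},
        "running_time_in_nanos": 3_600_000_000_000,  # one hour
    },
}

print(reindex_eta_seconds(sample))
```

If the task reports `completed: true` while `created` is far below `total`, the stored task result is worth checking for failures rather than treating the job as merely slow.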

theuntergeek · June 4, 2018, 11:11am

If your source index is 5.x or higher, you can use slices. This is sliced scroll functionality in Elasticsearch, not something Curator-specific.
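For a local reindex, sliced scroll is requested through the Reindex API's `slices` URL parameter (`slices=auto` is accepted from 6.1 onwards, as far as I know). A minimal Python sketch that only assembles such a request; `sliced_reindex_request` is a hypothetical helper and nothing is sent to a cluster.

```python
def sliced_reindex_request(source_index, dest_index, slices):
    """Build (method, path, body) for a LOCAL sliced reindex; does not send it."""
    valid_int = isinstance(slices, int) and not isinstance(slices, bool) and slices >= 1
    if not (valid_int or slices == "auto"):
        raise ValueError("slices must be a positive integer or 'auto'")
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    # slices is a URL parameter of the Reindex API, not a body field
    return "POST", f"/_reindex?slices={slices}", body


method, path, body = sliced_reindex_request("test-2018.05.30", "test-index", 4)
print(method, path)
```

Note that the body has no `remote` section: slicing applies only when source and destination live in the same cluster.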

dawiro · June 4, 2018, 2:12pm

OK, thanks. I'll have a play with that.

dawiro · June 6, 2018, 8:03am

Hi @theuntergeek,
I've tried running a sliced reindex and get this error:

Exception: TransportError(400, 'action_request_validation_exception', "Validation Failed: 1: reindex from remote sources doesn't support workers > 1 but was [2]

Config looks like this:

actions:
  1:
    description: "Remote Reindexing Test"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      slices: 2
      request_body:
        source:
          remote:
            host: http://remote-cluster:9200
          index: logs-source
          size: 100
        dest:
          index: logs-out
      remote_filters:
      - filtertype: pattern
        kind: prefix
        value: logs
    filters:
    - filtertype: none

I realise that the remote_filters section is redundant. That's there for when I get to the next stage...

Regards,
D

theuntergeek · June 6, 2018, 1:23pm

Somehow I missed that you were doing a remote reindex (I answered another slow-reindex question at about the same time, and may have thought this one was similar at first glance). Only a local reindex can use slices.
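The rule behind the 400 error can be mirrored client-side before submitting a job. This is a hypothetical pre-flight check, not part of Curator or Elasticsearch; Elasticsearch remains the authority, and the sketch only handles integer `slices` values.

```python
def check_remote_slices(body, slices):
    """Raise if a remote-source reindex asks for more than one slice (int values only)."""
    is_remote = "remote" in body.get("source", {})
    if is_remote and isinstance(slices, int) and slices > 1:
        raise ValueError(
            "reindex from remote sources doesn't support workers > 1 "
            f"but was [{slices}]"
        )


remote_body = {
    "source": {"remote": {"host": "http://remote-cluster:9200"}, "index": "logs-source"},
    "dest": {"index": "logs-out"},
}
local_body = {"source": {"index": "logs-source"}, "dest": {"index": "logs-out"}}

check_remote_slices(remote_body, 1)  # accepted: a single worker
check_remote_slices(local_body, 2)   # accepted: local source
try:
    check_remote_slices(remote_body, 2)
except ValueError as err:
    print(err)
```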

You also need to understand that Curator is just an index selection wrapper that makes standard Elasticsearch API calls. You could run this entire command inside Console in Kibana and you would get the exact same result you are seeing with Curator.
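As an illustration, the first Curator action in this thread corresponds roughly to the Reindex API request rendered below, since `request_body` is passed through to Elasticsearch. The `console_request` helper is hypothetical; it just prints text you could paste into Kibana Console.

```python
import json

# The first Curator action's request_body, as sent to the Reindex API
request_body = {
    "source": {
        "remote": {"host": "http://elasticsearch-test:9200"},
        "index": "test-2018.05.30",
    },
    "dest": {"index": "test-index"},
}


def console_request(body, wait_for_completion=False):
    """Render a Kibana Console request for the Reindex API."""
    flag = "true" if wait_for_completion else "false"
    return f"POST _reindex?wait_for_completion={flag}\n" + json.dumps(body, indent=2)


print(console_request(request_body))
```

With `wait_for_completion=false` the response contains a task ID that can be polled via `GET _tasks/<task_id>`, which is roughly what Curator's `wait_interval`/`max_wait` options handle for you. Either way, the destination cluster must list the remote host in `reindex.remote.whitelist` in `elasticsearch.yml`.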

What this means is that the slowness can be accounted for by:

- Network latency/speed
- The performance of the remote cluster
- The performance of the local cluster
- The shard count of the target index (higher shard counts can increase indexing speed)
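Because the shard count is fixed when an index is created, the destination index has to be created with the desired count before the reindex starts. A minimal sketch of the request bodies, assuming a hypothetical count of 5 shards; dropping replicas and disabling refresh during the load are common bulk-indexing tunings from the Elasticsearch indexing-speed guidance, not something stated in this thread, and should be restored afterwards.

```python
import json


def dest_index_body(shards, bulk_load=True):
    """Body for PUT <dest-index>, sent before the reindex starts."""
    index_settings = {"number_of_shards": shards}
    if bulk_load:
        # Common bulk-load tuning: no replicas, no periodic refresh
        index_settings["number_of_replicas"] = 0
        index_settings["refresh_interval"] = "-1"
    return {"settings": {"index": index_settings}}


def post_load_settings(replicas):
    """Body for PUT <dest-index>/_settings once the reindex has finished."""
    # None serialises to null, which resets refresh_interval to its default
    return {"index": {"number_of_replicas": replicas, "refresh_interval": None}}


print("PUT test-index\n" + json.dumps(dest_index_body(5), indent=2))
print("PUT test-index/_settings\n" + json.dumps(post_load_settings(1), indent=2))
```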

If you need reindex from remote to be faster, these are about the only levers you have. Another way to transfer data from a remote cluster to a local one is snapshot/restore, where both clusters have access to the same network storage system (S3, Google Cloud Storage, Azure, a shared filesystem, etc.).
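The snapshot/restore route can be sketched as an ordered list of REST calls. This assumes an S3 repository (the `repository-s3` plugin installed on both clusters) and a destination cluster whose version can read the snapshot; the repository, snapshot and bucket names are hypothetical, and the helper only builds the calls without sending them.

```python
def snapshot_restore_plan(repo, snapshot, index, bucket):
    """Ordered REST calls as (cluster, method, path, body) tuples; nothing is sent."""
    return [
        # 1. Register the shared S3 repository on the source cluster
        ("source", "PUT", f"/_snapshot/{repo}",
         {"type": "s3", "settings": {"bucket": bucket}}),
        # 2. Snapshot only the index being migrated
        ("source", "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index, "include_global_state": False}),
        # 3. Register the same bucket on the destination cluster, read-only
        ("dest", "PUT", f"/_snapshot/{repo}",
         {"type": "s3", "settings": {"bucket": bucket, "readonly": True}}),
        # 4. Restore the index on the destination cluster
        ("dest", "POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         {"indices": index, "include_global_state": False}),
    ]


plan = snapshot_restore_plan("migration-repo", "snap-1", "test-2018.05.30", "my-es-snapshots")
for cluster, method, path, body in plan:
    print(cluster, method, path)
```

Registering the repository as read-only on the destination keeps the two clusters from both writing to it. The restore fails if an open index with the same name already exists there; `rename_pattern`/`rename_replacement` in the restore body can map it to another name.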

system · July 4, 2018, 1:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category | Replies | Views | Activity
Why is reindex from remote constantly slowing down on large indices? | Elasticsearch (reindex) | 2 | 624 | December 31, 2020
Reindex from remote very slow | Elasticsearch | 1 | 417 | August 10, 2021
ES 5.1.1- Using slices with the reindex from remote | Elasticsearch | 6 | 1888 | January 25, 2017
Improving performance of reindex API? | Elasticsearch | 7 | 12146 | July 5, 2017
Reindex API performance | Elasticsearch | 3 | 4494 | July 5, 2017