Reindexing from Remote Cluster Fails

I have two clusters running Elasticsearch & Kibana 7.3. I've been able to reindex documents within the same cluster without issue in the past; however, when attempting to Reindex from Remote, it reindexes much more slowly and consistently produces errors related to the remote cluster's search queue:

    "failures" : [
      {
        "shard" : -1,
        "reason" : {
          "type" : "es_rejected_execution_exception",
          "reason" : "rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@5ef3f878 on QueueResizingEsThreadPoolExecutor[name = remote-data-node-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 40.8ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@445cb597[Running, pool size = 49, active threads = 49, queued tasks = 1021, completed tasks = 2445041]]"
        }
      },
...
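
For context, the rejection means the search thread pool queue on the remote data node is full (1021 tasks queued against a capacity of 1000), so the scroll searches the reindex issues against it are being rejected. One way to watch that queue and the rejection counter (run against the remote cluster) is something like:

    GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,completed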

The remote cluster is about 70ms away and is currently not being searched at all, aside from this.
The reindex is not throttled ("requests_per_second" : -1.0). The entire reindex request is:

    POST _reindex?wait_for_completion=false
    {
      "source": {
        "remote": {
          "host": "http://remote-coordinating-node:9200"
        },
        "index": "flow-*",
        "query": {
          "bool": {
            "must": [
              {
                "range": {
                  "@timestamp": {
                    "format": "strict_date_optional_time",
                    "gte": "2019-09-30T23:00:00.000Z",
                    "lte": "2019-10-01T23:00:00.000Z"
                  }
                }
              }
            ],
            "filter": [
              {
                "match_all": {}
              }
            ],
            "should": [],
            "must_not": []
          }
        }
      },
      "dest": {
        "index": "flow_remote_old"
      }
    }

Do I just need to implement throttling? I imagine it's this (seemingly lightweight) query that's killing my searches:

            "range": {
              "@timestamp": {
                "format": "strict_date_optional_time",
                "gte": "2019-09-30T23:00:00.000Z",
                "lte": "2019-10-01T23:00:00.000Z"
              }
            }
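
If throttling does turn out to be necessary, my understanding is that a running reindex can be re-throttled in flight rather than restarted; a rough sketch (the task id below is a placeholder taken from the _tasks output, and 50 is just an example rate):

    # find the task id of the running reindex
    GET _tasks?detailed=true&actions=*reindex

    # slow it down without cancelling it
    POST _reindex/<task_id>/_rethrottle?requests_per_second=50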

Edit: this error still occurs even without the query, e.g. when attempting to copy over a single index:

    POST _reindex?wait_for_completion=false
    {
      "source": {
        "remote": {
          "host": "http://remote-coordinating-node:9200"
        },
        "index": "flow-prod-000045",
        "query": {
          "bool": {
            "must": [],
            "filter": [
              {
                "match_all": {}
              }
            ],
            "should": [],
            "must_not": []
          }
        }
      },
      "dest": {
        "index": "flow_remote_old"
      }
    }

Any help would be appreciated.

I've tweaked a couple of settings, yet I'm still consistently getting timeouts. I currently can't reindex any docs from a remote cluster.

    POST _reindex?wait_for_completion=false&requests_per_second=500
    {
      "conflicts": "proceed",
      "source": {
        "size": 100,
        "remote": {
          "host": "http://remote-coordinating-node:9400"
        },
        "index": "flow-prod-000043",
        "query": {
          "bool": {
            "must": [],
            "filter": [
              {
                "match_all": {}
              }
            ],
            "should": [],
            "must_not": []
          }
        }
      },
      "dest": {
        "index": "flow_remote_old",
        "op_type": "create"
      }
    }
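
As a sanity check that nothing at all is getting through, the destination index's doc count can be watched while the task runs, e.g.:

    # doc count and size landing in the destination index
    GET _cat/indices/flow_remote_old?v&h=index,docs.count,store.size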

Found the error. Some Go code runs on a cron every hour to refresh some Logstash dictionaries. That code was not thread-limited and would create a spike of ~3k searches per second. This, of course, as the error says, would stop our reindexing. Reviewing the Search Rate was a dead giveaway that something funky was going on.
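
For anyone who wants to confirm a spike like that without the Monitoring UI, the cumulative search counters in the node stats can be sampled twice and diffed; the difference in query_total divided by the sampling interval gives a rough queries-per-second figure:

    # sample twice, a few seconds apart, and diff query_total
    GET _nodes/stats/indices/search?filter_path=nodes.*.indices.search.query_total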
