Elasticsearch 6.3.0 Scroll API requests got cancelled by the backend

After some time, one of my clusters started showing strange behavior (the other clusters have an identical configuration and the same type of data/actions, but do not have the following problem):

[2019-01-24T05:32:01,537][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Received ban for the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023573] on the node [zXSiTjd1Q1G9AAPo6WEmJg], reason: [by user request]
[2019-01-24T05:32:01,538][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Sending remove ban for tasks with the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023573] to the node [w30c6uTFTAOYWdoCwtiIyw]
[2019-01-24T05:32:01,538][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Sending remove ban for tasks with the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023568] to the node [saVc3cP7QA6JTbb_vjZZ9g]
[2019-01-24T05:32:01,538][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Sending remove ban for tasks with the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023573] to the node [saVc3cP7QA6JTbb_vjZZ9g]
[2019-01-24T05:32:01,538][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Sending remove ban for tasks with the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023573] to the node [zXSiTjd1Q1G9AAPo6WEmJg]
[2019-01-24T05:32:01,538][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [c02] Removing ban for the parent [zXSiTjd1Q1G9AAPo6WEmJg:1741023573] on the node [zXSiTjd1Q1G9AAPo6WEmJg]
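
As far as I understand, the [by user request] reason is what Elasticsearch records when a task is cancelled via the Task Management API, i.e. something equivalent to the call below (just a sketch, reusing the parent task id from the log lines above; I am not issuing such calls myself):

POST _tasks/zXSiTjd1Q1G9AAPo6WEmJg:1741023573/_cancel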

I ran into this while trying to understand why some daily Scroll API queries were failing. Example request:
GET xxx.2019-01/_search?scroll=5m

{
  "sort": [
    "_doc"
  ],
  "size": 10000,
  "_source": [
    "field_1",
    "field_n"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "field_1": "a"
                }
              },
              {
                "match_phrase": {
                  "field_1": "b"
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "field_n": {
                    "query": "XXX"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}

Response:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoAgAAAAAIJdMoFnpYU2lUamQxUTFHOUFBUG82V0VtSmcAAAAABUsP_BZoYmFteWN3cFN1MlRfdVZjaFBubnBR",
  "took": 348,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 1,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "xxx.2019-01",
        "node": "hbamycwpSu2T_uVchPnnpQ",
        "reason": {
          "type": "task_cancelled_exception",
          "reason": "cancelled"
        }
      }
    ]
  },
  "hits": {
    "total": 8311,
    "max_score": null,
    "hits": []
  }
}
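
The returned _scroll_id is then used to fetch the following pages; for completeness, a continuation request looks like this (scroll_id shortened here):

POST _search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNo..."
}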

So, for some so-far-unknown reason(s), Elasticsearch starts to cancel such requests from time to time. Additionally, here are the cluster settings I use besides the defaults:
thread_pool.search.queue_size: 5000
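
For what it's worth, queue pressure on the search thread pool can be checked with the cat API; the column selection below is arbitrary, just what seems relevant here:

GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,completed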

I would greatly appreciate it if someone could help me understand the reason for this behavior (why the "ban" happens) and possible solutions (other than retrying the request).
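
For reference, in-flight search tasks (including the parent ids that appear in the ban messages above) can be listed with the Task Management API; this is the generic call, nothing specific to my setup:

GET _tasks?actions=*search*&detailed=true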

Thanks.


Resolved.

