Scroll randomly failing on some shards

Hi,

I'm seeing some strange behaviour while scrolling in Elasticsearch (v5.5.2).

I have an application that performs multiple search/scrolls hourly for analytics purposes and sometimes the query/scroll fails on some shards.

Here's an example of the output I consider invalid:

{
  "_scroll_id": "<a scroll id>",
  "took": 153057,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 2,
    "failed": 1,
    "failures": [
      {
        "shard": 17,
        "index": "<my index>",
        "node": "<node id>",
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [3122061]"
        }
      }
    ]
  }
}

This happens with different queries and other shards fail randomly as well. This same query/scroll was performed successfully some time before and also later.

Does anyone have any idea why Elasticsearch would fail like this?

I know that one possible cause of the error "No search context found for id [...]" is that the scroll expired on that shard, but that doesn't seem to make much sense here, since the other shards returned their results successfully.

Unfortunately I couldn't find a way to reproduce this issue.

The way I'm protecting the app from processing "bad data" is by checking that the returned _shards.failed is 0 and that _shards.total is equal to _shards.successful.
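
For reference, here is a minimal sketch of that check in Python. It assumes the scroll response has already been parsed into a dict; the function name scroll_page_is_complete and the sample_response variable are made up for illustration and are not part of any Elasticsearch client.

```python
def scroll_page_is_complete(response):
    """Return True only if every shard answered this search/scroll request.

    `response` is assumed to be the parsed JSON body of a search or scroll
    call. A missing or incomplete "_shards" section is treated as a failure.
    """
    shards = response.get("_shards") or {}
    total = shards.get("total")
    successful = shards.get("successful")
    failed = shards.get("failed")
    if total is None or successful is None or failed is None:
        return False
    return failed == 0 and successful == total


# Hypothetical sample mirroring the partial failure shown above.
sample_response = {
    "_scroll_id": "<a scroll id>",
    "took": 153057,
    "timed_out": False,
    "_shards": {
        "total": 3,
        "successful": 2,
        "failed": 1,
        "failures": [
            {
                "shard": 17,
                "reason": {"type": "search_context_missing_exception"},
            }
        ],
    },
}

if not scroll_page_is_complete(sample_response):
    # Discard the page (or restart the scroll) instead of processing it.
    print("partial scroll result, skipping")
```

With this in place the app drops any page where at least one shard failed, which avoids analysing partial data but does not explain why the search context went missing.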

Thanks in advance for any thoughts on this.
