Hi,
I'm seeing some odd behaviour while scrolling in Elasticsearch (v5.5.2).
I have an application that runs multiple search/scroll requests every hour for analytics purposes, and sometimes a query/scroll fails on some shards.
Here's an example of a response I consider invalid:
{
  "_scroll_id": "<a scroll id>",
  "took": 153057,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 2,
    "failed": 1,
    "failures": [
      {
        "shard": 17,
        "index": "<my index>",
        "node": "<node id>",
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [3122061]"
        }
      }
    ]
  }
}
This happens with different queries, and other shards fail at random as well. The same query/scroll ran successfully some time before this failure and again afterwards.
Does anyone have any idea why Elasticsearch would fail like this?
I know that one possible cause of the error "No search context found for id [...]" is that the scroll expired on that shard, but that doesn't seem to make much sense to me, since the other shards returned their results successfully.
Unfortunately I couldn't find a way to reproduce this issue.
The way I'm protecting the app from processing "bad data" is by checking that the returned _shards.failed is 0 and that _shards.total is equal to _shards.successful.
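A minimal Python sketch of that check, assuming the search/scroll response has already been parsed into a dict. The function names shards_ok and failed_shards are made up for illustration and are not part of any Elasticsearch client:

```python
def shards_ok(response):
    """Return True only if every shard answered the search/scroll request.

    `response` is assumed to be the parsed JSON body of a search or
    scroll call. A response without a `_shards` section is treated as
    not OK, so it never gets processed as valid data.
    """
    shards = response.get("_shards")
    if not shards:
        return False
    return (
        shards.get("failed") == 0
        and shards.get("total") == shards.get("successful")
    )


def failed_shards(response):
    """Return (shard number, failure type) pairs for logging."""
    failures = response.get("_shards", {}).get("failures", [])
    return [
        (f.get("shard"), f.get("reason", {}).get("type"))
        for f in failures
    ]
```

With the response shown above, shards_ok returns False, and failed_shards gives the shard number and failure type, which can be logged before the whole scroll is discarded and retried.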
Thanks in advance for any thoughts on this.