While issuing a _scan call, I see these stack traces logged at DEBUG level
on my client:
5843 [New I/O client worker #1-2] DEBUG org.elasticsearch.action.search.type - [Captain Savage] [25] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [index4.qa.acx][inet[/192.168.34.34:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [25]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:389)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:202)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:591)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:582)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:238)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Now, it doesn't affect the outcome at all, seems benign, and it is DEBUG
after all, but hey, I just like to know what this is really trying to say.
Paranoid? Yeah, probably.
You should not get this failure when scanning; are you sure the results you get back are good? It basically means that the context kept on each shard to support scanning is missing, which can happen for a couple of reasons. The first is that it timed out; the second is that it was exhausted but is still being asked for somewhere, which can happen if you don't use the scroll id returned by the previous request.
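For reference, this is roughly how the scan gets kicked off with the Java client of that era; treat it as a hedged sketch rather than the code under discussion. The client setup, the "index4" index name, the match-all query, and the batch size are assumptions here (only the 192.168.34.34:9300 address comes from the log above), and the setScroll keep-alive is the knob behind the timeout cause just mentioned:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

// Connect to the external node (address taken from the log line above).
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("192.168.34.34", 9300));

// Start a scan, keeping the per-shard search contexts alive for 10 minutes.
SearchResponse scanResponse = client.prepareSearch("index4")
        .setSearchType(SearchType.SCAN)
        .setScroll(TimeValue.timeValueMinutes(10))
        .setQuery(QueryBuilders.matchAllQuery())
        .setSize(100) // hits per shard for each batch
        .execute().actionGet();

// Starting point only; every scroll response hands back a fresh id.
String scrollId = scanResponse.getScrollId();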
We're configuring the scroll timeout to be 10 minutes, and it looks to me like we're passing the scrollId in the subsequent batch requests.
This error does NOT happen when we run in 'embedded' test mode, only when we run against a node external to this process (although the ES node could still be running on the same box).
Heya, quickly looked at the code, and I see the problem. The consumeBatches should feed the next search request with the previous search request's scroll id. It seems like you use the initial scroll id returned from the first scan request. Things will still work when you don't do it, but you will see the mentioned failures (and it's better to fix it).
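In code, the fix is just to carry the id forward from each response. A hedged sketch of what a consumeBatches-style loop would look like (the actual method isn't shown in the thread; client and scanResponse are the same assumed names as in the earlier sketch):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;

// Always re-read the scroll id from the latest response, not from the initial scan.
String scrollId = scanResponse.getScrollId();
while (true) {
    SearchResponse batch = client.prepareSearchScroll(scrollId)
            .setScroll(TimeValue.timeValueMinutes(10)) // renew the keep-alive on every call
            .execute().actionGet();

    scrollId = batch.getScrollId(); // the id can change from call to call

    if (batch.getHits().hits().length == 0) {
        break; // no hits in this batch means the scan is exhausted
    }
    for (SearchHit hit : batch.getHits()) {
        // hand each hit off to whatever consumes a batch
    }
}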
Um, ok, that's odd. I had thought the pattern was:
Initiate a _scan
Get the scrollId from the scan
Call prepareSearchScroll with this scrollId
Continue to use prepareSearchScroll to iterate over the batches, passing in the original scrollId
Am I reading you right in saying that the first scrollId (from the scan) is only used on the first prepareSearchScroll call, and then we're expected to get a new scrollId from the first iteration to use for all the other batches?
Not sure what I am missing in the way I explain it. You need to provide the previous response's scroll id to the new scroll request you are making, as it can change across calls. There is an example of this on the Java API page.
Ok, you are right, I overlooked that; it's not the mental model I had. I was thinking of it more like a SQL cursor, where the initial query creates the cursor and then the same handle is used throughout.
Yea, the reason it behaves like that is that the scroll id actually holds information per shard, including which shards still need to be scrolled and which don't. That changes as you scroll through the results.
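A quick way to see that, continuing the same hedged sketch as above: compare the id you sent with the id that comes back, and carry the new one forward either way:

SearchResponse next = client.prepareSearchScroll(scrollId)
        .setScroll(TimeValue.timeValueMinutes(10))
        .execute().actionGet();

if (!scrollId.equals(next.getScrollId())) {
    // The per-shard state encoded in the id has moved on, e.g. some shards are finished.
    System.out.println("scroll id changed between calls");
}
scrollId = next.getScrollId(); // always hand the latest id to the next request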