DEBUG level error question

Paul_Smith · October 25, 2011, 9:19am

While issuing a _scan call, I see these stack traces logged at DEBUG level
on my client:

5843 [New I/O client worker #1-2] DEBUG org.elasticsearch.action.search.type

[Captain Savage] [25] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException:
[index4.qa.acx][inet[/192.168.34.34:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search
context found for id [25]
at
org.elasticsearch.search.SearchService.findContext(SearchService.java:389)
at
org.elasticsearch.search.SearchService.executeScan(SearchService.java:202)
at
org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:591)
at
org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:582)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:238)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Now, it doesn't affect the outcome at all, seems benign, and it is DEBUG
after all, but hey, I just like to know what this is really trying to say.
Paranoid? Yeah, probably.

Paul

kimchy · October 26, 2011, 2:54am

You should not get this failure when scanning, are you sure the results you
get back are good? It basically means that the context kept on each shard to
support scanning is missing, which can happen for several reasons. The first
is that it timed out, second is that it was exhausted, but somewhere it is
still being asked for, which can happen if you don't use the returned scroll
id from the previous request.

On Tue, Oct 25, 2011 at 11:19 AM, Paul Smith tallpsmith@gmail.com wrote:

While issuing a _scan call, I see these stack traces logged at DEBUG level
on my client:

5843 [New I/O client worker #1-2] DEBUG
org.elasticsearch.action.search.type - [Captain Savage] [25] Failed to
execute query phase
org.elasticsearch.transport.RemoteTransportException:
[index4.qa.acx][inet[/192.168.34.34:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No
search context found for id [25]
at
org.elasticsearch.search.SearchService.findContext(SearchService.java:389)
at
org.elasticsearch.search.SearchService.executeScan(SearchService.java:202)
at
org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:591)
at
org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:582)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:238)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Now, it doesn't affect the outcome at all, seems benign, and it is DEBUG
after all, but hey, I just like to know what this is really trying to say.
Paranoid? Yeah, probably.

Paul

Paul_Smith · October 26, 2011, 3:15am

On 26 October 2011 13:54, Shay Banon kimchy@gmail.com wrote:

You should not get this failure when scanning, are you sure the results you
get back are good? It basically means that the context kept on each shard to
support scanning is missing, which can happen for several reasons. The first
is that it timed out, second is that it was exhausted, but somewhere it is
still being asked for, which can happen if you don't use the returned scroll
id from the previous request.

Here's the relevant class doing the Scroll:

https://github.com/Aconex/scrutineer/blob/master/src/main/java/com/aconex/scrutineer/elasticsearch/ElasticSearchDownloader.java

We're configuring the Scroll timeout to be 10 minutes, and it looks to me
that we're configuring the scrollId in the subsequent batch requests.

This error does NOT happen when we run 'embedded' test modes, only when we
run it against a node external to this process (although the ES node could
still be running on the same box).

I feel the results coming back are correct still.

Paul

Paul_Smith · October 27, 2011, 12:34am

On 26 October 2011 14:15, Paul Smith tallpsmith@gmail.com wrote:

On 26 October 2011 13:54, Shay Banon kimchy@gmail.com wrote:

You should not get this failure when scanning, are you sure the results
you get back are good? It basically means that the context kept on each
shard to support scanning is missing, which can happen for several reasons.
The first is that it timed out, second is that it was exhausted, but
somewhere it is still being asked for, which can happen if you don't use the
returned scroll id from the previous request.

Here's the relevant class doing the Scroll:

https://github.com/Aconex/scrutineer/blob/master/src/main/java/com/aconex/scrutineer/elasticsearch/ElasticSearchDownloader.java

Shay, any further info I can give you, or something else I should be trying?
Your original reply makes me nervous..

Thanks for 0.18!

cheers,

Paul

kimchy · October 27, 2011, 7:06pm

Heya, quickly looked at the code, and I see the problem. The consumeBatches
should feed the next search request with the previous search request scroll
id. It seems like you use the initial scroll id returned from the first scan
request. Things will still work when you don't do it, but you will see the
mentioned failures (and its better to fix it).

On Thu, Oct 27, 2011 at 2:34 AM, Paul Smith tallpsmith@gmail.com wrote:

On 26 October 2011 14:15, Paul Smith tallpsmith@gmail.com wrote:

On 26 October 2011 13:54, Shay Banon kimchy@gmail.com wrote:

You should not get this failure when scanning, are you sure the results
you get back are good? It basically means that the context kept on each
shard to support scanning is missing, which can happen for several reasons.
The first is that it timed out, second is that it was exhausted, but
somewhere it is still being asked for, which can happen if you don't use the
returned scroll id from the previous request.

Here's the relevant class doing the Scroll:

https://github.com/Aconex/scrutineer/blob/master/src/main/java/com/aconex/scrutineer/elasticsearch/ElasticSearchDownloader.java

Shay, any further info I can give you, or something else I should be
trying? Your original reply makes me nervous..

Thanks for 0.18!

cheers,

Paul

Paul_Smith · October 27, 2011, 11:18pm

On 28 October 2011 06:06, Shay Banon kimchy@gmail.com wrote:

Heya, quickly looked at the code, and I see the problem. The consumeBatches
should feed the next search request with the previous search request scroll
id. It seems like you use the initial scroll id returned from the first scan
request. Things will still work when you don't do it, but you will see the
mentioned failures (and its better to fix it).

Um, ok, that's odd. I had thought the pattern was:

Initiate a _scan
get the scrollId from the scan
call prepareSearchScroll with this scrollId
continue to use prepareSearchScroll to iterate over the batches passing in
the original scrollId?

Am I reading you right in saying that the first scrollId (scan) is only used
on the first prepareSearchScroll call, and then we're expected to get a
new scrollId from the 1st iteration to use for all the other batches?

Paul

kimchy · October 27, 2011, 11:32pm

Not sure what I am missing in the way I explain it. You need to provide the
previous response scroll id to the new scroll request you are making, as it
can change through calls. There is an example for it on the Java API page:
Elasticsearch Platform — Find real-time answers at scale | Elastic.

On Fri, Oct 28, 2011 at 1:18 AM, Paul Smith tallpsmith@gmail.com wrote:

On 28 October 2011 06:06, Shay Banon kimchy@gmail.com wrote:

Heya, quickly looked at the code, and I see the problem. The
consumeBatches should feed the next search request with the previous search
request scroll id. It seems like you use the initial scroll id returned from
the first scan request. Things will still work when you don't do it, but you
will see the mentioned failures (and its better to fix it).

Um, ok, that's odd. I had thought the pattern was:

Initiate a _scan

get the scrollId from the scan

call prepareSearchScroll with this scrollId

continue to use prepareSearchScroll to iterate over the batches passing
in the original scrollId?

Am I reading you right in saying that the first scrollId (scan) is only
used on the first prepareSearchScroll call, and then we're expected to get
a new scrollId from the 1st iteration to use for all the other batches?

Paul

Paul_Smith · October 27, 2011, 11:35pm

On 28 October 2011 10:32, Shay Banon kimchy@gmail.com wrote:

Not sure what I am missing in the way I explain it. You need to provide the
previous response scroll id to the new scroll request you are making, as it
can change through calls. There is an example for it on the Java API page:
Elasticsearch Platform — Find real-time answers at scale | Elastic.

ok you are right, I overlooked that, it's not what I had a mental model of.
I was thinking more like a SQL cursor where the initial query creates the
cursor, and then the same handle is used throughout.

Thanks Shay, I'll make the changes.

Paul

kimchy · October 27, 2011, 11:38pm

Yea, the reason why it behaves like that is because the scroll id actually
holds information per shard, and also, which shards are needed to continue
to be scrolled, and which don't. That changes as you scroll through the
results.

On Fri, Oct 28, 2011 at 1:35 AM, Paul Smith tallpsmith@gmail.com wrote:

On 28 October 2011 10:32, Shay Banon kimchy@gmail.com wrote:

Not sure what I am missing in the way I explain it. You need to provide
the previous response scroll id to the new scroll request you are making, as
it can change through calls. There is an example for it on the Java API
page: Elasticsearch Platform — Find real-time answers at scale | Elastic.

ok you are right, I overlooked that, it's not what I had a mental model of.
I was thinking more like a SQL cursor where the initial query creates the
cursor, and then the same handle is used throughout.

Thanks Shay, I'll make the changes.

Paul

Topic		Replies	Views
Logging exceptions at DEBUG? Elasticsearch	2	297	July 6, 2017
SearchContextMissingException when doing scan search [0.90.7-0.90.13] Elasticsearch	8	479	July 6, 2017
Scansearch failure Elasticsearch	2	657	July 6, 2017
Scansearch failure Elasticsearch	1	332	July 6, 2017
Scan Search error Elasticsearch	1	494	July 6, 2017

DEBUG level error question

Related topics