DEBUG level error question

While issuing a _scan call, I see these stack traces logged at DEBUG level
on my client:

5843 [New I/O client worker #1-2] DEBUG org.elasticsearch.action.search.type - [Captain Savage] [25] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [index4.qa.acx][inet[/192.168.34.34:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [25]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:389)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:202)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:591)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:582)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:238)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Now, it doesn't affect the outcome at all and seems benign, and it is only logged at DEBUG after all, but I'd just like to know what this is really trying to say. Paranoid? Yeah, probably.

Paul

You should not get this failure when scanning; are you sure the results you get back are good? It basically means that the context kept on each shard to support scanning is missing, which can happen for a couple of reasons: either it timed out, or it was exhausted but is still being asked for somewhere, which can happen if you don't use the scroll id returned by the previous request.
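
A minimal sketch of how a scan like this is typically opened with the Java client of that era (the index name, query, batch size, 10-minute keep-alive and the already-connected Client are all illustrative assumptions, not taken from the thread); the setScroll keep-alive is what keeps the per-shard context described above alive between calls:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

public class ScanOpenSketch {
    // Opens a scan; the scan request creates the per-shard search contexts that
    // SearchContextMissingException complains about once they are gone.
    static String openScan(Client client) {
        SearchResponse scan = client.prepareSearch("myindex")
                .setSearchType(SearchType.SCAN)            // scan mode: hits are only returned by later scroll calls
                .setScroll(TimeValue.timeValueMinutes(10)) // keep-alive for the shard contexts
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(100)                              // hits per shard per batch
                .execute().actionGet();
        // The scan response carries no hits, only the first scroll id.
        return scan.getScrollId();
    }
}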

On 26 October 2011 13:54, Shay Banon kimchy@gmail.com wrote:

You should not get this failure when scanning; are you sure the results you get back are good? [...]

Here's the relevant class doing the Scroll:

https://github.com/Aconex/scrutineer/blob/master/src/main/java/com/aconex/scrutineer/elasticsearch/ElasticSearchDownloader.java

We're configuring the Scroll timeout to be 10 minutes, and it looks to me like we're passing the scrollId in the subsequent batch requests.

This error does NOT happen when we run in 'embedded' test mode, only when we run against a node external to this process (although the ES node could still be running on the same box).

The results coming back still look correct to me.

Paul
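
One detail worth noting about the timeout side of the explanation above: the scroll keep-alive is not set once for the whole scan; every scroll request carries its own keep-alive, and each call resets the expiry of the shard contexts. A sketch of a single batch fetch, reusing the client and scrollId variables from the earlier sketch:

// Each scroll call restates the keep-alive, which resets the clock on the
// per-shard contexts; if the gap between calls exceeds it, the context
// expires and SearchContextMissingException shows up on the next call.
SearchResponse batch = client.prepareSearchScroll(scrollId)
        .setScroll(TimeValue.timeValueMinutes(10))
        .execute().actionGet();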

Shay, is there any further info I can give you, or something else I should be trying? Your original reply makes me nervous... :)

Thanks for 0.18!

cheers,

Paul

Heya, I quickly looked at the code, and I see the problem: consumeBatches should feed each next search request with the scroll id returned by the previous search response. It seems like you use the initial scroll id returned from the first scan request. Things will still work when you don't do that, but you will see the failures mentioned above (and it's better to fix it).
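
To make the fix concrete, here is a sketch of a scroll loop that follows this advice; it is not the actual scrutineer code, just the pattern: the scroll id is overwritten from every response, and the loop ends when a batch comes back empty.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;

public class ScrollLoopSketch {
    static void consumeAll(Client client, String scanScrollId) {
        String scrollId = scanScrollId; // scroll id handed back by the initial scan request
        while (true) {
            SearchResponse batch = client.prepareSearchScroll(scrollId)
                    .setScroll(TimeValue.timeValueMinutes(10))
                    .execute().actionGet();
            // The crucial part: carry forward the scroll id from *this* response,
            // not the one returned by the original scan.
            scrollId = batch.getScrollId();
            SearchHit[] hits = batch.getHits().getHits();
            if (hits.length == 0) {
                break; // scan exhausted
            }
            for (SearchHit hit : hits) {
                // process hit.getId() / hit.getSource() here
            }
        }
    }
}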

On 28 October 2011 06:06, Shay Banon kimchy@gmail.com wrote:

Heya, I quickly looked at the code, and I see the problem. [...]

Um, ok, that's odd. I had thought the pattern was:

  • Initiate a _scan
  • Get the scrollId from the scan
  • Call prepareSearchScroll with this scrollId
  • Continue to use prepareSearchScroll to iterate over the batches, passing in the original scrollId each time

Am I reading you right in saying that the first scrollId (from the scan) is only used on the first prepareSearchScroll call, and then we're expected to take a new scrollId from the first iteration to use for all the subsequent batches?

Paul

Not sure what I am missing in the way I explain it. You need to provide the previous response's scroll id to each new scroll request you make, as it can change between calls. There is an example of this on the Java API page.

On 28 October 2011 10:32, Shay Banon kimchy@gmail.com wrote:

You need to provide the previous response's scroll id to each new scroll request you make, as it can change between calls. [...]

Ok, you are right, I overlooked that; it's not the mental model I had. I was thinking of it more like a SQL cursor, where the initial query creates the cursor and the same handle is used throughout.

Thanks Shay, I'll make the changes.

Paul

Yea, the reason it behaves like that is that the scroll id actually holds per-shard information, including which shards still need to be scrolled and which don't. That changes as you scroll through the results.
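
A simple way to see this in practice: the scroll id is an opaque token, but comparing it across responses inside a loop like the sketch above shows it changing as shards are exhausted. Purely illustrative:

// Inside the scroll loop, where scrollId is about to be overwritten from the response:
String previousScrollId = scrollId;
scrollId = batch.getScrollId();
if (!scrollId.equals(previousScrollId)) {
    System.out.println("scroll id changed after this batch");
}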
