Apparent scroll timeout error


(Grant Rodgers) #1

I'm getting this error when scrolling:

{"error":"ReduceSearchPhaseException[Failed to execute phase [fetch],
[reduce] ; shardFailures {SearchContextMissingException[No search
context found for id [353389]]}{RemoteTransportException[[Sigmar]
[inet[/10.198.61.171:9300]][search/phase/scan/scroll]]; nested:
SearchContextMissingException[No search context found for id
[279091]]; }{RemoteTransportException[[Wildpride][inet[/
10.96.85.229:9300]][search/phase/scan/scroll]]; nested:
SearchContextMissingException[No search context found for id
[205220]]; }{RemoteTransportException[[Zach][inet[/10.97.30.240:9300]]
[search/phase/scan/scroll]]; nested: SearchContextMissingException[No
search context found for id [620590]]; }
{RemoteTransportException[[Zach][inet[/10.97.30.240:9300]][search/
phase/scan/scroll]]; nested: SearchContextMissingException[No search
context found for id [620591]]; }{RemoteTransportException[[Corbo,
Jared][inet[/10.211.95.128:9300]][search/phase/scan/scroll]]; nested:
SearchContextMissingException[No search context found for id
[79383]]; }]; nested: IndexOutOfBoundsException[index (0) must be less
than size (0)]; ","status":500}

This error does not generate a stack trace in the Elasticsearch logs.

Initially, my scroll parameter was '5m', and the 'user' time elapsed
(as reported by the time command) before the scroll failed was roughly
5 minutes. I then set the scroll parameter to '20m', and the user time
elapsed before the failure was roughly 20 minutes.

Is the scroll time parameter the timeout since the first scroll
request, or since the most recent scroll request? My intuitive
understanding was that the timeout is updated on every scroll request,
since otherwise it would be difficult to scroll very large indices. Is
this an incorrect assumption?

On a related note, the documentation for scrolling needs to be fleshed
out. The scroll URI endpoint is only documented in a few gists
scattered around.


(Shay Banon) #2

The timeout parameter applies to the next API call, not to the entire duration of the scrolling. Which version are you using, and are you scrolling a scan search type, or a different search type?
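The behavior described above (the timeout covers only the gap until the next call) can be sketched with a toy model. This is not Elasticsearch code; the `SearchContext` class, the `survives` helper, and the use of plain seconds are assumptions made purely for illustration.

```python
class SearchContext:
    """Toy model of a server-side scroll context (not Elasticsearch code)."""

    def __init__(self, keep_alive_s, now):
        self.keep_alive_s = keep_alive_s
        self.expires_at = now + keep_alive_s

    def scroll(self, now):
        """Handle one scroll request arriving at time `now` (in seconds)."""
        if now > self.expires_at:
            raise LookupError("No search context found")
        # Each successful request restarts the clock for the *next* request.
        self.expires_at = now + self.keep_alive_s


def survives(gaps_s, keep_alive_s):
    """Return True if requests separated by `gaps_s` all find the context."""
    now = 0
    ctx = SearchContext(keep_alive_s, now)
    for gap in gaps_s:
        now += gap
        try:
            ctx.scroll(now)
        except LookupError:
            return False
    return True
```

Under this model, `survives([240] * 10, 300)` is true even though the whole scroll runs for 40 minutes, while `survives([240, 400], 300)` is false because a single gap exceeds the five-minute keep-alive.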


(Grant Rodgers) #3

I am using 0.16.1 and scan search type


(Shay Banon) #4

Can you provide a recreation in some form? Are you sure you are feeding the scroll_id you get from each response back to the next request? Btw, the scroll timeout value does not affect any timeout set on the search part.
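For reference, feeding the scroll_id from each response into the next request might look like the sketch below. The `send` callable is a hypothetical transport (for example, a thin wrapper around an HTTP call to the scroll endpoint), and the `_scroll_id` and `hits.hits` response fields are assumed to match the JSON the cluster returns; `fake_send_factory` is only an in-memory stand-in so the loop can be run without a cluster.

```python
def scroll_all(send, scroll_id, keep_alive="5m"):
    """Yield every hit, passing the newest scroll_id back on each request.

    `send(scroll_id, keep_alive)` is a hypothetical transport callable that
    performs one scroll request and returns the decoded JSON response.
    """
    while True:
        response = send(scroll_id, keep_alive)
        hits = response["hits"]["hits"]
        if not hits:
            # An empty page is treated as the end of the scroll.
            return
        yield from hits
        # Use the id from this response, not the one from the first search.
        scroll_id = response.get("_scroll_id", scroll_id)


def fake_send_factory(pages):
    """Build an in-memory stand-in for a cluster, for demonstration only."""
    def send(scroll_id, keep_alive):
        index = int(scroll_id)
        page = pages[index] if index < len(pages) else []
        return {"_scroll_id": str(index + 1), "hits": {"hits": page}}
    return send
```

A quick check with the stand-in: `list(scroll_all(fake_send_factory([[{"_id": "a"}], [{"_id": "b"}]]), "0"))` walks both pages and stops at the first empty one. Passing `keep_alive` on every call is what renews the timeout for the following request.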


(Grant Rodgers) #5

After investigating further, this may be intended behavior.

In some cases, my scroll loop can take several seconds per record, and
since the scan type multiplies the limit (currently 100) by the number
of shards (6), each scroll request returns up to 600 records. That
means the time between scroll requests can be more than 30 minutes.
The correlation between user time and scroll time could be a red
herring.

I am changing the code to divide the limit by the number of shards so
that we get the expected number of records back from each request. If
this doesn't fix the issue I will reply with a recreation.
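The arithmetic behind this can be written out as a small sketch. The function names are hypothetical, and the figure of 3 seconds per record is an assumption standing in for "several seconds"; only the limit of 100 and the 6 shards come from the post.

```python
def hits_per_request(size, shards):
    """Hits returned by one scroll request when `size` applies per shard."""
    return size * shards


def per_shard_size(desired_total, shards):
    """Per-shard size that keeps one request at or below `desired_total`."""
    return max(1, desired_total // shards)


def seconds_between_requests(size, shards, seconds_per_record):
    """Client-side processing time before the next scroll request is sent."""
    return hits_per_request(size, shards) * seconds_per_record


# Numbers from the post: limit 100, 6 shards.
batch = hits_per_request(100, 6)               # 600 records per request
gap = seconds_between_requests(100, 6, 3)      # 1800 s, i.e. 30 minutes
adjusted = per_shard_size(100, 6)              # 16 per shard, 96 per request
```

With the limit divided by the shard count, each request returns at most 96 records here, so the gap between scroll requests shrinks by roughly the same factor of six.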

Thanks! Sorry for the noise - as usual it is probably user error
instead of Elasticsearch :) Although I stand by my warning about
scroll documentation - the multiplying property of scan is
undocumented and may catch other users out.

Grant


(Shay Banon) #6

On Friday, May 20, 2011 at 12:02 AM, Grant Rodgers wrote:

After investigating further, this may be intended behavior.

In some cases, the scroll loop can take several seconds to complete,
and since the scan type multiplies the limit (currently 100) by the
number of shards (6), that means the time between scroll requests can
be more than 30 minutes. The correlation between user time and scroll
time could be a red herring.

You mean that processing the scan results can take longer than the scroll timeout? I see.

I am changing the code to divide the limit by number of shards so that
we get the expected number of records back from the request. If this
doesn't fix the issue I will reply with a recreation.

Thanks! Sorry for the noise - as usual it is probably user error
instead of elasticsearch :) Although I stand by my warning about
scroll documentation - the multiplying property of scan is
undocumented and may catch other users out.

Agreed, it should be documented. Can you help? :)


(Grant Rodgers) #7

Well, I discovered that the scan documentation now includes an example
of how to scroll, so it's better than I expected. I added an example
and some other notes to the scrolling page anyway. Pull request
incoming!

