ScrollId Timeout

Is it necessary to pass the scrollId timeout value in each subsequent search
scroll request?

As in the example at
http://www.elasticsearch.org/guide/reference/api/search/search-type.html

On Wed, 2011-06-15 at 19:53 -0400, James Cook wrote:

Is it necessary to pass the scrollId timeout value in each subsequent
search scroll request?

Yes. Each scroll request needs the scroll timeout and the new _scroll_id
from the previous search or scroll request.

The final request will return zero hits, which is how you know that
you're done.

clint
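
The loop described above could be sketched roughly as follows. This is only an illustration, not code from the thread: ES_URL, the function names, and the sed/grep-based JSON handling are assumptions made for the example, and the scroll id is sent as the request body, as in the curl commands later in this thread.

```shell
#!/bin/sh
# Sketch of a scroll loop. All names here are illustrative assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Send one scroll request: the timeout ($1) is passed on every call, and the
# scroll id from the previous response ($2) is sent as the request body.
fetch_page() {
  curl -s -XGET "$ES_URL/_search/scroll?scroll=$1" -d "$2"
}

# Naive extraction of the "_scroll_id" string from a JSON response
# (assumes the id contains no escaped quotes).
extract_scroll_id() {
  printf '%s\n' "$1" | sed -n 's/.*"_scroll_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# True when the response contains an empty "hits" array (naive text match).
has_no_hits() {
  printf '%s\n' "$1" | grep -q '"hits"[[:space:]]*:[[:space:]]*\[[[:space:]]*]'
}

# Print each page of results, starting from the scroll id returned by the
# initial search ($2), until a page with zero hits comes back.
scroll_all() {
  timeout="$1"
  scroll_id="$2"
  while [ -n "$scroll_id" ]; do
    page=$(fetch_page "$timeout" "$scroll_id") || return 1
    if has_no_hits "$page"; then
      return 0                                # zero hits: scrolling is done
    fi
    printf '%s\n' "$page"
    scroll_id=$(extract_scroll_id "$page")    # always use the newest id
  done
  return 1                                    # no scroll id in the response
}
```

Calling `scroll_all 5m "$initial_scroll_id"` would then print one response per page; `initial_scroll_id` is a hypothetical variable holding the id from the first search. A real script would want a proper JSON parser rather than sed and grep.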

Hi Clinton,

It doesn't seem to work that way. At least anecdotally, I don't have to pass
the timeout value in subsequent search scroll requests. Perhaps this means
that my "cursor" will time out based on the amount of time that has elapsed
from the initial scroll request, but I was checking to see if this is the
case.

I wish the last result did return 0 hits, but instead it is currently
(0.16.2) throwing an exception.

-- jim

Hi James

It doesn't seem to work that way. At least anecdotally, I don't have
to pass the timeout value in subsequent search scroll requests.

I wish the last result did return 0 hits, but instead it is currently
(0.16.2) throwing an exception.
Scroll search always throws IndexOutOfBoundsException on last iteration · Issue #1008 · elastic/elasticsearch · GitHub

Is the fact that you're not passing the timeout not the reason that you
are seeing the exception?

I use scrolling as I described, and it works without any errors. Note:
if you don't pass the timeout, you may still see a few successful scroll
results, but it won't last ;)

clint

I attached a simple gist to the issue to recreate it. If it succeeds for you,
perhaps there is a config parameter that differs between our setups and
affects the results?

So, the timeout value needs to be supplied on each successive scroll
request. I can't think of a use case where that is a helpful feature.
Couldn't ES reset the TTL to its original value each time a scroll_id is
referenced? I suppose passing the timeout each time only makes sense if:

a) The TTL needs to change while retrieving pages of results, or
b) ES doesn't have a way of knowing what the original TTL was.

Either way, it is a simple workaround even though it is a bit strange.

-- jim

Hi James

On Thu, 2011-06-16 at 03:15 -0400, James Cook wrote:

I attached a simple gist to the issue to recreate. If it succeeds for
you, perhaps there is a config parameter which is different between
our set ups which has some impact on the results?

I've got no special config

Here's an example of a scrolled search which works for me on 0.16.2:

Scrolled search · GitHub

clint

Thanks Clinton. Were you able to duplicate my recreation and the exception?

Hi James

On Thu, 2011-06-16 at 09:44 -0400, James Cook wrote:

Thanks clinton. Were you able to duplicate my recreation and the
exception?

I don't use the Java API I'm afraid (or Java for that matter), so no :)

But if you look through the curl script that I linked to, you can check
that you're doing the same steps that I did, and if there is a
difference, then that's probably where the issue is.

If there isn't, well, then it may be a bug.

clint

It's not a Java recreation. It's Curl.

Check out: Scroll search always throws IndexOutOfBoundsException on last iteration · Issue #1008 · elastic/elasticsearch · GitHub

It's very simple:

curl -XPOST 'http://localhost:9200/twitter/tweet/1' -d '{ "user": "kimchy" }'

curl -XGET 'localhost:9200/_search?search_type=scan&scroll=5m&pretty=true' -d '{ "query" : { "term": {"user":"kimchy"} } }'

get scrollID

curl -XGET 'localhost:9200/_search/scroll?scroll=5m&pretty=true' -d ''

returns 1 hit

curl -XGET 'localhost:9200/_search/scroll?scroll=5m&pretty=true' -d ''

throws exception

Sorry - I'm flu ridden - I missed that:

get scrollID

curl -XGET 'localhost:9200/_search/scroll?scroll=5m&pretty=true' -d ''

returns 1 hit

curl -XGET 'localhost:9200/_search/scroll?scroll=5m&pretty=true' -d ''

Which scroll ID are you passing to the last statement? The scroll ID
from the search? Or the scroll ID from the previous scroll request?

It should be the latter

clint

James, I will check your case. Providing the scroll parameter (with the timeout) means that you want to continue scrolling the request. Not passing it means that you don't. When scrolling, you need to make sure that you pass the scroll id you got from the previous response to the next scroll request.
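
As a small illustration of that rule, the request URL could be built as below. The helper name `scroll_url` and its arguments are made up for this sketch; it only shows that the scroll parameter is included when more pages are wanted and left off otherwise.

```shell
#!/bin/sh
# Hypothetical helper: build the scroll endpoint URL.
#   $1 = base URL of the node, e.g. http://localhost:9200
#   $2 = scroll timeout (e.g. 5m); leave empty to stop scrolling
scroll_url() {
  base="$1"
  keep_alive="$2"
  if [ -n "$keep_alive" ]; then
    # Timeout given: ask to keep scrolling after this request.
    printf '%s/_search/scroll?scroll=%s\n' "$base" "$keep_alive"
  else
    # No timeout: no further scrolling is requested.
    printf '%s/_search/scroll\n' "$base"
  fi
}
```

For example, `scroll_url http://localhost:9200 5m` prints `http://localhost:9200/_search/scroll?scroll=5m`. The scroll id from the previous response still has to be sent along with each request.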

Thanks Shay and Clinton. That must be the problem. I have been using the
scroll_id from the very first "setup" request to make subsequent requests.
I'll add a comment to the issue if this is the case.

--- jim
