Do unique/reusable _scroll_ids exist?

Hi guys,

I'm attempting to implement pagination for our application. The catch is
that our documents require a little post-query filtering, so sometimes if a
user requests 500 documents, we scroll, get 500 from ES, filter and end up
with a lower number. In this case, we perform the next scroll, get a number
of results and build until we have 500 valid docs.
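
A minimal Python sketch of the fill-until-full loop described above. Everything named here is an assumption for illustration: `fetch_page(scroll_id)` is a hypothetical callable returning the next scroll id and one batch of hits, and `is_valid` stands in for the post-query filter. Surplus valid docs are handed back rather than dropped, since a later scroll request will not return them again.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical transport: takes a scroll id, returns (next scroll id, hits).
FetchPage = Callable[[str], Tuple[str, List[dict]]]


def collect_valid(
    fetch_page: FetchPage,
    scroll_id: str,
    is_valid: Callable[[dict], bool],
    wanted: int,
    carry: Sequence[dict] = (),
) -> Tuple[List[dict], List[dict], str]:
    """Scroll until `wanted` docs pass `is_valid` or the scroll runs dry.

    Returns (page, leftover, scroll_id). `leftover` holds valid docs fetched
    beyond `wanted`; pass it back as `carry` when building the next page.
    """
    valid = list(carry)
    while len(valid) < wanted:
        # Always continue with the id from the most recent response.
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            break  # an empty batch is treated as the end of the scroll
        valid.extend(hit for hit in hits if is_valid(hit))
    return valid[:wanted], valid[wanted:], scroll_id
```

Usage would look roughly like `page, leftover, sid = collect_valid(fetch, sid, keep, 500, carry=leftover)` for each page the user asks for.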

I had some related questions about scrolling / the scroll id returned by
search scroll requests.

Question1: Is it possible to use the same scroll id multiple times to get
the same set of results from the overall result set?

Question2: (related to Question1) I'm confused by the scroll_id returned
whilst doing a scan search then scrolling. What I see is that when I
start scrolling, for a period of time I get the same _scroll_id back. After
some number of requests it changes. I would have expected to either (1) get
the same _scroll_id over and over or (2) get a different _scroll_id each
time. Is either of these correct? At the bottom of this mail I've given a
short example set of req/resp.

Any pointers on this appreciated. I'd also be interested in hearing from
anyone who has successfully implemented pagination and the approach you
took.

Cheers,
oli

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ah, I never gave my example. In case it's of use:

Request 1

curl -XPOST 'localhost:9200/foo/bar/_search?search_type=scan&scroll=10m&size=10' -d '{"query":{"constant_score":{"boost":1,"filter":{"term":{"x":false}}}}}'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[]}}

Request 2

curl -XPOST 'localhost:9200/_search/scroll?scroll=10m' -d 'abc'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..}, {..}]}}

.. after some number of requests

{"_scroll_id":"def","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..}, {..}]}}

(In reply to Oli's original post of Wednesday, June 5, 2013 8:48:43 AM UTC-7.)

Hi,

I faced this issue quite a while ago while implementing custom client-side
code for ES data backup and re-indexing.
Yes, the scroll ID stays the same for a few requests, after which it changes.

The solution is to send the scroll ID returned with each response in
the subsequent request, i.e. chaining the IDs from one request to the next.
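
The ID chaining described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested client: `http_fetch_page` and `scroll_all` are hypothetical names, and the request simply mirrors Request 2 from this thread (scroll id as the raw POST body, `scroll=10m`), which may not match what other Elasticsearch versions expect.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Tuple


def http_fetch_page(
    scroll_id: str,
    base_url: str = "http://localhost:9200",  # assumption: local node, as in the thread
    keep_alive: str = "10m",
) -> Tuple[str, List[dict]]:
    """POST the scroll id to /_search/scroll, mirroring Request 2 above."""
    request = urllib.request.Request(
        f"{base_url}/_search/scroll?scroll={keep_alive}",
        data=scroll_id.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Field names follow the example responses shown in this thread.
    return body["_scroll_id"], body["hits"]["hits"]


def scroll_all(
    first_scroll_id: str,
    fetch_page: Callable[[str], Tuple[str, List[dict]]] = http_fetch_page,
) -> Iterator[dict]:
    """Yield every hit, always sending the id from the latest response."""
    scroll_id = first_scroll_id
    while True:
        # The returned id may or may not differ from the one sent;
        # either way it replaces the old one for the next request.
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            return  # an empty batch is treated as the end of the scroll
        yield from hits
```

Because the loop never compares the old and new ids, it behaves the same whether the id stays at "abc" for several requests or changes to "def".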

Reusing the first scroll ID repeatedly fetches only a few results, not all.
My guess is that the scroll ID gets renewed after its timestamp expires
(calculated from the first request), but that is based on casual
observation and I am not sure of it. ES experts can explain the underlying
cause better, and I would be glad to know the actual cause too.

  • Sujoy.

(In reply to Oli's example of Wednesday, June 5, 2013 9:20:34 PM UTC+5:30.)

Thanks Sujoy, appreciate you getting back to this.

I've found that to be the solution. The unfortunate thing for me is that
I'd like to be able to re-fetch the results for a specific position in a
scroll.

Given that a token appears to always yield the *next* set of results (at
least for all of the results that token represents), it seems like I can't
ever re-fetch something I've already obtained once. Any information to the
contrary would be great to hear!
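
If the scroll really is forward-only, one possible client-side workaround (not something anyone in this thread confirmed) is to keep each batch as it arrives and serve repeat reads from that copy. A minimal Python sketch, assuming a hypothetical `fetch_page(scroll_id)` callable that returns the next scroll id and a list of hits; `ReplayableScroll` is an illustrative name, not an Elasticsearch API:

```python
from typing import Callable, List, Tuple


class ReplayableScroll:
    """Wraps a forward-only scroll with an in-memory cache of fetched batches."""

    def __init__(
        self,
        fetch_page: Callable[[str], Tuple[str, List[dict]]],
        first_scroll_id: str,
    ) -> None:
        self._fetch_page = fetch_page
        self._scroll_id = first_scroll_id
        self._batches: List[List[dict]] = []
        self._exhausted = False

    def batch(self, index: int) -> List[dict]:
        """Return batch `index`, scrolling forward only as far as needed."""
        if index < 0:
            raise ValueError("batch index must be non-negative")
        while index >= len(self._batches) and not self._exhausted:
            # Chain the id from the latest response into the next request.
            self._scroll_id, hits = self._fetch_page(self._scroll_id)
            if hits:
                self._batches.append(hits)
            else:
                self._exhausted = True  # empty batch: nothing more to fetch
        # Already-seen positions come from the cache, not from the server.
        return self._batches[index] if index < len(self._batches) else []
```

The trade-off is memory: every batch seen so far stays in the client, so this only suits result sets small enough to hold, or would need an eviction policy on top.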

  • oli

(In reply to Sujoy Sett's message of Mon, Jun 10, 2013 at 2:27 AM.)