Hi,
I faced this issue quite a while ago while implementing custom client-side
code for ES data backup and re-indexing.
Yes, the scroll ID stays the same for a few requests, after which it changes.
The solution is to take the scroll ID returned with each response and use it in
the subsequent request, i.e. chaining the IDs will work.
Using the first scroll ID repeatedly fetches only a few results, not all.
I guess the scroll ID gets renewed after the scroll timeout expires (measured
from the first request). But this is based on casual observation and I am not
sure of it; ES experts can explain the underlying cause better. I would be
glad to know the actual cause too.
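
A minimal Python sketch of the ID chaining described above, in case it helps. The `fetch_page` callable is hypothetical: it stands in for whatever client code sends the scroll ID to `/_search/scroll` and returns the parsed JSON response. The sketch also assumes that an empty `hits` array marks the end of the scroll.

```python
from typing import Callable, Dict, Iterator

# Hypothetical transport (an assumption, not an ES client API): takes a
# scroll ID and returns the parsed JSON body of the scroll response.
FetchPage = Callable[[str], Dict]


def scroll_all(first_scroll_id: str, fetch_page: FetchPage) -> Iterator[Dict]:
    """Yield every hit, always sending the most recently returned scroll ID."""
    scroll_id = first_scroll_id
    while True:
        response = fetch_page(scroll_id)
        hits = response["hits"]["hits"]
        if not hits:
            # Assumption: an empty page means the scroll is exhausted.
            return
        yield from hits
        # Chain the IDs: reuse whatever this response returned, even when it
        # looks identical to the ID that was just sent.
        scroll_id = response.get("_scroll_id", scroll_id)
```

The caller never has to care whether the ID changed or not; it simply passes along the latest one.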
- Sujoy.
On Wednesday, June 5, 2013 9:20:34 PM UTC+5:30, Oli wrote:
Ah, I never gave my example. In case it's of use:
Request 1

curl -XPOST 'localhost:9200/foo/bar/_search?search_type=scan&scroll=10m&size=10' \
  -d '{"query":{"constant_score":{"boost":1,"filter":{"term":{"x":false}}}}}'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[]}}

Request 2

curl -XPOST 'localhost:9200/_search/scroll?scroll=10m' -d 'abc'

{"_scroll_id":"abc","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..},{..}]}}

.. after some number of requests

{"_scroll_id":"def","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":108,"max_score":0.0,"hits":[{..},{..}]}}

On Wednesday, June 5, 2013 8:48:43 AM UTC-7, Oli wrote:
Hi guys,
I'm attempting to implement pagination for our application. The catch is
that our documents require a little post-query filtering, so sometimes if a
user requests 500 documents, we scroll, get 500 from ES, filter and end up
with a lower number. In this case, we perform the next scroll, get a number
of results and build until we have 500 valid docs.

I had some related questions about scrolling / the scroll id returned by
search scroll requests.

Question 1: Is it possible to use the same scroll id multiple times to
get the same set of results in the overall result set?

Question 2: (related to Question 1) I'm confused by the scroll_id
returned whilst doing a scan search then scrolling. What I see is
that when I start scrolling, for a period of time I get the same _scroll_id
back. After some number of requests it changes. I would have expected to
either (1) get the same _scroll_id over and over or (2) get a different
_scroll_id each time. Is either of these correct? At the bottom of this
mail I've given a short example set of req/resp.

Any pointers on this appreciated. I'd also be interested in hearing from
anyone who has successfully implemented pagination and the approach you
took.

Cheers,
oli
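
The "scroll, filter, and keep building until we have 500 valid docs" approach from the message above could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_page` and `is_valid` are hypothetical callables (the first wraps the scroll request and returns parsed JSON, the second is the application's post-query filter), and an empty `hits` array is taken to mean the scroll is finished.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List


def valid_hits(scroll_id: str,
               fetch_page: Callable[[str], Dict],
               is_valid: Callable[[Dict], bool]) -> Iterator[Dict]:
    """Lazily scroll and yield only the hits that pass the post-query filter."""
    while True:
        response = fetch_page(scroll_id)
        hits = response["hits"]["hits"]
        if not hits:
            # Assumption: an empty page means there is nothing left to scroll.
            return
        # Always carry forward the scroll ID from the latest response.
        scroll_id = response.get("_scroll_id", scroll_id)
        for hit in hits:
            if is_valid(hit):
                yield hit


def next_page(docs: Iterator[Dict], wanted: int) -> List[Dict]:
    """Take up to `wanted` valid docs, scrolling only as far as needed."""
    return list(islice(docs, wanted))
```

Calling `next_page(docs, 500)` repeatedly on the same generator gives consecutive pages, and valid docs left over from a partially consumed scroll response are not lost. The generator has to be kept alive between calls, and the next call has to arrive before the scroll timeout runs out.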