Scroll and Scan


(Robbie) #1

Hi,
I want to parse through a bunch of indices, each having 5 shards,
searching for some content. I would like to receive a fixed number of
results from each scroll call. I realize that scroll can be provided with a
batch size per shard.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html

For the case below, could you tell me how to ensure that I receive a fixed
number of results for each scroll call?

I have Index A [ 5 shards] with 500 documents that will match my search.
I have Index B [ 5 shards] with 100 documents that will match my search.
I have Index C [ 5 shards] with 50 documents that will match my search.

If I want to receive 75 results in each search, I create a scroll with a
batch size of 5 [i.e., 5 results from each of the 15 shards].

In the first few calls I get 75 results, but after 2 such calls I will be
left with the following:

Index A [ 5 shards] with 450 documents that will match my search.
Index B [ 5 shards] with 50 documents that will match my search.
Index C [ 5 shards] with 0 documents that will match my search.

Now my 3rd scroll call will return only 50 results: 25 from Index A and 25
from Index B.
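
The arithmetic above can be sketched as a small simulation. This is only an illustration, not Elasticsearch code: `simulate_scroll` is a made-up helper, and it assumes the matching documents are spread evenly over each index's 5 shards (100, 20, and 10 per shard for Index A, B, and C).

```python
def simulate_scroll(matches_per_shard, size):
    """Yield the number of hits each scroll call would return.

    matches_per_shard: matching documents remaining on each shard.
    size: per-shard batch size, as passed to the scan request.
    """
    remaining = list(matches_per_shard)
    while any(remaining):
        # Each shard hands back at most `size` hits, fewer once it runs dry.
        taken = [min(size, r) for r in remaining]
        remaining = [r - t for r, t in zip(remaining, taken)]
        yield sum(taken)


# Assumption: an even spread of matches over the 5 shards of each index.
shards = [100] * 5 + [20] * 5 + [10] * 5  # Index A, Index B, Index C
pages = list(simulate_scroll(shards, size=5))
print(pages[:5])  # [75, 75, 50, 50, 25]
```

Under that assumption the page size drops as soon as the smallest index is exhausted, which is the behaviour described above.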

Is there any elegant way to specify that each scroll call provide a fixed
number of results?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a60e9505-3e11-42c0-b1ad-6f19be861983%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Robbie) #2

One other issue that I am seeing is that sometimes the scroll API returns
a different number of results than I expect.

For instance, if I have Index A with 5 shards and Index B with 5 shards, and
both indices have 10K records that match my filter, then with a scroll size
of 10 per shard I sometimes get 70 or 80 results returned instead of 100. Is
there some issue in the way I have the scan/scroll configured?

/_search?search_type=scan&scroll=5m
{
  "size": 10,
  "query": {
    "constant_score": {
      "filter": {
        "missing": { "field": "fieldName" }
      }
    }
  }
}

On Monday, April 14, 2014 4:00:38 PM UTC-7, Robbie wrote:




(Brian Yoder) #3

Robbie,

Are you repeating the scan until there are no more hits returned? I've
never bothered to check the repeatability of the individual "chunks" and
only notice that the overall total count is as expected. Also note the
following from the guide:

"The scroll request also returns a new _scroll_id. Every time we make the
next scroll request, we must pass the _scroll_id returned by the previous
scroll request."
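
A minimal sketch of that loop, with the hits regrouped into fixed-size batches on the client so the size of each server-side chunk stops mattering. Everything here is an assumption for illustration: `fetch_page` stands for whatever call performs one scroll request and returns the parsed response, and `fake_pages` is a stand-in for the server, not a real client.

```python
def scroll_hits(fetch_page, first_scroll_id):
    """Yield hits one by one, always passing on the newest _scroll_id."""
    scroll_id = first_scroll_id
    while True:
        page = fetch_page(scroll_id)    # hypothetical: one scroll request
        hits = page["hits"]["hits"]
        if not hits:                    # an empty page ends the scroll
            return
        yield from hits
        scroll_id = page["_scroll_id"]  # id from this response feeds the next call


def fixed_batches(hits, batch_size):
    """Regroup a stream of hits into lists of exactly batch_size (last may be short)."""
    batch = []
    for hit in hits:
        batch.append(hit)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def fake_pages(sizes):
    """Stand-in for the server: pages of the given sizes, then an empty page."""
    pages, doc = {}, 0
    for i, n in enumerate(list(sizes) + [0]):
        pages[f"id{i}"] = {
            "_scroll_id": f"id{i + 1}",
            "hits": {"hits": [{"_id": str(doc + k)} for k in range(n)]},
        }
        doc += n
    return pages.__getitem__


fetch = fake_pages([75, 75, 50, 50, 25])
batches = list(fixed_batches(scroll_hits(fetch, "id0"), 75))
print([len(b) for b in batches])  # [75, 75, 75, 50]
```

Only the final batch can come up short, since it holds whatever is left when the scroll runs out.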

Brian



(Robbie) #4

Thanks Brian.
I am using the scroll_id from the SearchResponse returned by each scroll
request for the subsequent scroll request. And yes, I am repeating the scan
until there are no more results. However, I was expecting each request to
return n results from every shard (n * number of shards in total), but only
a few shards seem to be reported as successful. It is as though there is no
data in the other active primary shards and all the data in my index is in
a couple of shards.
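
One way to look at this per response is to log the `_shards` section alongside a count of hits per index. A rough sketch, assuming the response has been parsed into a dict: `summarize_page` is a made-up helper, and `sample` is a hand-written response for illustration, not real output from my cluster.

```python
from collections import Counter


def summarize_page(response):
    """Collect the shard counters and the number of hits per index for one page."""
    shards = response.get("_shards", {})
    per_index = Counter(hit["_index"] for hit in response["hits"]["hits"])
    return {
        "shards_total": shards.get("total"),
        "shards_successful": shards.get("successful"),
        "shards_failed": shards.get("failed"),
        "hits_returned": sum(per_index.values()),
        "hits_per_index": dict(per_index),
    }


# Hand-written example response (invented values).
sample = {
    "_scroll_id": "abc",
    "_shards": {"total": 10, "successful": 10, "failed": 0},
    "hits": {
        "total": 20000,
        "hits": [
            {"_index": "index_a", "_id": "1"},
            {"_index": "index_a", "_id": "2"},
            {"_index": "index_a", "_id": "3"},
            {"_index": "index_b", "_id": "4"},
        ],
    },
}

print(summarize_page(sample))
```

Comparing these summaries from one scroll call to the next should show whether the short pages line up with fewer shards reporting or with one index contributing fewer hits.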

Regards

On Tuesday, April 15, 2014 12:32:47 PM UTC-7, InquiringMind wrote:



