Just Pushed: Search Scan Type for efficient large hit set scanning

kimchy · February 21, 2011, 10:14pm

Heya,

Just pushed support for "scan" search type in order to efficiently scan/iterate over a large hit set. Issue here: https://github.com/elasticsearch/elasticsearch/issues/707.

The idea is to first "start" the scan, which returns the number of docs we are going to scan over and a scroll id. Then start scrolling, passing the scroll id from the previous response to the next request. Iteration is complete when no hits come back.
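As a rough illustration of that flow, here is a minimal sketch in plain Java. It does not use the real client API: `FakeScanSource`, `Page`, and `drain` are hypothetical names, and the scroll id is assumed to be a simple offset into an in-memory list. It only shows the shape of the loop: start the scan, keep passing the last scroll id back, and stop when a batch comes back empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScanScrollSketch {

    /** Hypothetical response: a scroll id, the total doc count, and one batch of hits. */
    static final class Page {
        final String scrollId;
        final long totalHits;
        final List<String> hits;

        Page(String scrollId, long totalHits, List<String> hits) {
            this.scrollId = scrollId;
            this.totalHits = totalHits;
            this.hits = hits;
        }
    }

    /** Hypothetical in-memory stand-in for a cluster; the scroll id is just an offset. */
    static final class FakeScanSource {
        private final List<String> docs;
        private final int pageSize;

        FakeScanSource(List<String> docs, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            this.docs = new ArrayList<>(docs);
            this.pageSize = pageSize;
        }

        /** "Start" the scan: returns the total and a scroll id, but no hits yet. */
        Page start() {
            return new Page("0", docs.size(), Collections.<String>emptyList());
        }

        /** Returns the next batch for the given scroll id, plus the id for the next call. */
        Page scroll(String scrollId) {
            int from = Integer.parseInt(scrollId);
            int to = Math.min(from + pageSize, docs.size());
            return new Page(String.valueOf(to), docs.size(),
                    new ArrayList<>(docs.subList(from, to)));
        }
    }

    /** Iterates over the whole hit set; an empty batch signals the end. */
    static List<String> drain(FakeScanSource source) {
        Page page = source.start();
        List<String> collected = new ArrayList<>();
        while (true) {
            // Always pass the scroll id from the previous response.
            page = source.scroll(page.scrollId);
            if (page.hits.isEmpty()) {
                break;
            }
            collected.addAll(page.hits);
        }
        return collected;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            docs.add("doc-" + i);
        }
        FakeScanSource source = new FakeScanSource(docs, 3);
        System.out.println(source.start().totalHits + " docs to scan");
        System.out.println(drain(source));
    }
}
```

With a real client the two calls would go to the cluster instead, but the termination rule stays the same: stop as soon as a response carries no hits.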

-shay.banon

Paul_Smith · February 21, 2011, 10:35pm

Very excited to take this out on the open road and test this in a large
index, thanks for your efforts Shay!

kimchy · February 21, 2011, 11:11pm

Great! As always, feedback on the API / usage is greatly appreciated (and this one was tricky).

Barsk · February 22, 2011, 7:49am

Great! I can see this coming in handy.
Any hints on the threshold at which this technique gives better
results than the traditional approach? I.e., is it worth using when the
number of hits is > 100, or > 100,000? Or is it best to always use it if
your app has its own "paging" handling of the results, i.e. showing 20
hits at a time and scrolling through them?

Just wondering whether there are any tradeoffs one should consider.

/Kristian

Karussell1 · February 22, 2011, 12:01pm

Kristian,

If you have unlimited memory you can always use the traditional
approach.
I had only a few GB of RAM, and for me the traditional approach hit
its limit at more than 300,000 documents.

Important: "Note, scan search type does not support sorting (either on
score or a field) or faceting."

Regards,
Peter.

Karussell1 · February 22, 2011, 2:14pm

Hi Shay,

I'm getting an error (stack trace below) on:

rsp = client.prepareSearchScroll(scrollId).setScroll(TimeValue.timeValueMinutes(30)).execute().actionGet();

This happens only for the last search (?), although I break out of the
while loop when rsp.hits().hits().length == 0

Regards,
Peter.

org.elasticsearch.action.search.ReduceSearchPhaseException: Failed to execute phase [fetch], [reduce] ; shardFailures
{SearchContextMissingException[No search context found for id [1151]]}
{SearchContextMissingException[No search context found for id [1155]]}
{SearchContextMissingException[No search context found for id [1154]]}
{SearchContextMissingException[No search context found for id [1152]]}
{SearchContextMissingException[No search context found for id [1153]]}
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.finishHim(TransportSearchScrollScanAction.java:209)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.access$1300(TransportSearchScrollScanAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction$3.onFailure(TransportSearchScrollScanAction.java:199)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteScan(SearchServiceTransportAction.java:378)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.executePhase(TransportSearchScrollScanAction.java:184)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.access$700(TransportSearchScrollScanAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction$2.run(TransportSearchScrollScanAction.java:157)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
    at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
    at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
    at org.elasticsearch.common.collect.Iterables.get(Iterables.java:639)
    at org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:259)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.innerFinishHim(TransportSearchScrollScanAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.finishHim(TransportSearchScrollScanAction.java:207)
    ... 9 more

kimchy · February 22, 2011, 5:52pm

Hi Kristian,

This is not meant for executing typical search requests, since it does no sorting (and for example, facets are not really meaningful here). It is more meant for things like reindexing part / all of an index.

-shay.banon

kimchy · February 22, 2011, 7:05pm

Heya,

Is there a chance that you can recreate this in a testcase? Check SearchScanTests for simple tests for scanning.

-shay.banon

Paul_Smith · February 22, 2011, 10:05pm

Heya, do you have a WetFinger(tm) wild-ass guesstimate/target for the release
of 0.16? (Obviously 0.15 was released just last week, so I'm not expecting
it this week.) Next month-ish? April? Purely for planning purposes on my
end.

kimchy · February 22, 2011, 11:11pm

I'm aiming for a release cycle of about one month. It would be great if you could test it before then and point out any problems, of course.

Karussell1 · February 23, 2011, 4:07pm

I couldn't reproduce the exception but there is a problem/bug:

gist.github.com

https://gist.github.com/karussell/840604

scroll.java


    @Test public void testSimpleScroll3() throws Exception {
        try {
            client.admin().indices().prepareDelete("test1").execute().actionGet();
            client.admin().indices().prepareDelete("test2").execute().actionGet();
            client.admin().indices().prepareDelete("unrelatedindex").execute().actionGet();
        } catch (Exception e) {
            // ignore
        }
        client.admin().indices().prepareCreate("test1").setSettings(ImmutableSettings.settingsBuilder().put("index.number_of_shards", 2))

(File truncated here; the full test is in the gist linked above.)

Although the bulkUpdate in between seems to be unrelated, everything is
fine without it!
Maybe once this is solved the exception will go away?

Regards,
Peter.

kimchy · February 24, 2011, 4:12am

Your test is wrong: you add the ids from unrelatedindex as well as test2 to expectedIds2, but then only search test2 (in the second test)...

Karussell1 · February 24, 2011, 8:56am

Oops ...

Barsk · February 25, 2011, 10:08am

Ok!

Thanks for clarifying!

Regards
Kristian

--
Kind regards,
Kristian Jörg

Devo IT AB
Tel: 054 - 22 14 58, 0709 - 15 83 42
Email: kristian.jorg@devo.se
Web: http://www.devo.se
