The idea is to "start" the scan, which returns the number of docs we are going to scan over and a scroll id. Then start the scrolling process, passing the scroll id from the previous response to the next request. Iteration is complete when no hits come back.
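For anyone trying to picture the loop, here is a minimal, self-contained Java sketch of the flow described above. It does not use the Elasticsearch client at all: `ScrollSource`, `Page`, `InMemorySource` and `scanAll` are hypothetical names made up for illustration, and the in-memory source simply encodes a read offset in the scroll id. Only the control flow (start, then keep passing the previous response's scroll id until an empty batch comes back) mirrors the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScanScrollSketch {

    /** One response: the hits in this batch, the id to pass to the next request, and the total doc count. */
    static final class Page {
        final List<String> hits;
        final String scrollId;
        final long totalHits;

        Page(List<String> hits, String scrollId, long totalHits) {
            this.hits = hits;
            this.scrollId = scrollId;
            this.totalHits = totalHits;
        }
    }

    /** Hypothetical stand-in for the search client; this is NOT the Elasticsearch API. */
    interface ScrollSource {
        Page startScan();             // returns the total count and a first scroll id, but no hits
        Page scroll(String scrollId); // returns the next batch plus the scroll id for the following request
    }

    /** Assumed in-memory source that encodes the read offset in the scroll id. */
    static final class InMemorySource implements ScrollSource {
        private final List<String> docs;
        private final int batchSize;

        InMemorySource(List<String> docs, int batchSize) {
            this.docs = docs;
            this.batchSize = batchSize;
        }

        @Override
        public Page startScan() {
            return new Page(Collections.<String>emptyList(), "0", docs.size());
        }

        @Override
        public Page scroll(String scrollId) {
            int from = Integer.parseInt(scrollId);
            int to = Math.min(from + batchSize, docs.size());
            List<String> batch = new ArrayList<>(docs.subList(from, to));
            return new Page(batch, String.valueOf(to), docs.size());
        }
    }

    /** Start the scan, then scroll with the previous response's id until a batch comes back empty. */
    static List<String> scanAll(ScrollSource source) {
        List<String> collected = new ArrayList<>();
        Page page = source.startScan();
        while (true) {
            page = source.scroll(page.scrollId);
            if (page.hits.isEmpty()) {
                break; // no hits came back: iteration is complete
            }
            collected.addAll(page.hits);
        }
        return collected;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            docs.add("doc-" + i);
        }
        List<String> result = scanAll(new InMemorySource(docs, 3));
        System.out.println(result.size() + " docs scanned");
    }
}
```

Note that the loop always takes the scroll id from the most recent response rather than reusing the first one, and that the empty batch is the only stop condition; comparing against the total count is not needed.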
Great! As always, feedback on the API / usage is greatly appreciated (and this one was tricky).
On Tuesday, February 22, 2011 at 12:35 AM, Paul Smith wrote:
Very excited to take this out on the open road and test this in a large index, thanks for your efforts Shay!
Great! I can see this coming in handy.
Any hints on the threshold at which this technique gives better
results than the traditional one? I.e., is it worth using when the number of
hits is > 100, or > 100,000? Or is it best to always use it if you have your own
"paging" handling of the results in your app, i.e. showing 20 hits at a
time and scrolling through them?
Just looking for any tradeoffs one should consider.
If you have unlimited memory you can always use the traditional
approach.
I only had a few GB of RAM, and for me the traditional approach
hit its limit at more than 300,000 documents.
Important: "Note, scan search type does not support sorting (either on
score or a field) or faceting."
Regards,
Peter.
On 22 Feb., 08:49, Kristian Jörg k...@devo.se wrote:
Great! I can see this coming in handy.
Any hints on the threshold at which this technique gives better
results than the traditional one? I.e., is it worth using when the number of
hits is > 100, or > 100,000? Or is it best to always use it if you have your own
"paging" handling of the results in your app, i.e. showing 20 hits at a
time and scrolling through them?
Just looking for any tradeoffs one should consider.
This happens only on the last search (?), even though I break out of the
while loop when rsp.hits().hits().length == 0
Regards,
Peter.
org.elasticsearch.action.search.ReduceSearchPhaseException: Failed to execute phase [fetch], [reduce] ; shardFailures
{SearchContextMissingException[No search context found for id [1151]]}
{SearchContextMissingException[No search context found for id [1155]]}
{SearchContextMissingException[No search context found for id [1154]]}
{SearchContextMissingException[No search context found for id [1152]]}
{SearchContextMissingException[No search context found for id [1153]]}
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.finishHim(TransportSearchScrollScanAction.java:209)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.access$1300(TransportSearchScrollScanAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction$3.onFailure(TransportSearchScrollScanAction.java:199)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteScan(SearchServiceTransportAction.java:378)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.executePhase(TransportSearchScrollScanAction.java:184)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.access$700(TransportSearchScrollScanAction.java:80)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction$2.run(TransportSearchScrollScanAction.java:157)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
    at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
    at org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
    at org.elasticsearch.common.collect.Iterables.get(Iterables.java:639)
    at org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:259)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.innerFinishHim(TransportSearchScrollScanAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchScrollScanAction$AsyncAction.finishHim(TransportSearchScrollScanAction.java:207)
    ... 9 more
This is not meant for executing typical search requests, since it does no sorting (and facets, for example, are not really meaningful here). It is meant more for things like reindexing part or all of an index.
-shay.banon
On Tuesday, February 22, 2011 at 9:49 AM, Kristian Jörg wrote:
Great! I can see this coming in handy.
Any hints on the threshold at which this technique gives better
results than the traditional one? I.e., is it worth using when the number of
hits is > 100, or > 100,000? Or is it best to always use it if you have your own
"paging" handling of the results in your app, i.e. showing 20 hits at a
time and scrolling through them?
Just looking for any tradeoffs one should consider.
Heya, do you have a WetFinger(tm) wild-ass guesstimate/target for the release
of 0.16? (Obviously 0.15 was released just last week, so I'm not expecting
it this week...) Next month-ish? April? Purely for planning purposes
on my end.
I aim to get onto a ~1 month release cycle. It would be great if you could test it before then and point out any problems, of course.
On Wednesday, February 23, 2011 at 12:05 AM, Paul Smith wrote:
Heya, do you have a WetFinger(tm) wild-ass guesstimate/target for the release of 0.16? (Obviously 0.15 was released just last week, so I'm not expecting it this week...) Next month-ish? April? Purely for planning purposes on my end.
Your test is wrong: you add the ids from the unrelatedindex as well as from test2 to expectedIds2, but then only search test2 (in the second test)...
On Wednesday, February 23, 2011 at 6:07 PM, Karussell wrote:
I couldn't reproduce the exception but there is a problem/bug:
--
Kind regards,
Kristian Jörg
Devo IT AB
Tel: 054 - 22 14 58, 0709 - 15 83 42
E-mail: kristian.jorg@devo.se
Web: http://www.devo.se