Hi,
I've got the following code to build up a scrolling search with
elasticsearch:
// timeout, query, scrollSize and sortField are fields of the enclosing class
public ActionFuture<SearchResponse> execute(final Client client) {
    final SearchRequest request = Requests.searchRequest();
    request.searchType(SearchType.SCAN);
    request.scroll(new TimeValue(timeout));

    final SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(query);
    sourceBuilder.from(0);
    sourceBuilder.size(scrollSize);
    sourceBuilder.explain(true);
    sourceBuilder.sort(sortField, SortOrder.ASC);
    request.source(sourceBuilder);

    return client.search(request);
}
There are two problems I encounter:
First, instead of returning 'scrollSize' elements I always get
'scrollSize * 3' elements with each scroll request. Did I miss
something here?
Second, sorting is completely ignored with this query. What am I doing
wrong?
Thanks for any hints!
Best regards, --- Jan.
Hi Jan
> First, instead of returning 'scrollSize' elements I always get
> 'scrollSize * 3' elements with each scroll request. Did I miss
> something here?
> Second, sorting is completely ignored with this query. What am I doing
> wrong?
Nothing - it is working exactly as expected. The scan search_type is
intended for pulling large numbers of docs out of ES efficiently. It is
not intended for returning search results to users.
Each scroll request receives a maximum of scrollSize * $no_of_shards
results (until each shard runs out of more results).
And sorting is ignored for scan requests.
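
To illustrate the arithmetic, here is a minimal plain-Java sketch. It does not call Elasticsearch at all; it only simulates the rule described above, where each shard contributes up to 'size' of its remaining docs to every scroll request. The class name, the method name and the per-shard doc counts are made up for the example.

```java
// Plain-Java simulation of the "size x shards" rule; no Elasticsearch calls.
final class ScanBatchSketch {

    // Returns the number of hits one scroll request would yield and
    // decrements the remaining doc count of every shard accordingly.
    static int hitsForOneScroll(int[] remainingPerShard, int size) {
        int hits = 0;
        for (int i = 0; i < remainingPerShard.length; i++) {
            // A shard hands back at most `size` docs, fewer once it runs low.
            int taken = Math.min(size, remainingPerShard[i]);
            remainingPerShard[i] -= taken;
            hits += taken;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] remaining = {25, 25, 12}; // hypothetical doc counts for 3 shards
        int size = 10;                  // plays the role of scrollSize
        int hits;
        do {
            hits = hitsForOneScroll(remaining, size);
            System.out.println("scroll request returned " + hits + " hits");
        } while (hits > 0);
    }
}
```

With three shards and size 10, the first request yields 30 hits, which matches the 'scrollSize * 3' you are seeing. Later requests shrink as individual shards run out.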
clint
Hi Clint,
> Each scroll request receives a maximum of scrollSize * $no_of_shards
> results (until each shard runs out of more results).
is that the expected behavior for SearchSourceBuilder as well when
setting from/size? Cause setting size to 10 also results in 30 hits.
Best regards, --- Jan.
Hi Jan
>> Each scroll request receives a maximum of scrollSize * $no_of_shards
>> results (until each shard runs out of more results).
> is that the expected behavior for SearchSourceBuilder as well when
> setting from/size? Cause setting size to 10 also results in 30 hits.
I can only speak for the REST interface, don't know the Java API.
But for search_type = scan, you will always get back a maximum of
size x shards results on each scroll request. (size is, by default, 10)
So if you had 5 shards, you would get back a max of 50 results at a
time, until each shard starts running out of results.
You know that you have pulled all results when you get zero results back.
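
As a rough sketch of that stop condition in plain Java: keep asking for the next page and stop at the first empty one. The Supplier stands in for whatever issues the next scroll request; it is a placeholder for this example and not part of the Elasticsearch Java API, and the doc ids are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Generic "scroll until empty" loop; the page source is a stand-in,
// not an Elasticsearch client.
final class ScrollDrainSketch {

    // Collects pages until the source returns an empty one,
    // which signals that every shard has run out of results.
    static <T> List<T> drain(Supplier<List<T>> nextPage) {
        List<T> all = new ArrayList<>();
        while (true) {
            List<T> page = nextPage.get();
            if (page.isEmpty()) {
                return all;
            }
            all.addAll(page);
        }
    }

    public static void main(String[] args) {
        // Fake scroll responses: two pages with hits, then an empty page.
        Iterator<List<String>> pages = Arrays.asList(
                Arrays.asList("doc1", "doc2", "doc3"),
                Arrays.asList("doc4"),
                Collections.<String>emptyList()).iterator();
        List<String> all = drain(pages::next);
        System.out.println("pulled " + all.size() + " docs");
    }
}
```

In a real client the supplier would send the scroll id from the previous response and return that response's hits.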
clint
Hi Clinton,
> But for search_type = scan, you will always get back a maximum of
> size x shards results on each scroll request. (size is, by default, 10)
I meant SearchType.DFS_QUERY_AND_FETCH is also returning size x shards -
is that expected?
Best regards, --- Jan.
Hi Jan
>> But for search_type = scan, you will always get back a maximum of
>> size x shards results on each scroll request. (size is, by default, 10)
> I meant SearchType.DFS_QUERY_AND_FETCH is also returning size x shards -
> is that expected?
Ah right. Then yes, that is expected:
clint