Scrolling / sorting


(Jan Kriesten) #1

Hi,

I've got the following code to build up a scrolling search with
elasticsearch:

    // query, timeout, scrollSize and sortField are fields of the enclosing class.
    public ActionFuture<SearchResponse> execute(final Client client) {
        final SearchRequest request = Requests.searchRequest();
        request.searchType(SearchType.SCAN);
        request.scroll(new TimeValue(timeout));

        final SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(query);
        sourceBuilder.from(0);
        sourceBuilder.size(scrollSize);
        sourceBuilder.explain(true);
        sourceBuilder.sort(sortField, SortOrder.ASC);
        request.source(sourceBuilder);

        return client.search(request);
    }

There are two problems I encounter:

First, instead of returning 'scrollSize' elements I always get
'scrollSize * 3' elements with each scroll request. Did I miss
something here?

Second, sorting is completely ignored with this query. What am I doing
wrong?

Thanks for any hints!

Best regards, --- Jan.


(Clinton Gormley) #2

Hi Jan

> First, instead of returning 'scrollSize' elements I always get
> 'scrollSize * 3' elements with each scroll request. Did I miss
> something here?
>
> Second, sorting is completely ignored with this query. What am I doing
> wrong?

Nothing - it is working exactly as expected. The scan search_type is
intended for pulling large numbers of docs out of ES efficiently. It is
not intended for returning search results to users.

Each scroll request receives a maximum of scrollSize * $no_of_shards
results (until each shard runs out of more results).

And sorting is ignored for scan requests.
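
To make the arithmetic concrete, here is a minimal sketch in plain Java. It does not call Elasticsearch; the class and method names are hypothetical, and the three-shard figure is only inferred from the 'scrollSize * 3' observation above. It assumes, as described above, that the size setting applies to each shard separately.

```java
final class ScanPageSize {
    /** Upper bound on hits per scroll page when size applies per shard. */
    static int maxHitsPerPage(int sizePerShard, int shardCount) {
        return sizePerShard * shardCount;
    }

    /** Hits on one scroll page, given how many docs each shard has left. */
    static int hitsOnPage(int sizePerShard, int[] remainingPerShard) {
        int hits = 0;
        for (int remaining : remainingPerShard) {
            // A shard contributes at most sizePerShard docs, fewer if it is running out.
            hits += Math.min(sizePerShard, remaining);
        }
        return hits;
    }
}
```

With a size of 10 and three shards, `maxHitsPerPage(10, 3)` gives 30. Once one shard has only 7 docs left and another has none, `hitsOnPage(10, new int[] {25, 7, 0})` gives 17.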

clint


(Jan Kriesten) #3

Hi Clint,

> Each scroll request receives a maximum of scrollSize * $no_of_shards
> results (until each shard runs out of more results).

is that the expected behavior for SearchSourceBuilder as well when
setting from/size? Cause setting size to 10 also results in 30 hits.

Best regards, --- Jan.


(Clinton Gormley) #4

Hi Jan

> > Each scroll request receives a maximum of scrollSize * $no_of_shards
> > results (until each shard runs out of more results).
>
> is that the expected behavior for SearchSourceBuilder as well when
> setting from/size? Cause setting size to 10 also results in 30 hits.

I can only speak for the REST interface, don't know the Java API.

But for search_type = scan, you will always get back a maximum of
size x shards results on each scroll request. (size is, by default, 10)

So if you had 5 shards, you would get back a max of 50 results at a
time, until each shard starts running out of results.

You know that you have pulled all results when you get zero results back.
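
The "scroll until you get zero results" loop can be sketched with an in-memory stand-in for the cluster. This is plain Java, not the Elasticsearch client; `ScanScrollSimulation`, `nextPage`, `drain` and `range` are hypothetical names, and each shard is modelled as a simple list of doc ids.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanScrollSimulation {
    private final List<List<Integer>> shards; // doc ids held by each shard
    private final int[] cursors;              // read position per shard
    private final int sizePerShard;

    ScanScrollSimulation(List<List<Integer>> shards, int sizePerShard) {
        this.shards = shards;
        this.cursors = new int[shards.size()];
        this.sizePerShard = sizePerShard;
    }

    /** One scroll request: every shard hands over up to sizePerShard docs. */
    List<Integer> nextPage() {
        List<Integer> page = new ArrayList<>();
        for (int s = 0; s < shards.size(); s++) {
            List<Integer> docs = shards.get(s);
            int end = Math.min(cursors[s] + sizePerShard, docs.size());
            page.addAll(docs.subList(cursors[s], end));
            cursors[s] = end;
        }
        return page;
    }

    /** Keep scrolling until a page comes back empty. */
    static List<Integer> drain(ScanScrollSimulation scroll) {
        List<Integer> all = new ArrayList<>();
        while (true) {
            List<Integer> page = scroll.nextPage();
            if (page.isEmpty()) {
                break; // zero results: every shard is exhausted
            }
            all.addAll(page);
        }
        return all;
    }

    /** Helper: doc ids from (inclusive) to to (exclusive). */
    static List<Integer> range(int from, int to) {
        List<Integer> ids = new ArrayList<>();
        for (int i = from; i < to; i++) {
            ids.add(i);
        }
        return ids;
    }
}
```

With shards holding 25, 7 and 0 docs and a size of 10, successive pages carry 17, 10 and 5 docs, and the next page is empty, which ends the loop.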

clint


(Jan Kriesten) #5

Hi Clinton,

> But for search_type = scan, you will always get back a maximum of
> size x shards results on each scroll request. (size is, by default, 10)

I meant SearchType.DFS_QUERY_AND_FETCH is also returning size x shards -
is that expected?

Best regards, --- Jan.


(Clinton Gormley) #6

Hi Jan

> > But for search_type = scan, you will always get back a maximum of
> > size x shards results on each scroll request. (size is, by default, 10)
>
> I meant SearchType.DFS_QUERY_AND_FETCH is also returning size x shards -
> is that expected?

Ah right. Then yes, that is expected:

http://www.elasticsearch.org/guide/reference/api/search/search-type.html
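
As a rough illustration of why the "and fetch" search types return size x shards hits while the "then fetch" types return size hits, here is a simplified model in plain Java. It is not the Elasticsearch implementation: the class and method names are hypothetical, docs are reduced to their integer sort values, and the DFS phase is ignored because it affects scoring, not the hit count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SearchTypeMerge {
    /** The best `size` sort values of a single shard, ascending. */
    static List<Integer> shardTop(List<Integer> shardValues, int size) {
        List<Integer> sorted = new ArrayList<>(shardValues);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(size, sorted.size())));
    }

    /** "And fetch" model: every shard's top `size` docs are returned, up to size x shards. */
    static List<Integer> queryAndFetch(List<List<Integer>> shards, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, size));
        }
        Collections.sort(merged);
        return merged;
    }

    /** "Then fetch" model: shard results are merged first and only the global top `size` is kept. */
    static List<Integer> queryThenFetch(List<List<Integer>> shards, int size) {
        List<Integer> merged = queryAndFetch(shards, size);
        return new ArrayList<>(merged.subList(0, Math.min(size, merged.size())));
    }

    /** Helper: shard s holds the values s, s + shardCount, s + 2 * shardCount, ... */
    static List<List<Integer>> sampleShards(int shardCount, int docsPerShard) {
        List<List<Integer>> shards = new ArrayList<>();
        for (int s = 0; s < shardCount; s++) {
            List<Integer> values = new ArrayList<>();
            for (int i = 0; i < docsPerShard; i++) {
                values.add(s + i * shardCount);
            }
            shards.add(values);
        }
        return shards;
    }
}
```

For three shards of 20 docs each and a size of 10, `queryAndFetch` returns 30 values and `queryThenFetch` returns 10, which matches the hit counts discussed in this thread.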

clint

